2025-05-07T19:42:32.6915071Z Current runner version: '2.323.0' 2025-05-07T19:42:32.6920754Z Runner name: 'i-0405906171cd7041e' 2025-05-07T19:42:32.6921747Z Machine name: 'ip-10-0-12-123' 2025-05-07T19:42:32.6924711Z ##[group]GITHUB_TOKEN Permissions 2025-05-07T19:42:32.6926912Z Contents: read 2025-05-07T19:42:32.6927505Z Metadata: read 2025-05-07T19:42:32.6928158Z Packages: read 2025-05-07T19:42:32.6928710Z ##[endgroup] 2025-05-07T19:42:32.6930814Z Secret source: None 2025-05-07T19:42:32.6931503Z Prepare workflow directory 2025-05-07T19:42:32.7535129Z Prepare all required actions 2025-05-07T19:42:32.7579945Z Getting action download info 2025-05-07T19:42:32.9454637Z Download action repository 'actions/checkout@v4' (SHA:11bd71901bbe5b1630ceea73d27597364c9af683) 2025-05-07T19:42:33.2337982Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-05-07T19:42:33.7979185Z Complete job name: build_artifact (x86, linux.24xlarge, genai, 3.11, 12.8.0, clang) 2025-05-07T19:42:33.8847815Z A job started hook has been configured by the self-hosted runner administrator 2025-05-07T19:42:33.8978800Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh' 2025-05-07T19:42:33.8989510Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:42:33.8990935Z ##[endgroup] 2025-05-07T19:42:35.0751174Z Runner Type: linux.24xlarge 2025-05-07T19:42:35.0751791Z Instance Type: c5.24xlarge 2025-05-07T19:42:35.0752201Z AMI Name: unknown 2025-05-07T19:42:35.0782403Z AMI ID: ami-071226ecf16aa7d96 2025-05-07T19:42:40.2083534Z ##[group]Checking docker version 2025-05-07T19:42:40.2097102Z ##[command]/usr/bin/docker version --format '{{.Server.APIVersion}}' 2025-05-07T19:42:40.2317039Z '1.44' 2025-05-07T19:42:40.2333469Z Docker daemon API version: '1.44' 2025-05-07T19:42:40.2333997Z ##[command]/usr/bin/docker version --format '{{.Client.APIVersion}}' 2025-05-07T19:42:40.2524214Z '1.44' 2025-05-07T19:42:40.2532663Z Docker client API 
version: '1.44' 2025-05-07T19:42:40.2536992Z ##[endgroup] 2025-05-07T19:42:40.2539485Z ##[group]Clean up resources from previous jobs 2025-05-07T19:42:40.2544779Z ##[command]/usr/bin/docker ps --all --quiet --no-trunc --filter "label=36ad75" 2025-05-07T19:42:40.2686340Z ##[command]/usr/bin/docker network prune --force --filter "label=36ad75" 2025-05-07T19:42:40.2823658Z ##[endgroup] 2025-05-07T19:42:40.2824039Z ##[group]Create local container network 2025-05-07T19:42:40.2833300Z ##[command]/usr/bin/docker network create --label 36ad75 github_network_60639c660a9d41089b45e16508e07c21 2025-05-07T19:42:40.5195352Z 2c8678ae2d5cc44c824bc1e30bc8726120fa16ecbfa38f39f5fdbdcead8b9aad 2025-05-07T19:42:40.5222496Z ##[endgroup] 2025-05-07T19:42:40.5244497Z ##[group]Starting job container 2025-05-07T19:42:40.5263068Z ##[command]/usr/bin/docker pull amazonlinux:2023 2025-05-07T19:42:40.6461312Z 2023: Pulling from library/amazonlinux 2025-05-07T19:42:40.6537421Z Digest: sha256:cb5b4c509d62ae388f674c139ae5e8281fc160c217d474445e912043e1941988 2025-05-07T19:42:40.6537972Z Status: Image is up to date for amazonlinux:2023 2025-05-07T19:42:40.6546700Z docker.io/library/amazonlinux:2023 2025-05-07T19:42:40.6626080Z ##[command]/usr/bin/docker create --name b1b3efcb00dd441ea660ffc468bcf084_amazonlinux2023_d732a4 --label 36ad75 --workdir /__w/FBGEMM/FBGEMM --network github_network_60639c660a9d41089b45e16508e07c21 --user root -e "HOME=/github/home" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/ec2-user/actions-runner/_work":"/__w" -v "/home/ec2-user/actions-runner/externals":"/__e":ro -v "/home/ec2-user/actions-runner/_work/_temp":"/__w/_temp" -v "/home/ec2-user/actions-runner/_work/_actions":"/__w/_actions" -v "/home/ec2-user/actions-runner/_work/_tool":"/__w/_tool" -v "/home/ec2-user/actions-runner/_work/_temp/_github_home":"/github/home" -v "/home/ec2-user/actions-runner/_work/_temp/_github_workflow":"/github/workflow" --entrypoint "tail" 
amazonlinux:2023 "-f" "/dev/null" 2025-05-07T19:42:40.7497855Z 2aa0e203fee372054944a8575a7288d464ba67ebdd0a95ce94bb1f4aeb4f06a9 2025-05-07T19:42:40.7524219Z ##[command]/usr/bin/docker start 2aa0e203fee372054944a8575a7288d464ba67ebdd0a95ce94bb1f4aeb4f06a9 2025-05-07T19:42:41.2067712Z 2aa0e203fee372054944a8575a7288d464ba67ebdd0a95ce94bb1f4aeb4f06a9 2025-05-07T19:42:41.2090348Z ##[command]/usr/bin/docker ps --all --filter id=2aa0e203fee372054944a8575a7288d464ba67ebdd0a95ce94bb1f4aeb4f06a9 --filter status=running --no-trunc --format "{{.ID}} {{.Status}}" 2025-05-07T19:42:41.2235757Z 2aa0e203fee372054944a8575a7288d464ba67ebdd0a95ce94bb1f4aeb4f06a9 Up Less than a second 2025-05-07T19:42:41.2256172Z ##[command]/usr/bin/docker inspect --format "{{range .Config.Env}}{{println .}}{{end}}" 2aa0e203fee372054944a8575a7288d464ba67ebdd0a95ce94bb1f4aeb4f06a9 2025-05-07T19:42:41.2400474Z HOME=/github/home 2025-05-07T19:42:41.2401024Z GITHUB_ACTIONS=true 2025-05-07T19:42:41.2401404Z CI=true 2025-05-07T19:42:41.2401896Z PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-05-07T19:42:41.2422131Z ##[endgroup] 2025-05-07T19:42:41.2431487Z ##[group]Waiting for all services to be ready 2025-05-07T19:42:41.2433136Z ##[endgroup] 2025-05-07T19:42:41.2508249Z ##[group]Run yum update -y; yum install -y binutils findutils git pciutils sudo tar wget which 2025-05-07T19:42:41.2509052Z yum update -y; yum install -y binutils findutils git pciutils sudo tar wget which 2025-05-07T19:42:41.2509921Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:42:41.2510344Z env: 2025-05-07T19:42:41.2510688Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:42:41.2511085Z BUILD_ENV: build_binary 2025-05-07T19:42:41.2511392Z BUILD_TARGET: genai 2025-05-07T19:42:41.2511745Z BUILD_VARIANT: cuda 2025-05-07T19:42:41.2512031Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:42:41.2512372Z ##[endgroup] 2025-05-07T19:42:42.0463271Z Amazon Linux 2023 repository 68 MB/s | 37 MB 00:00 
2025-05-07T19:42:48.6192956Z Last metadata expiration check: 0:00:07 ago on Wed May 7 19:42:41 2025. 2025-05-07T19:42:49.1692176Z Dependencies resolved. 2025-05-07T19:42:49.1869188Z Nothing to do. 2025-05-07T19:42:49.1870614Z Complete! 2025-05-07T19:42:49.4147781Z Last metadata expiration check: 0:00:08 ago on Wed May 7 19:42:41 2025. 2025-05-07T19:42:49.4768881Z Dependencies resolved. 2025-05-07T19:42:49.4997093Z ======================================================================================== 2025-05-07T19:42:49.4998494Z Package Arch Version Repository Size 2025-05-07T19:42:49.4999206Z ======================================================================================== 2025-05-07T19:42:49.4999639Z Installing: 2025-05-07T19:42:49.5000091Z binutils x86_64 2.41-50.amzn2023.0.3 amazonlinux 5.3 M 2025-05-07T19:42:49.5000738Z findutils x86_64 1:4.8.0-2.amzn2023.0.2 amazonlinux 539 k 2025-05-07T19:42:49.5001354Z git x86_64 2.47.1-1.amzn2023.0.2 amazonlinux 54 k 2025-05-07T19:42:49.5001994Z pciutils x86_64 3.7.0-3.amzn2023.0.2 amazonlinux 93 k 2025-05-07T19:42:49.5002823Z sudo x86_64 1.9.15-1.p5.amzn2023.0.1 amazonlinux 1.3 M 2025-05-07T19:42:49.5003417Z tar x86_64 2:1.34-1.amzn2023.0.4 amazonlinux 879 k 2025-05-07T19:42:49.5003960Z wget x86_64 1.21.3-1.amzn2023.0.4 amazonlinux 779 k 2025-05-07T19:42:49.5004578Z which x86_64 2.21-26.amzn2023.0.2 amazonlinux 42 k 2025-05-07T19:42:49.5005095Z Installing dependencies: 2025-05-07T19:42:49.5005545Z cracklib x86_64 2.9.6-27.amzn2023.0.2 amazonlinux 82 k 2025-05-07T19:42:49.5006207Z cyrus-sasl-lib x86_64 2.1.27-18.amzn2023.0.3 amazonlinux 786 k 2025-05-07T19:42:49.5006886Z elfutils-debuginfod-client x86_64 0.188-3.amzn2023.0.2 amazonlinux 41 k 2025-05-07T19:42:49.5007578Z git-core x86_64 2.47.1-1.amzn2023.0.2 amazonlinux 4.7 M 2025-05-07T19:42:49.5008519Z git-core-doc noarch 2.47.1-1.amzn2023.0.2 amazonlinux 2.8 M 2025-05-07T19:42:49.5009127Z gnutls x86_64 3.8.3-6.amzn2023.0.1 amazonlinux 1.1 M 
2025-05-07T19:42:49.5009800Z groff-base x86_64 1.22.4-7.amzn2023.0.2 amazonlinux 1.0 M 2025-05-07T19:42:49.5010368Z gzip x86_64 1.12-1.amzn2023.0.1 amazonlinux 160 k 2025-05-07T19:42:49.5011013Z hwdata noarch 0.384-1.amzn2023.0.3 amazonlinux 1.6 M 2025-05-07T19:42:49.5011618Z jansson x86_64 2.14-0.amzn2023 amazonlinux 46 k 2025-05-07T19:42:49.5012254Z kmod-libs x86_64 29-2.amzn2023.0.5 amazonlinux 62 k 2025-05-07T19:42:49.5012898Z less x86_64 608-2.amzn2023.0.2 amazonlinux 168 k 2025-05-07T19:42:49.5013605Z libcbor x86_64 0.7.0-3.amzn2023.0.2 amazonlinux 57 k 2025-05-07T19:42:49.5014241Z libdb x86_64 5.3.28-49.amzn2023.0.2 amazonlinux 756 k 2025-05-07T19:42:49.5014834Z libeconf x86_64 0.4.0-1.amzn2023.0.3 amazonlinux 28 k 2025-05-07T19:42:49.5015546Z libedit x86_64 3.1-38.20210714cvs.amzn2023.0.2 amazonlinux 108 k 2025-05-07T19:42:49.5016144Z libfdisk x86_64 2.37.4-1.amzn2023.0.4 amazonlinux 153 k 2025-05-07T19:42:49.5016696Z libfido2 x86_64 1.10.0-2.amzn2023.0.2 amazonlinux 95 k 2025-05-07T19:42:49.5017443Z libmetalink x86_64 0.1.3-14.amzn2023.0.2 amazonlinux 31 k 2025-05-07T19:42:49.5018095Z libpwquality x86_64 1.4.4-6.amzn2023.0.2 amazonlinux 106 k 2025-05-07T19:42:49.5018681Z libsemanage x86_64 3.4-5.amzn2023.0.2 amazonlinux 121 k 2025-05-07T19:42:49.5019302Z libutempter x86_64 1.2.1-4.amzn2023.0.2 amazonlinux 26 k 2025-05-07T19:42:49.5019824Z nano x86_64 8.3-1.amzn2023 amazonlinux 706 k 2025-05-07T19:42:49.5020436Z ncurses x86_64 6.2-4.20200222.amzn2023.0.6 amazonlinux 394 k 2025-05-07T19:42:49.5021009Z nettle x86_64 3.10.1-1.amzn2023.0.1 amazonlinux 573 k 2025-05-07T19:42:49.5021522Z openldap x86_64 2.4.57-6.amzn2023.0.7 amazonlinux 256 k 2025-05-07T19:42:49.5022157Z openssh x86_64 8.7p1-8.amzn2023.0.14 amazonlinux 454 k 2025-05-07T19:42:49.5022726Z openssh-clients x86_64 8.7p1-8.amzn2023.0.14 amazonlinux 708 k 2025-05-07T19:42:49.5023310Z pam x86_64 1.5.1-8.amzn2023.0.4 amazonlinux 542 k 2025-05-07T19:42:49.5024151Z pciutils-libs x86_64 3.7.0-3.amzn2023.0.2 
amazonlinux 41 k 2025-05-07T19:42:49.5024842Z perl-AutoLoader noarch 5.74-477.amzn2023.0.6 amazonlinux 22 k 2025-05-07T19:42:49.5025668Z perl-B x86_64 1.80-477.amzn2023.0.6 amazonlinux 179 k 2025-05-07T19:42:49.5026326Z perl-Carp noarch 1.50-458.amzn2023.0.2 amazonlinux 29 k 2025-05-07T19:42:49.5027044Z perl-Class-Struct noarch 0.66-477.amzn2023.0.6 amazonlinux 22 k 2025-05-07T19:42:49.5027758Z perl-Data-Dumper x86_64 2.174-460.amzn2023.0.2 amazonlinux 55 k 2025-05-07T19:42:49.5140627Z perl-Digest noarch 1.20-1.amzn2023.0.2 amazonlinux 26 k 2025-05-07T19:42:49.5141241Z perl-Digest-MD5 x86_64 2.58-2.amzn2023.0.2 amazonlinux 36 k 2025-05-07T19:42:49.5141814Z perl-DynaLoader x86_64 1.47-477.amzn2023.0.6 amazonlinux 26 k 2025-05-07T19:42:49.5142695Z perl-Encode x86_64 4:3.15-462.amzn2023.0.2 amazonlinux 1.7 M 2025-05-07T19:42:49.5143319Z perl-Errno x86_64 1.30-477.amzn2023.0.6 amazonlinux 15 k 2025-05-07T19:42:49.5143986Z perl-Error noarch 1:0.17029-5.amzn2023.0.2 amazonlinux 41 k 2025-05-07T19:42:49.5144582Z perl-Exporter noarch 5.74-459.amzn2023.0.2 amazonlinux 31 k 2025-05-07T19:42:49.5145133Z perl-Fcntl x86_64 1.13-477.amzn2023.0.6 amazonlinux 21 k 2025-05-07T19:42:49.5145746Z perl-File-Basename noarch 2.85-477.amzn2023.0.6 amazonlinux 18 k 2025-05-07T19:42:49.5146324Z perl-File-Find noarch 1.37-477.amzn2023.0.6 amazonlinux 26 k 2025-05-07T19:42:49.5146906Z perl-File-Path noarch 2.18-2.amzn2023.0.2 amazonlinux 36 k 2025-05-07T19:42:49.5147514Z perl-File-Temp noarch 1:0.231.100-2.amzn2023.0.2 amazonlinux 60 k 2025-05-07T19:42:49.5148213Z perl-File-stat noarch 1.09-477.amzn2023.0.6 amazonlinux 17 k 2025-05-07T19:42:49.5148815Z perl-FileHandle noarch 2.03-477.amzn2023.0.6 amazonlinux 16 k 2025-05-07T19:42:49.5149409Z perl-Getopt-Long noarch 1:2.52-2.amzn2023.0.2 amazonlinux 60 k 2025-05-07T19:42:49.5150016Z perl-Getopt-Std noarch 1.12-477.amzn2023.0.6 amazonlinux 16 k 2025-05-07T19:42:49.5150593Z perl-Git noarch 2.47.1-1.amzn2023.0.2 amazonlinux 42 k 
2025-05-07T19:42:49.5151146Z perl-HTTP-Tiny noarch 0.078-1.amzn2023.0.3 amazonlinux 56 k 2025-05-07T19:42:49.5151705Z perl-IO x86_64 1.43-477.amzn2023.0.6 amazonlinux 87 k 2025-05-07T19:42:49.5152336Z perl-IPC-Open3 noarch 1.21-477.amzn2023.0.6 amazonlinux 23 k 2025-05-07T19:42:49.5152898Z perl-MIME-Base64 x86_64 3.16-2.amzn2023.0.2 amazonlinux 31 k 2025-05-07T19:42:49.5153440Z perl-Net-SSLeay x86_64 1.94-1.amzn2023.0.1 amazonlinux 392 k 2025-05-07T19:42:49.5153981Z perl-POSIX x86_64 1.94-477.amzn2023.0.6 amazonlinux 97 k 2025-05-07T19:42:49.5154542Z perl-PathTools x86_64 3.78-459.amzn2023.0.2 amazonlinux 85 k 2025-05-07T19:42:49.5155129Z perl-Pod-Escapes noarch 1:1.07-458.amzn2023.0.2 amazonlinux 20 k 2025-05-07T19:42:49.5155764Z perl-Pod-Perldoc noarch 3.28.01-459.amzn2023.0.3 amazonlinux 84 k 2025-05-07T19:42:49.5156338Z perl-Pod-Simple noarch 1:3.42-2.amzn2023.0.2 amazonlinux 215 k 2025-05-07T19:42:49.5156895Z perl-Pod-Usage noarch 4:2.01-2.amzn2023.0.2 amazonlinux 41 k 2025-05-07T19:42:49.5157476Z perl-Scalar-List-Utils x86_64 4:1.56-459.amzn2023.0.2 amazonlinux 71 k 2025-05-07T19:42:49.5158059Z perl-SelectSaver noarch 1.02-477.amzn2023.0.6 amazonlinux 12 k 2025-05-07T19:42:49.5158612Z perl-Socket x86_64 4:2.032-1.amzn2023.0.2 amazonlinux 55 k 2025-05-07T19:42:49.5159129Z perl-Storable x86_64 1:3.21-458.amzn2023.0.2 amazonlinux 96 k 2025-05-07T19:42:49.5159664Z perl-Symbol noarch 1.08-477.amzn2023.0.6 amazonlinux 15 k 2025-05-07T19:42:49.5160222Z perl-Term-ANSIColor noarch 5.01-459.amzn2023.0.2 amazonlinux 48 k 2025-05-07T19:42:49.5160796Z perl-Term-Cap noarch 1.17-458.amzn2023.0.2 amazonlinux 22 k 2025-05-07T19:42:49.5161349Z perl-TermReadKey x86_64 2.38-9.amzn2023.0.2 amazonlinux 36 k 2025-05-07T19:42:49.5161919Z perl-Text-ParseWords noarch 3.30-458.amzn2023.0.2 amazonlinux 17 k 2025-05-07T19:42:49.5162647Z perl-Text-Tabs+Wrap noarch 2021.0726-1.amzn2023.0.1 amazonlinux 22 k 2025-05-07T19:42:49.5163531Z perl-Time-Local noarch 2:1.300-5.amzn2023.0.2 
amazonlinux 34 k 2025-05-07T19:42:49.5164116Z perl-URI noarch 5.09-1.amzn2023.0.2 amazonlinux 108 k 2025-05-07T19:42:49.5164679Z perl-base noarch 2.27-477.amzn2023.0.6 amazonlinux 17 k 2025-05-07T19:42:49.5165244Z perl-constant noarch 1.33-459.amzn2023.0.2 amazonlinux 23 k 2025-05-07T19:42:49.5165820Z perl-if noarch 0.60.800-477.amzn2023.0.6 amazonlinux 14 k 2025-05-07T19:42:49.5166388Z perl-interpreter x86_64 4:5.32.1-477.amzn2023.0.6 amazonlinux 71 k 2025-05-07T19:42:49.5166952Z perl-lib x86_64 0.65-477.amzn2023.0.6 amazonlinux 15 k 2025-05-07T19:42:49.5167677Z perl-libnet noarch 3.13-2.amzn2023.0.2 amazonlinux 126 k 2025-05-07T19:42:49.5168214Z perl-libs x86_64 4:5.32.1-477.amzn2023.0.6 amazonlinux 2.0 M 2025-05-07T19:42:49.5168890Z perl-mro x86_64 1.23-477.amzn2023.0.6 amazonlinux 29 k 2025-05-07T19:42:49.5169450Z perl-overload noarch 1.31-477.amzn2023.0.6 amazonlinux 46 k 2025-05-07T19:42:49.5170042Z perl-overloading noarch 0.02-477.amzn2023.0.6 amazonlinux 13 k 2025-05-07T19:42:49.5170640Z perl-parent noarch 1:0.238-458.amzn2023.0.2 amazonlinux 14 k 2025-05-07T19:42:49.5171218Z perl-podlators noarch 1:4.14-458.amzn2023.0.2 amazonlinux 112 k 2025-05-07T19:42:49.5171801Z perl-subs noarch 1.03-477.amzn2023.0.6 amazonlinux 12 k 2025-05-07T19:42:49.5172360Z perl-vars noarch 1.05-477.amzn2023.0.6 amazonlinux 13 k 2025-05-07T19:42:49.5172900Z shadow-utils x86_64 2:4.9-12.amzn2023.0.4 amazonlinux 1.1 M 2025-05-07T19:42:49.5173467Z systemd-libs x86_64 252.23-3.amzn2023 amazonlinux 613 k 2025-05-07T19:42:49.5174002Z util-linux x86_64 2.37.4-1.amzn2023.0.4 amazonlinux 2.2 M 2025-05-07T19:42:49.5174558Z util-linux-core x86_64 2.37.4-1.amzn2023.0.4 amazonlinux 432 k 2025-05-07T19:42:49.5175003Z Installing weak dependencies: 2025-05-07T19:42:49.5175479Z nano-default-editor noarch 8.3-1.amzn2023 amazonlinux 10 k 2025-05-07T19:42:49.5176101Z perl-IO-Socket-IP noarch 0.41-3.amzn2023.0.2 amazonlinux 42 k 2025-05-07T19:42:49.5176701Z perl-IO-Socket-SSL noarch 
2.075-1.amzn2023.0.2 amazonlinux 218 k 2025-05-07T19:42:49.5177317Z perl-Mozilla-CA noarch 20200520-4.amzn2023.0.2 amazonlinux 13 k 2025-05-07T19:42:49.5177883Z perl-NDBM_File x86_64 1.15-477.amzn2023.0.6 amazonlinux 23 k 2025-05-07T19:42:49.5178471Z sudo-python-plugin x86_64 1.9.15-1.p5.amzn2023.0.1 amazonlinux 56 k 2025-05-07T19:42:49.5178829Z 2025-05-07T19:42:49.5179064Z Transaction Summary 2025-05-07T19:42:49.5179329Z ======================================================================================== 2025-05-07T19:42:49.5179655Z Install 107 Packages 2025-05-07T19:42:49.5179798Z 2025-05-07T19:42:49.5179942Z Total download size: 38 M 2025-05-07T19:42:49.5180211Z Installed size: 151 M 2025-05-07T19:42:49.5180443Z Downloading Packages: 2025-05-07T19:42:49.8178708Z (1/107): cracklib-2.9.6-27.amzn2023.0.2.x86_64. 4.3 MB/s | 82 kB 00:00 2025-05-07T19:42:49.8293073Z (2/107): cyrus-sasl-lib-2.1.27-18.amzn2023.0.3. 26 MB/s | 786 kB 00:00 2025-05-07T19:42:49.8346067Z (3/107): elfutils-debuginfod-client-0.188-3.amz 2.5 MB/s | 41 kB 00:00 2025-05-07T19:42:49.8445662Z (4/107): findutils-4.8.0-2.amzn2023.0.2.x86_64. 37 MB/s | 539 kB 00:00 2025-05-07T19:42:49.8706797Z (5/107): binutils-2.41-50.amzn2023.0.3.x86_64.r 74 MB/s | 5.3 MB 00:00 2025-05-07T19:42:49.8724275Z (6/107): git-2.47.1-1.amzn2023.0.2.x86_64.rpm 1.7 MB/s | 54 kB 00:00 2025-05-07T19:42:49.9084408Z (7/107): git-core-2.47.1-1.amzn2023.0.2.x86_64. 
76 MB/s | 4.7 MB 00:00 2025-05-07T19:42:49.9179014Z (8/107): gnutls-3.8.3-6.amzn2023.0.1.x86_64.rpm 28 MB/s | 1.1 MB 00:00 2025-05-07T19:42:49.9315445Z (9/107): git-core-doc-2.47.1-1.amzn2023.0.2.noa 53 MB/s | 2.8 MB 00:00 2025-05-07T19:42:49.9390104Z (10/107): groff-base-1.22.4-7.amzn2023.0.2.x86_ 38 MB/s | 1.0 MB 00:00 2025-05-07T19:42:49.9409585Z (11/107): gzip-1.12-1.amzn2023.0.1.x86_64.rpm 7.0 MB/s | 160 kB 00:00 2025-05-07T19:42:49.9582463Z (12/107): hwdata-0.384-1.amzn2023.0.3.noarch.rp 65 MB/s | 1.6 MB 00:00 2025-05-07T19:42:49.9597095Z (13/107): kmod-libs-29-2.amzn2023.0.5.x86_64.rp 3.8 MB/s | 62 kB 00:00 2025-05-07T19:42:49.9614487Z (14/107): jansson-2.14-0.amzn2023.x86_64.rpm 2.5 MB/s | 46 kB 00:00 2025-05-07T19:42:49.9663992Z (15/107): less-608-2.amzn2023.0.2.x86_64.rpm 27 MB/s | 168 kB 00:00 2025-05-07T19:42:49.9682697Z (16/107): libcbor-0.7.0-3.amzn2023.0.2.x86_64.r 7.0 MB/s | 57 kB 00:00 2025-05-07T19:42:49.9742708Z (17/107): libdb-5.3.28-49.amzn2023.0.2.x86_64.r 60 MB/s | 756 kB 00:00 2025-05-07T19:42:49.9760507Z (18/107): libeconf-0.4.0-1.amzn2023.0.3.x86_64. 
3.0 MB/s | 28 kB 00:00 2025-05-07T19:42:49.9785410Z (19/107): libedit-3.1-38.20210714cvs.amzn2023.0 11 MB/s | 108 kB 00:00 2025-05-07T19:42:49.9812916Z (20/107): libfdisk-2.37.4-1.amzn2023.0.4.x86_64 22 MB/s | 153 kB 00:00 2025-05-07T19:42:49.9840015Z (21/107): libfido2-1.10.0-2.amzn2023.0.2.x86_64 13 MB/s | 95 kB 00:00 2025-05-07T19:42:49.9861212Z (22/107): libmetalink-0.1.3-14.amzn2023.0.2.x86 4.6 MB/s | 31 kB 00:00 2025-05-07T19:42:49.9884889Z (23/107): libpwquality-1.4.4-6.amzn2023.0.2.x86 16 MB/s | 106 kB 00:00 2025-05-07T19:42:49.9912515Z (24/107): libsemanage-3.4-5.amzn2023.0.2.x86_64 17 MB/s | 121 kB 00:00 2025-05-07T19:42:49.9955996Z (25/107): libutempter-1.2.1-4.amzn2023.0.2.x86_ 3.0 MB/s | 26 kB 00:00 2025-05-07T19:42:50.0007932Z (26/107): nano-8.3-1.amzn2023.x86_64.rpm 58 MB/s | 706 kB 00:00 2025-05-07T19:42:50.0033047Z (27/107): nano-default-editor-8.3-1.amzn2023.no 900 kB/s | 10 kB 00:00 2025-05-07T19:42:50.0065694Z (28/107): ncurses-6.2-4.20200222.amzn2023.0.6.x 40 MB/s | 394 kB 00:00 2025-05-07T19:42:50.0156365Z (29/107): nettle-3.10.1-1.amzn2023.0.1.x86_64.r 42 MB/s | 573 kB 00:00 2025-05-07T19:42:50.0199875Z (30/107): openldap-2.4.57-6.amzn2023.0.7.x86_64 19 MB/s | 256 kB 00:00 2025-05-07T19:42:50.0233628Z (31/107): openssh-8.7p1-8.amzn2023.0.14.x86_64. 27 MB/s | 454 kB 00:00 2025-05-07T19:42:50.0290643Z (32/107): openssh-clients-8.7p1-8.amzn2023.0.14 57 MB/s | 708 kB 00:00 2025-05-07T19:42:50.0340161Z (33/107): pam-1.5.1-8.amzn2023.0.4.x86_64.rpm 58 MB/s | 542 kB 00:00 2025-05-07T19:42:50.0357347Z (34/107): pciutils-3.7.0-3.amzn2023.0.2.x86_64. 
8.4 MB/s | 93 kB 00:00 2025-05-07T19:42:50.0374449Z (35/107): pciutils-libs-3.7.0-3.amzn2023.0.2.x8 5.6 MB/s | 41 kB 00:00 2025-05-07T19:42:50.0423266Z (36/107): perl-AutoLoader-5.74-477.amzn2023.0.6 3.6 MB/s | 22 kB 00:00 2025-05-07T19:42:50.0452795Z (37/107): perl-B-1.80-477.amzn2023.0.6.x86_64.r 19 MB/s | 179 kB 00:00 2025-05-07T19:42:50.0468635Z (38/107): perl-Carp-1.50-458.amzn2023.0.2.noarc 3.3 MB/s | 29 kB 00:00 2025-05-07T19:42:50.0492714Z (39/107): perl-Class-Struct-0.66-477.amzn2023.0 3.4 MB/s | 22 kB 00:00 2025-05-07T19:42:50.0528002Z (40/107): perl-Digest-1.20-1.amzn2023.0.2.noarc 4.4 MB/s | 26 kB 00:00 2025-05-07T19:42:50.0551361Z (41/107): perl-Data-Dumper-2.174-460.amzn2023.0 6.9 MB/s | 55 kB 00:00 2025-05-07T19:42:50.0564936Z (42/107): perl-Digest-MD5-2.58-2.amzn2023.0.2.x 5.3 MB/s | 36 kB 00:00 2025-05-07T19:42:50.0584244Z (43/107): perl-DynaLoader-1.47-477.amzn2023.0.6 5.3 MB/s | 26 kB 00:00 2025-05-07T19:42:50.0620997Z (44/107): perl-Errno-1.30-477.amzn2023.0.6.x86_ 3.1 MB/s | 15 kB 00:00 2025-05-07T19:42:50.0731150Z (45/107): perl-Encode-3.15-462.amzn2023.0.2.x86 103 MB/s | 1.7 MB 00:00 2025-05-07T19:42:50.0753310Z (46/107): perl-Error-0.17029-5.amzn2023.0.2.noa 2.5 MB/s | 41 kB 00:00 2025-05-07T19:42:50.0764642Z (47/107): perl-Exporter-5.74-459.amzn2023.0.2.n 2.5 MB/s | 31 kB 00:00 2025-05-07T19:42:50.0785229Z (48/107): perl-Fcntl-1.13-477.amzn2023.0.6.x86_ 4.4 MB/s | 21 kB 00:00 2025-05-07T19:42:50.0820745Z (49/107): perl-File-Basename-2.85-477.amzn2023. 3.5 MB/s | 18 kB 00:00 2025-05-07T19:42:50.0829965Z (50/107): perl-File-Find-1.37-477.amzn2023.0.6. 4.2 MB/s | 26 kB 00:00 2025-05-07T19:42:50.0851236Z (51/107): perl-File-Path-2.18-2.amzn2023.0.2.no 5.5 MB/s | 36 kB 00:00 2025-05-07T19:42:50.0900761Z (52/107): perl-File-Temp-0.231.100-2.amzn2023.0 10 MB/s | 60 kB 00:00 2025-05-07T19:42:50.0923263Z (53/107): perl-File-stat-1.09-477.amzn2023.0.6. 
2.2 MB/s | 17 kB 00:00 2025-05-07T19:42:50.0938960Z (54/107): perl-FileHandle-2.03-477.amzn2023.0.6 2.0 MB/s | 16 kB 00:00 2025-05-07T19:42:50.0969589Z (55/107): perl-Getopt-Long-2.52-2.amzn2023.0.2. 9.0 MB/s | 60 kB 00:00 2025-05-07T19:42:50.0986125Z (56/107): perl-Getopt-Std-1.12-477.amzn2023.0.6 2.6 MB/s | 16 kB 00:00 2025-05-07T19:42:50.0998303Z (57/107): perl-Git-2.47.1-1.amzn2023.0.2.noarch 7.1 MB/s | 42 kB 00:00 2025-05-07T19:42:50.1020757Z (58/107): perl-HTTP-Tiny-0.078-1.amzn2023.0.3.n 12 MB/s | 56 kB 00:00 2025-05-07T19:42:50.1067993Z (59/107): perl-IO-1.43-477.amzn2023.0.6.x86_64. 14 MB/s | 87 kB 00:00 2025-05-07T19:42:50.1079371Z (60/107): perl-IO-Socket-IP-0.41-3.amzn2023.0.2 5.3 MB/s | 42 kB 00:00 2025-05-07T19:42:50.1118097Z (61/107): perl-IO-Socket-SSL-2.075-1.amzn2023.0 23 MB/s | 218 kB 00:00 2025-05-07T19:42:50.1133495Z (62/107): perl-IPC-Open3-1.21-477.amzn2023.0.6. 4.8 MB/s | 23 kB 00:00 2025-05-07T19:42:50.1154863Z (63/107): perl-MIME-Base64-3.16-2.amzn2023.0.2. 4.7 MB/s | 31 kB 00:00 2025-05-07T19:42:50.1172709Z (64/107): perl-Mozilla-CA-20200520-4.amzn2023.0 2.4 MB/s | 13 kB 00:00 2025-05-07T19:42:50.1193364Z (65/107): perl-NDBM_File-1.15-477.amzn2023.0.6. 4.2 MB/s | 23 kB 00:00 2025-05-07T19:42:50.1240822Z (66/107): perl-Net-SSLeay-1.94-1.amzn2023.0.1.x 46 MB/s | 392 kB 00:00 2025-05-07T19:42:50.1263710Z (67/107): perl-POSIX-1.94-477.amzn2023.0.6.x86_ 11 MB/s | 97 kB 00:00 2025-05-07T19:42:50.1284643Z (68/107): perl-PathTools-3.78-459.amzn2023.0.2. 9.6 MB/s | 85 kB 00:00 2025-05-07T19:42:50.1306823Z (69/107): perl-Pod-Escapes-1.07-458.amzn2023.0. 
3.3 MB/s | 20 kB 00:00 2025-05-07T19:42:50.1349286Z (70/107): perl-Pod-Perldoc-3.28.01-459.amzn2023 15 MB/s | 84 kB 00:00 2025-05-07T19:42:50.1377370Z (71/107): perl-Pod-Simple-3.42-2.amzn2023.0.2.n 25 MB/s | 215 kB 00:00 2025-05-07T19:42:50.1424314Z (72/107): perl-Pod-Usage-2.01-2.amzn2023.0.2.no 3.6 MB/s | 41 kB 00:00 2025-05-07T19:42:50.1454474Z (73/107): perl-Scalar-List-Utils-1.56-459.amzn2 7.1 MB/s | 71 kB 00:00 2025-05-07T19:42:50.1473033Z (74/107): perl-SelectSaver-1.02-477.amzn2023.0. 1.3 MB/s | 12 kB 00:00 2025-05-07T19:42:50.1491338Z (75/107): perl-Socket-2.032-1.amzn2023.0.2.x86_ 8.2 MB/s | 55 kB 00:00 2025-05-07T19:42:50.1517905Z (76/107): perl-Storable-3.21-458.amzn2023.0.2.x 16 MB/s | 96 kB 00:00 2025-05-07T19:42:50.1546999Z (77/107): perl-Symbol-1.08-477.amzn2023.0.6.noa 3.1 MB/s | 15 kB 00:00 2025-05-07T19:42:50.1564772Z (78/107): perl-Term-ANSIColor-5.01-459.amzn2023 7.5 MB/s | 48 kB 00:00 2025-05-07T19:42:50.1578061Z (79/107): perl-Term-Cap-1.17-458.amzn2023.0.2.n 3.7 MB/s | 22 kB 00:00 2025-05-07T19:42:50.1620389Z (80/107): perl-TermReadKey-2.38-9.amzn2023.0.2. 5.1 MB/s | 36 kB 00:00 2025-05-07T19:42:50.1647322Z (81/107): perl-Text-Tabs+Wrap-2021.0726-1.amzn2 3.4 MB/s | 22 kB 00:00 2025-05-07T19:42:50.1670925Z (82/107): perl-Text-ParseWords-3.30-458.amzn202 1.6 MB/s | 17 kB 00:00 2025-05-07T19:42:50.1684902Z (83/107): perl-Time-Local-1.300-5.amzn2023.0.2. 5.5 MB/s | 34 kB 00:00 2025-05-07T19:42:50.1734050Z (84/107): perl-URI-5.09-1.amzn2023.0.2.noarch.r 14 MB/s | 108 kB 00:00 2025-05-07T19:42:50.1755175Z (85/107): perl-constant-1.33-459.amzn2023.0.2.n 3.7 MB/s | 23 kB 00:00 2025-05-07T19:42:50.1777968Z (86/107): perl-base-2.27-477.amzn2023.0.6.noarc 1.8 MB/s | 17 kB 00:00 2025-05-07T19:42:50.1808898Z (87/107): perl-if-0.60.800-477.amzn2023.0.6.noa 2.0 MB/s | 14 kB 00:00 2025-05-07T19:42:50.1853823Z (88/107): perl-interpreter-5.32.1-477.amzn2023. 
7.4 MB/s | 71 kB 00:00 2025-05-07T19:42:50.1869723Z (89/107): perl-lib-0.65-477.amzn2023.0.6.x86_64 1.8 MB/s | 15 kB 00:00 2025-05-07T19:42:50.1894440Z (90/107): perl-libnet-3.13-2.amzn2023.0.2.noarc 15 MB/s | 126 kB 00:00 2025-05-07T19:42:50.1980314Z (91/107): perl-mro-1.23-477.amzn2023.0.6.x86_64 2.7 MB/s | 29 kB 00:00 2025-05-07T19:42:50.2102883Z (92/107): perl-libs-5.32.1-477.amzn2023.0.6.x86 84 MB/s | 2.0 MB 00:00 2025-05-07T19:42:50.2114933Z (93/107): perl-overload-1.31-477.amzn2023.0.6.n 2.1 MB/s | 46 kB 00:00 2025-05-07T19:42:50.2135280Z (94/107): perl-overloading-0.02-477.amzn2023.0. 906 kB/s | 13 kB 00:00 2025-05-07T19:42:50.2200075Z (95/107): perl-parent-0.238-458.amzn2023.0.2.no 1.9 MB/s | 14 kB 00:00 2025-05-07T19:42:50.2233271Z (96/107): perl-podlators-4.14-458.amzn2023.0.2. 10 MB/s | 112 kB 00:00 2025-05-07T19:42:50.2243288Z (97/107): perl-subs-1.03-477.amzn2023.0.6.noarc 1.1 MB/s | 12 kB 00:00 2025-05-07T19:42:50.2262784Z (98/107): perl-vars-1.05-477.amzn2023.0.6.noarc 2.5 MB/s | 13 kB 00:00 2025-05-07T19:42:50.2383025Z (99/107): sudo-1.9.15-1.p5.amzn2023.0.1.x86_64. 96 MB/s | 1.3 MB 00:00 2025-05-07T19:42:50.2469652Z (100/107): shadow-utils-4.9-12.amzn2023.0.4.x86 51 MB/s | 1.1 MB 00:00 2025-05-07T19:42:50.2480670Z (101/107): sudo-python-plugin-1.9.15-1.p5.amzn2 2.5 MB/s | 56 kB 00:00 2025-05-07T19:42:50.2533098Z (102/107): systemd-libs-252.23-3.amzn2023.x86_6 45 MB/s | 613 kB 00:00 2025-05-07T19:42:50.2604920Z (103/107): tar-1.34-1.amzn2023.0.4.x86_64.rpm 81 MB/s | 879 kB 00:00 2025-05-07T19:42:50.2752757Z (104/107): util-linux-2.37.4-1.amzn2023.0.4.x86 85 MB/s | 2.2 MB 00:00 2025-05-07T19:42:50.2786490Z (105/107): util-linux-core-2.37.4-1.amzn2023.0. 
17 MB/s | 432 kB 00:00 2025-05-07T19:42:50.2842607Z (106/107): wget-1.21.3-1.amzn2023.0.4.x86_64.rp 37 MB/s | 779 kB 00:00 2025-05-07T19:42:50.2853353Z (107/107): which-2.21-26.amzn2023.0.2.x86_64.rp 7.4 MB/s | 42 kB 00:00 2025-05-07T19:42:50.2870962Z -------------------------------------------------------------------------------- 2025-05-07T19:42:50.2872266Z Total 48 MB/s | 38 MB 00:00 2025-05-07T19:42:51.3524754Z Running transaction check 2025-05-07T19:42:51.4001483Z Transaction check succeeded. 2025-05-07T19:42:51.4001905Z Running transaction test 2025-05-07T19:42:51.7731537Z Transaction test succeeded. 2025-05-07T19:42:51.7734300Z Running transaction 2025-05-07T19:42:52.5205252Z Preparing : 1/1 2025-05-07T19:42:52.5358356Z Installing : systemd-libs-252.23-3.amzn2023.x86_64 1/107 2025-05-07T19:42:52.5595749Z Installing : nettle-3.10.1-1.amzn2023.0.1.x86_64 2/107 2025-05-07T19:42:52.5791020Z Installing : gnutls-3.8.3-6.amzn2023.0.1.x86_64 3/107 2025-05-07T19:42:52.5834842Z Installing : util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 4/107 2025-05-07T19:42:52.5911113Z Running scriptlet: util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 4/107 2025-05-07T19:42:52.5998449Z Installing : pciutils-libs-3.7.0-3.amzn2023.0.2.x86_64 5/107 2025-05-07T19:42:52.6276760Z Installing : ncurses-6.2-4.20200222.amzn2023.0.6.x86_64 6/107 2025-05-07T19:42:52.6325585Z Installing : nano-8.3-1.amzn2023.x86_64 7/107 2025-05-07T19:42:52.6375898Z Installing : nano-default-editor-8.3-1.amzn2023.noarch 8/107 2025-05-07T19:42:52.6889499Z Installing : libsemanage-3.4-5.amzn2023.0.2.x86_64 9/107 2025-05-07T19:42:52.6956538Z Installing : shadow-utils-2:4.9-12.amzn2023.0.4.x86_64 10/107 2025-05-07T19:42:52.7232531Z Running scriptlet: libutempter-1.2.1-4.amzn2023.0.2.x86_64 11/107 2025-05-07T19:42:52.7282695Z Installing : libutempter-1.2.1-4.amzn2023.0.2.x86_64 11/107 2025-05-07T19:42:52.7331975Z Installing : libmetalink-0.1.3-14.amzn2023.0.2.x86_64 12/107 2025-05-07T19:42:52.7385917Z Installing : 
libfdisk-2.37.4-1.amzn2023.0.4.x86_64 13/107 2025-05-07T19:42:52.7430092Z Installing : libedit-3.1-38.20210714cvs.amzn2023.0.2.x86_64 14/107 2025-05-07T19:42:52.7557186Z Installing : libeconf-0.4.0-1.amzn2023.0.3.x86_64 15/107 2025-05-07T19:42:52.7600545Z Installing : libdb-5.3.28-49.amzn2023.0.2.x86_64 16/107 2025-05-07T19:42:52.7644557Z Installing : libcbor-0.7.0-3.amzn2023.0.2.x86_64 17/107 2025-05-07T19:42:52.7711753Z Installing : libfido2-1.10.0-2.amzn2023.0.2.x86_64 18/107 2025-05-07T19:42:52.7758512Z Installing : less-608-2.amzn2023.0.2.x86_64 19/107 2025-05-07T19:42:52.7795328Z Installing : kmod-libs-29-2.amzn2023.0.5.x86_64 20/107 2025-05-07T19:42:52.8218499Z Installing : jansson-2.14-0.amzn2023.x86_64 21/107 2025-05-07T19:42:52.8291107Z Installing : hwdata-0.384-1.amzn2023.0.3.noarch 22/107 2025-05-07T19:42:52.8426288Z Installing : gzip-1.12-1.amzn2023.0.1.x86_64 23/107 2025-05-07T19:42:52.8844576Z Installing : cracklib-2.9.6-27.amzn2023.0.2.x86_64 24/107 2025-05-07T19:42:52.9004369Z Installing : pam-1.5.1-8.amzn2023.0.4.x86_64 25/107 2025-05-07T19:42:52.9824863Z Installing : libpwquality-1.4.4-6.amzn2023.0.2.x86_64 26/107 2025-05-07T19:42:52.9825480Z Installing : util-linux-2.37.4-1.amzn2023.0.4.x86_64 27/107 2025-05-07T19:42:52.9825980Z warning: /etc/adjtime created as /etc/adjtime.rpmnew 2025-05-07T19:42:52.9826239Z 2025-05-07T19:42:53.0010333Z Running scriptlet: util-linux-2.37.4-1.amzn2023.0.4.x86_64 27/107 2025-05-07T19:42:53.0285316Z Running scriptlet: openssh-8.7p1-8.amzn2023.0.14.x86_64 28/107 2025-05-07T19:42:53.0463377Z Installing : openssh-8.7p1-8.amzn2023.0.14.x86_64 28/107 2025-05-07T19:42:53.0513666Z Installing : openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 29/107 2025-05-07T19:42:53.1647501Z Running scriptlet: openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 29/107 2025-05-07T19:42:53.3161425Z Installing : git-core-2.47.1-1.amzn2023.0.2.x86_64 30/107 2025-05-07T19:42:53.3285209Z Installing : git-core-doc-2.47.1-1.amzn2023.0.2.noarch 31/107 
2025-05-07T19:42:53.3717270Z Running scriptlet: groff-base-1.22.4-7.amzn2023.0.2.x86_64 32/107 2025-05-07T19:42:53.3797333Z Installing : groff-base-1.22.4-7.amzn2023.0.2.x86_64 32/107 2025-05-07T19:42:53.3872138Z Running scriptlet: groff-base-1.22.4-7.amzn2023.0.2.x86_64 32/107 2025-05-07T19:42:53.3945800Z Installing : perl-Digest-1.20-1.amzn2023.0.2.noarch 33/107 2025-05-07T19:42:53.4030176Z Installing : perl-Digest-MD5-2.58-2.amzn2023.0.2.x86_64 34/107 2025-05-07T19:42:53.4086726Z Installing : perl-B-1.80-477.amzn2023.0.6.x86_64 35/107 2025-05-07T19:42:53.4130700Z Installing : perl-FileHandle-2.03-477.amzn2023.0.6.noarch 36/107 2025-05-07T19:42:53.4193713Z Installing : perl-AutoLoader-5.74-477.amzn2023.0.6.noarch 37/107 2025-05-07T19:42:53.4282915Z Installing : perl-Data-Dumper-2.174-460.amzn2023.0.2.x86_64 38/107 2025-05-07T19:42:53.4389819Z Installing : perl-libnet-3.13-2.amzn2023.0.2.noarch 39/107 2025-05-07T19:42:53.4507365Z Installing : perl-base-2.27-477.amzn2023.0.6.noarch 40/107 2025-05-07T19:42:53.4725824Z Installing : perl-URI-5.09-1.amzn2023.0.2.noarch 41/107 2025-05-07T19:42:53.4811807Z Installing : perl-Net-SSLeay-1.94-1.amzn2023.0.1.x86_64 42/107 2025-05-07T19:42:53.4863572Z Installing : perl-Text-Tabs+Wrap-2021.0726-1.amzn2023.0.1.noa 43/107 2025-05-07T19:42:53.4909467Z Installing : perl-Mozilla-CA-20200520-4.amzn2023.0.2.noarch 44/107 2025-05-07T19:42:53.4963942Z Installing : perl-if-0.60.800-477.amzn2023.0.6.noarch 45/107 2025-05-07T19:42:53.5020475Z Installing : perl-IO-Socket-IP-0.41-3.amzn2023.0.2.noarch 46/107 2025-05-07T19:42:53.5075535Z Installing : perl-Time-Local-2:1.300-5.amzn2023.0.2.noarch 47/107 2025-05-07T19:42:53.5165124Z Installing : perl-File-Path-2.18-2.amzn2023.0.2.noarch 48/107 2025-05-07T19:42:53.5233402Z Installing : perl-IO-Socket-SSL-2.075-1.amzn2023.0.2.noarch 49/107 2025-05-07T19:42:53.5279699Z Installing : perl-Pod-Escapes-1:1.07-458.amzn2023.0.2.noarch 50/107 2025-05-07T19:42:53.5343696Z Installing : 
perl-Class-Struct-0.66-477.amzn2023.0.6.noarch 51/107 2025-05-07T19:42:53.5401954Z Installing : perl-POSIX-1.94-477.amzn2023.0.6.x86_64 52/107 2025-05-07T19:42:53.5455791Z Installing : perl-Term-ANSIColor-5.01-459.amzn2023.0.2.noarch 53/107 2025-05-07T19:42:53.5500951Z Installing : perl-IPC-Open3-1.21-477.amzn2023.0.6.noarch 54/107 2025-05-07T19:42:53.5559929Z Installing : perl-subs-1.03-477.amzn2023.0.6.noarch 55/107 2025-05-07T19:42:53.5631342Z Installing : perl-File-Temp-1:0.231.100-2.amzn2023.0.2.noarch 56/107 2025-05-07T19:42:53.5685907Z Installing : perl-HTTP-Tiny-0.078-1.amzn2023.0.3.noarch 57/107 2025-05-07T19:42:53.5797297Z Installing : perl-Term-Cap-1.17-458.amzn2023.0.2.noarch 58/107 2025-05-07T19:42:53.5880836Z Installing : perl-Pod-Simple-1:3.42-2.amzn2023.0.2.noarch 59/107 2025-05-07T19:42:53.5947654Z Installing : perl-Socket-4:2.032-1.amzn2023.0.2.x86_64 60/107 2025-05-07T19:42:53.5996100Z Installing : perl-SelectSaver-1.02-477.amzn2023.0.6.noarch 61/107 2025-05-07T19:42:53.6040118Z Installing : perl-Symbol-1.08-477.amzn2023.0.6.noarch 62/107 2025-05-07T19:42:53.6115582Z Installing : perl-File-stat-1.09-477.amzn2023.0.6.noarch 63/107 2025-05-07T19:42:53.6210146Z Installing : perl-podlators-1:4.14-458.amzn2023.0.2.noarch 64/107 2025-05-07T19:42:53.6285073Z Installing : perl-Pod-Perldoc-3.28.01-459.amzn2023.0.3.noarch 65/107 2025-05-07T19:42:53.6342701Z Installing : perl-Fcntl-1.13-477.amzn2023.0.6.x86_64 66/107 2025-05-07T19:42:53.6396676Z Installing : perl-Text-ParseWords-3.30-458.amzn2023.0.2.noarc 67/107 2025-05-07T19:42:53.6473842Z Installing : perl-mro-1.23-477.amzn2023.0.6.x86_64 68/107 2025-05-07T19:42:53.6541137Z Installing : perl-IO-1.43-477.amzn2023.0.6.x86_64 69/107 2025-05-07T19:42:53.6600674Z Installing : perl-overloading-0.02-477.amzn2023.0.6.noarch 70/107 2025-05-07T19:42:53.6670061Z Installing : perl-Pod-Usage-4:2.01-2.amzn2023.0.2.noarch 71/107 2025-05-07T19:42:53.6720131Z Installing : perl-Errno-1.30-477.amzn2023.0.6.x86_64 72/107 
2025-05-07T19:42:53.6769358Z Installing : perl-File-Basename-2.85-477.amzn2023.0.6.noarch 73/107 2025-05-07T19:42:53.6831452Z Installing : perl-Getopt-Std-1.12-477.amzn2023.0.6.noarch 74/107 2025-05-07T19:42:53.6911096Z Installing : perl-MIME-Base64-3.16-2.amzn2023.0.2.x86_64 75/107 2025-05-07T19:42:53.6993298Z Installing : perl-Scalar-List-Utils-4:1.56-459.amzn2023.0.2.x 76/107 2025-05-07T19:42:53.7055327Z Installing : perl-constant-1.33-459.amzn2023.0.2.noarch 77/107 2025-05-07T19:42:53.7123167Z Installing : perl-Storable-1:3.21-458.amzn2023.0.2.x86_64 78/107 2025-05-07T19:42:53.7169802Z Installing : perl-overload-1.31-477.amzn2023.0.6.noarch 79/107 2025-05-07T19:42:53.7227813Z Installing : perl-parent-1:0.238-458.amzn2023.0.2.noarch 80/107 2025-05-07T19:42:53.7294870Z Installing : perl-vars-1.05-477.amzn2023.0.6.noarch 81/107 2025-05-07T19:42:53.7351946Z Installing : perl-Getopt-Long-1:2.52-2.amzn2023.0.2.noarch 82/107 2025-05-07T19:42:53.7405065Z Installing : perl-DynaLoader-1.47-477.amzn2023.0.6.x86_64 83/107 2025-05-07T19:42:53.7466255Z Installing : perl-Carp-1.50-458.amzn2023.0.2.noarch 84/107 2025-05-07T19:42:53.7522223Z Installing : perl-Exporter-5.74-459.amzn2023.0.2.noarch 85/107 2025-05-07T19:42:53.7605712Z Installing : perl-NDBM_File-1.15-477.amzn2023.0.6.x86_64 86/107 2025-05-07T19:42:53.8149092Z Installing : perl-PathTools-3.78-459.amzn2023.0.2.x86_64 87/107 2025-05-07T19:42:53.9133302Z Installing : perl-Encode-4:3.15-462.amzn2023.0.2.x86_64 88/107 2025-05-07T19:42:53.9266252Z Installing : perl-libs-4:5.32.1-477.amzn2023.0.6.x86_64 89/107 2025-05-07T19:42:53.9345041Z Installing : perl-interpreter-4:5.32.1-477.amzn2023.0.6.x86_6 90/107 2025-05-07T19:42:53.9411106Z Installing : perl-Error-1:0.17029-5.amzn2023.0.2.noarch 91/107 2025-05-07T19:42:53.9479559Z Installing : perl-File-Find-1.37-477.amzn2023.0.6.noarch 92/107 2025-05-07T19:42:53.9548671Z Installing : perl-TermReadKey-2.38-9.amzn2023.0.2.x86_64 93/107 2025-05-07T19:42:53.9609896Z Installing : 
perl-lib-0.65-477.amzn2023.0.6.x86_64 94/107 2025-05-07T19:42:53.9678308Z Installing : perl-Git-2.47.1-1.amzn2023.0.2.noarch 95/107 2025-05-07T19:42:53.9758920Z Installing : git-2.47.1-1.amzn2023.0.2.x86_64 96/107 2025-05-07T19:42:53.9974760Z Installing : elfutils-debuginfod-client-0.188-3.amzn2023.0.2. 97/107 2025-05-07T19:42:54.0108710Z Installing : cyrus-sasl-lib-2.1.27-18.amzn2023.0.3.x86_64 98/107 2025-05-07T19:42:54.0189879Z Installing : openldap-2.4.57-6.amzn2023.0.7.x86_64 99/107 2025-05-07T19:42:54.0596964Z Installing : sudo-python-plugin-1.9.15-1.p5.amzn2023.0.1.x86_ 100/107 2025-05-07T19:42:54.1843088Z Installing : sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 101/107 2025-05-07T19:42:54.1930639Z Installing : binutils-2.41-50.amzn2023.0.3.x86_64 102/107 2025-05-07T19:42:54.2053527Z Running scriptlet: binutils-2.41-50.amzn2023.0.3.x86_64 102/107 2025-05-07T19:42:54.2365158Z Installing : pciutils-3.7.0-3.amzn2023.0.2.x86_64 103/107 2025-05-07T19:42:54.2464228Z Installing : wget-1.21.3-1.amzn2023.0.4.x86_64 104/107 2025-05-07T19:42:54.2721834Z Installing : which-2.21-26.amzn2023.0.2.x86_64 105/107 2025-05-07T19:42:54.2944486Z Installing : tar-2:1.34-1.amzn2023.0.4.x86_64 106/107 2025-05-07T19:42:54.3031569Z Installing : findutils-1:4.8.0-2.amzn2023.0.2.x86_64 107/107 2025-05-07T19:42:54.3156843Z Running scriptlet: pam-1.5.1-8.amzn2023.0.4.x86_64 107/107 2025-05-07T19:42:55.0914076Z Running scriptlet: findutils-1:4.8.0-2.amzn2023.0.2.x86_64 107/107 2025-05-07T19:42:55.0914785Z Verifying : binutils-2.41-50.amzn2023.0.3.x86_64 1/107 2025-05-07T19:42:55.0915673Z Verifying : cracklib-2.9.6-27.amzn2023.0.2.x86_64 2/107 2025-05-07T19:42:55.0916377Z Verifying : cyrus-sasl-lib-2.1.27-18.amzn2023.0.3.x86_64 3/107 2025-05-07T19:42:55.0916992Z Verifying : elfutils-debuginfod-client-0.188-3.amzn2023.0.2. 
4/107 2025-05-07T19:42:55.0917748Z Verifying : findutils-1:4.8.0-2.amzn2023.0.2.x86_64 5/107 2025-05-07T19:42:55.0918358Z Verifying : git-2.47.1-1.amzn2023.0.2.x86_64 6/107 2025-05-07T19:42:55.0919003Z Verifying : git-core-2.47.1-1.amzn2023.0.2.x86_64 7/107 2025-05-07T19:42:55.0919620Z Verifying : git-core-doc-2.47.1-1.amzn2023.0.2.noarch 8/107 2025-05-07T19:42:55.0920645Z Verifying : gnutls-3.8.3-6.amzn2023.0.1.x86_64 9/107 2025-05-07T19:42:55.0921344Z Verifying : groff-base-1.22.4-7.amzn2023.0.2.x86_64 10/107 2025-05-07T19:42:55.0921992Z Verifying : gzip-1.12-1.amzn2023.0.1.x86_64 11/107 2025-05-07T19:42:55.0922751Z Verifying : hwdata-0.384-1.amzn2023.0.3.noarch 12/107 2025-05-07T19:42:55.0923308Z Verifying : jansson-2.14-0.amzn2023.x86_64 13/107 2025-05-07T19:42:55.0924020Z Verifying : kmod-libs-29-2.amzn2023.0.5.x86_64 14/107 2025-05-07T19:42:55.0924642Z Verifying : less-608-2.amzn2023.0.2.x86_64 15/107 2025-05-07T19:42:55.0925230Z Verifying : libcbor-0.7.0-3.amzn2023.0.2.x86_64 16/107 2025-05-07T19:42:55.0925901Z Verifying : libdb-5.3.28-49.amzn2023.0.2.x86_64 17/107 2025-05-07T19:42:55.0926487Z Verifying : libeconf-0.4.0-1.amzn2023.0.3.x86_64 18/107 2025-05-07T19:42:55.0927115Z Verifying : libedit-3.1-38.20210714cvs.amzn2023.0.2.x86_64 19/107 2025-05-07T19:42:55.0927792Z Verifying : libfdisk-2.37.4-1.amzn2023.0.4.x86_64 20/107 2025-05-07T19:42:55.0928378Z Verifying : libfido2-1.10.0-2.amzn2023.0.2.x86_64 21/107 2025-05-07T19:42:55.0929053Z Verifying : libmetalink-0.1.3-14.amzn2023.0.2.x86_64 22/107 2025-05-07T19:42:55.0929752Z Verifying : libpwquality-1.4.4-6.amzn2023.0.2.x86_64 23/107 2025-05-07T19:42:55.0930413Z Verifying : libsemanage-3.4-5.amzn2023.0.2.x86_64 24/107 2025-05-07T19:42:55.0931105Z Verifying : libutempter-1.2.1-4.amzn2023.0.2.x86_64 25/107 2025-05-07T19:42:55.0931718Z Verifying : nano-8.3-1.amzn2023.x86_64 26/107 2025-05-07T19:42:55.0932368Z Verifying : nano-default-editor-8.3-1.amzn2023.noarch 27/107 2025-05-07T19:42:55.0932980Z Verifying : 
ncurses-6.2-4.20200222.amzn2023.0.6.x86_64 28/107 2025-05-07T19:42:55.0933663Z Verifying : nettle-3.10.1-1.amzn2023.0.1.x86_64 29/107 2025-05-07T19:42:55.0934298Z Verifying : openldap-2.4.57-6.amzn2023.0.7.x86_64 30/107 2025-05-07T19:42:55.0934861Z Verifying : openssh-8.7p1-8.amzn2023.0.14.x86_64 31/107 2025-05-07T19:42:55.0935581Z Verifying : openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 32/107 2025-05-07T19:42:55.0936169Z Verifying : pam-1.5.1-8.amzn2023.0.4.x86_64 33/107 2025-05-07T19:42:55.0936776Z Verifying : pciutils-3.7.0-3.amzn2023.0.2.x86_64 34/107 2025-05-07T19:42:55.0937517Z Verifying : pciutils-libs-3.7.0-3.amzn2023.0.2.x86_64 35/107 2025-05-07T19:42:55.0938133Z Verifying : perl-AutoLoader-5.74-477.amzn2023.0.6.noarch 36/107 2025-05-07T19:42:55.0938915Z Verifying : perl-B-1.80-477.amzn2023.0.6.x86_64 37/107 2025-05-07T19:42:55.0939551Z Verifying : perl-Carp-1.50-458.amzn2023.0.2.noarch 38/107 2025-05-07T19:42:55.0940231Z Verifying : perl-Class-Struct-0.66-477.amzn2023.0.6.noarch 39/107 2025-05-07T19:42:55.0940853Z Verifying : perl-Data-Dumper-2.174-460.amzn2023.0.2.x86_64 40/107 2025-05-07T19:42:55.0941540Z Verifying : perl-Digest-1.20-1.amzn2023.0.2.noarch 41/107 2025-05-07T19:42:55.0942202Z Verifying : perl-Digest-MD5-2.58-2.amzn2023.0.2.x86_64 42/107 2025-05-07T19:42:55.0942829Z Verifying : perl-DynaLoader-1.47-477.amzn2023.0.6.x86_64 43/107 2025-05-07T19:42:55.0943504Z Verifying : perl-Encode-4:3.15-462.amzn2023.0.2.x86_64 44/107 2025-05-07T19:42:55.0944188Z Verifying : perl-Errno-1.30-477.amzn2023.0.6.x86_64 45/107 2025-05-07T19:42:55.0944837Z Verifying : perl-Error-1:0.17029-5.amzn2023.0.2.noarch 46/107 2025-05-07T19:42:55.0945601Z Verifying : perl-Exporter-5.74-459.amzn2023.0.2.noarch 47/107 2025-05-07T19:42:55.0946201Z Verifying : perl-Fcntl-1.13-477.amzn2023.0.6.x86_64 48/107 2025-05-07T19:42:55.0947008Z Verifying : perl-File-Basename-2.85-477.amzn2023.0.6.noarch 49/107 2025-05-07T19:42:55.0947726Z Verifying : 
perl-File-Find-1.37-477.amzn2023.0.6.noarch 50/107 2025-05-07T19:42:55.0948334Z Verifying : perl-File-Path-2.18-2.amzn2023.0.2.noarch 51/107 2025-05-07T19:42:55.0948957Z Verifying : perl-File-Temp-1:0.231.100-2.amzn2023.0.2.noarch 52/107 2025-05-07T19:42:55.0949608Z Verifying : perl-File-stat-1.09-477.amzn2023.0.6.noarch 53/107 2025-05-07T19:42:55.0950270Z Verifying : perl-FileHandle-2.03-477.amzn2023.0.6.noarch 54/107 2025-05-07T19:42:55.0950884Z Verifying : perl-Getopt-Long-1:2.52-2.amzn2023.0.2.noarch 55/107 2025-05-07T19:42:55.0951574Z Verifying : perl-Getopt-Std-1.12-477.amzn2023.0.6.noarch 56/107 2025-05-07T19:42:55.0952218Z Verifying : perl-Git-2.47.1-1.amzn2023.0.2.noarch 57/107 2025-05-07T19:42:55.0952815Z Verifying : perl-HTTP-Tiny-0.078-1.amzn2023.0.3.noarch 58/107 2025-05-07T19:42:55.0953360Z Verifying : perl-IO-1.43-477.amzn2023.0.6.x86_64 59/107 2025-05-07T19:42:55.0953895Z Verifying : perl-IO-Socket-IP-0.41-3.amzn2023.0.2.noarch 60/107 2025-05-07T19:42:55.0954464Z Verifying : perl-IO-Socket-SSL-2.075-1.amzn2023.0.2.noarch 61/107 2025-05-07T19:42:55.0955008Z Verifying : perl-IPC-Open3-1.21-477.amzn2023.0.6.noarch 62/107 2025-05-07T19:42:55.0955560Z Verifying : perl-MIME-Base64-3.16-2.amzn2023.0.2.x86_64 63/107 2025-05-07T19:42:55.0956107Z Verifying : perl-Mozilla-CA-20200520-4.amzn2023.0.2.noarch 64/107 2025-05-07T19:42:55.0956670Z Verifying : perl-NDBM_File-1.15-477.amzn2023.0.6.x86_64 65/107 2025-05-07T19:42:55.0957204Z Verifying : perl-Net-SSLeay-1.94-1.amzn2023.0.1.x86_64 66/107 2025-05-07T19:42:55.0957754Z Verifying : perl-POSIX-1.94-477.amzn2023.0.6.x86_64 67/107 2025-05-07T19:42:55.0958306Z Verifying : perl-PathTools-3.78-459.amzn2023.0.2.x86_64 68/107 2025-05-07T19:42:55.0958848Z Verifying : perl-Pod-Escapes-1:1.07-458.amzn2023.0.2.noarch 69/107 2025-05-07T19:42:55.0959411Z Verifying : perl-Pod-Perldoc-3.28.01-459.amzn2023.0.3.noarch 70/107 2025-05-07T19:42:55.0959960Z Verifying : perl-Pod-Simple-1:3.42-2.amzn2023.0.2.noarch 71/107 
2025-05-07T19:42:55.0960511Z Verifying : perl-Pod-Usage-4:2.01-2.amzn2023.0.2.noarch 72/107 2025-05-07T19:42:55.0961062Z Verifying : perl-Scalar-List-Utils-4:1.56-459.amzn2023.0.2.x 73/107 2025-05-07T19:42:55.0961750Z Verifying : perl-SelectSaver-1.02-477.amzn2023.0.6.noarch 74/107 2025-05-07T19:42:55.0962393Z Verifying : perl-Socket-4:2.032-1.amzn2023.0.2.x86_64 75/107 2025-05-07T19:42:55.0962912Z Verifying : perl-Storable-1:3.21-458.amzn2023.0.2.x86_64 76/107 2025-05-07T19:42:55.0963463Z Verifying : perl-Symbol-1.08-477.amzn2023.0.6.noarch 77/107 2025-05-07T19:42:55.0964009Z Verifying : perl-Term-ANSIColor-5.01-459.amzn2023.0.2.noarch 78/107 2025-05-07T19:42:55.0964575Z Verifying : perl-Term-Cap-1.17-458.amzn2023.0.2.noarch 79/107 2025-05-07T19:42:55.0965138Z Verifying : perl-TermReadKey-2.38-9.amzn2023.0.2.x86_64 80/107 2025-05-07T19:42:55.0965694Z Verifying : perl-Text-ParseWords-3.30-458.amzn2023.0.2.noarc 81/107 2025-05-07T19:42:55.0966275Z Verifying : perl-Text-Tabs+Wrap-2021.0726-1.amzn2023.0.1.noa 82/107 2025-05-07T19:42:55.0966811Z Verifying : perl-Time-Local-2:1.300-5.amzn2023.0.2.noarch 83/107 2025-05-07T19:42:55.0967706Z Verifying : perl-URI-5.09-1.amzn2023.0.2.noarch 84/107 2025-05-07T19:42:55.0968242Z Verifying : perl-base-2.27-477.amzn2023.0.6.noarch 85/107 2025-05-07T19:42:55.0968772Z Verifying : perl-constant-1.33-459.amzn2023.0.2.noarch 86/107 2025-05-07T19:42:55.0969317Z Verifying : perl-if-0.60.800-477.amzn2023.0.6.noarch 87/107 2025-05-07T19:42:55.0969836Z Verifying : perl-interpreter-4:5.32.1-477.amzn2023.0.6.x86_6 88/107 2025-05-07T19:42:55.0970373Z Verifying : perl-lib-0.65-477.amzn2023.0.6.x86_64 89/107 2025-05-07T19:42:55.0970891Z Verifying : perl-libnet-3.13-2.amzn2023.0.2.noarch 90/107 2025-05-07T19:42:55.0971429Z Verifying : perl-libs-4:5.32.1-477.amzn2023.0.6.x86_64 91/107 2025-05-07T19:42:55.0971946Z Verifying : perl-mro-1.23-477.amzn2023.0.6.x86_64 92/107 2025-05-07T19:42:55.0972479Z Verifying : 
perl-overload-1.31-477.amzn2023.0.6.noarch 93/107 2025-05-07T19:42:55.0973045Z Verifying : perl-overloading-0.02-477.amzn2023.0.6.noarch 94/107 2025-05-07T19:42:55.0973581Z Verifying : perl-parent-1:0.238-458.amzn2023.0.2.noarch 95/107 2025-05-07T19:42:55.0974121Z Verifying : perl-podlators-1:4.14-458.amzn2023.0.2.noarch 96/107 2025-05-07T19:42:55.0974669Z Verifying : perl-subs-1.03-477.amzn2023.0.6.noarch 97/107 2025-05-07T19:42:55.0975197Z Verifying : perl-vars-1.05-477.amzn2023.0.6.noarch 98/107 2025-05-07T19:42:55.0975733Z Verifying : shadow-utils-2:4.9-12.amzn2023.0.4.x86_64 99/107 2025-05-07T19:42:55.0976238Z Verifying : sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 100/107 2025-05-07T19:42:55.0976775Z Verifying : sudo-python-plugin-1.9.15-1.p5.amzn2023.0.1.x86_ 101/107 2025-05-07T19:42:55.0977322Z Verifying : systemd-libs-252.23-3.amzn2023.x86_64 102/107 2025-05-07T19:42:55.0977847Z Verifying : tar-2:1.34-1.amzn2023.0.4.x86_64 103/107 2025-05-07T19:42:55.0978361Z Verifying : util-linux-2.37.4-1.amzn2023.0.4.x86_64 104/107 2025-05-07T19:42:55.0978886Z Verifying : util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 105/107 2025-05-07T19:42:55.0979416Z Verifying : wget-1.21.3-1.amzn2023.0.4.x86_64 106/107 2025-05-07T19:42:55.1966791Z Verifying : which-2.21-26.amzn2023.0.2.x86_64 107/107 2025-05-07T19:42:55.1967334Z 2025-05-07T19:42:55.1967421Z Installed: 2025-05-07T19:42:55.1967751Z binutils-2.41-50.amzn2023.0.3.x86_64 2025-05-07T19:42:55.1968294Z cracklib-2.9.6-27.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1968838Z cyrus-sasl-lib-2.1.27-18.amzn2023.0.3.x86_64 2025-05-07T19:42:55.1969663Z elfutils-debuginfod-client-0.188-3.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1970247Z findutils-1:4.8.0-2.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1970741Z git-2.47.1-1.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1971245Z git-core-2.47.1-1.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1971768Z git-core-doc-2.47.1-1.amzn2023.0.2.noarch 2025-05-07T19:42:55.1972299Z gnutls-3.8.3-6.amzn2023.0.1.x86_64 
2025-05-07T19:42:55.1972822Z groff-base-1.22.4-7.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1973322Z gzip-1.12-1.amzn2023.0.1.x86_64 2025-05-07T19:42:55.1973835Z hwdata-0.384-1.amzn2023.0.3.noarch 2025-05-07T19:42:55.1974433Z jansson-2.14-0.amzn2023.x86_64 2025-05-07T19:42:55.1974957Z kmod-libs-29-2.amzn2023.0.5.x86_64 2025-05-07T19:42:55.1975470Z less-608-2.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1975964Z libcbor-0.7.0-3.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1976479Z libdb-5.3.28-49.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1976981Z libeconf-0.4.0-1.amzn2023.0.3.x86_64 2025-05-07T19:42:55.1977537Z libedit-3.1-38.20210714cvs.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1978070Z libfdisk-2.37.4-1.amzn2023.0.4.x86_64 2025-05-07T19:42:55.1978596Z libfido2-1.10.0-2.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1979134Z libmetalink-0.1.3-14.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1979682Z libpwquality-1.4.4-6.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1980236Z libsemanage-3.4-5.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1980770Z libutempter-1.2.1-4.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1981288Z nano-8.3-1.amzn2023.x86_64 2025-05-07T19:42:55.1981927Z nano-default-editor-8.3-1.amzn2023.noarch 2025-05-07T19:42:55.1982569Z ncurses-6.2-4.20200222.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1983060Z nettle-3.10.1-1.amzn2023.0.1.x86_64 2025-05-07T19:42:55.1983536Z openldap-2.4.57-6.amzn2023.0.7.x86_64 2025-05-07T19:42:55.1984030Z openssh-8.7p1-8.amzn2023.0.14.x86_64 2025-05-07T19:42:55.1984539Z openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 2025-05-07T19:42:55.1985031Z pam-1.5.1-8.amzn2023.0.4.x86_64 2025-05-07T19:42:55.1985503Z pciutils-3.7.0-3.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1985991Z pciutils-libs-3.7.0-3.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1986523Z perl-AutoLoader-5.74-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1987021Z perl-B-1.80-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1987521Z perl-Carp-1.50-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.1988038Z 
perl-Class-Struct-0.66-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1988583Z perl-Data-Dumper-2.174-460.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1989189Z perl-Digest-1.20-1.amzn2023.0.2.noarch 2025-05-07T19:42:55.1989695Z perl-Digest-MD5-2.58-2.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1990228Z perl-DynaLoader-1.47-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1990733Z perl-Encode-4:3.15-462.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1991239Z perl-Errno-1.30-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1991748Z perl-Error-1:0.17029-5.amzn2023.0.2.noarch 2025-05-07T19:42:55.1992255Z perl-Exporter-5.74-459.amzn2023.0.2.noarch 2025-05-07T19:42:55.1992777Z perl-Fcntl-1.13-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1993293Z perl-File-Basename-2.85-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1993847Z perl-File-Find-1.37-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1994441Z perl-File-Path-2.18-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.1995007Z perl-File-Temp-1:0.231.100-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.1995562Z perl-File-stat-1.09-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1996111Z perl-FileHandle-2.03-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1996684Z perl-Getopt-Long-1:2.52-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.1997227Z perl-Getopt-Std-1.12-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1997791Z perl-Git-2.47.1-1.amzn2023.0.2.noarch 2025-05-07T19:42:55.1998351Z perl-HTTP-Tiny-0.078-1.amzn2023.0.3.noarch 2025-05-07T19:42:55.1998876Z perl-IO-1.43-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1999427Z perl-IO-Socket-IP-0.41-3.amzn2023.0.2.noarch 2025-05-07T19:42:55.1999980Z perl-IO-Socket-SSL-2.075-1.amzn2023.0.2.noarch 2025-05-07T19:42:55.2000553Z perl-IPC-Open3-1.21-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2001090Z perl-MIME-Base64-3.16-2.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2001666Z perl-Mozilla-CA-20200520-4.amzn2023.0.2.noarch 2025-05-07T19:42:55.2002224Z perl-NDBM_File-1.15-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2002818Z 
perl-Net-SSLeay-1.94-1.amzn2023.0.1.x86_64 2025-05-07T19:42:55.2003579Z perl-POSIX-1.94-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2004153Z perl-PathTools-3.78-459.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2004772Z perl-Pod-Escapes-1:1.07-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.2005363Z perl-Pod-Perldoc-3.28.01-459.amzn2023.0.3.noarch 2025-05-07T19:42:55.2005968Z perl-Pod-Simple-1:3.42-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.2006558Z perl-Pod-Usage-4:2.01-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.2007128Z perl-Scalar-List-Utils-4:1.56-459.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2007752Z perl-SelectSaver-1.02-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2008329Z perl-Socket-4:2.032-1.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2008911Z perl-Storable-1:3.21-458.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2009512Z perl-Symbol-1.08-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2010175Z perl-Term-ANSIColor-5.01-459.amzn2023.0.2.noarch 2025-05-07T19:42:55.2010773Z perl-Term-Cap-1.17-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.2011328Z perl-TermReadKey-2.38-9.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2011916Z perl-Text-ParseWords-3.30-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.2012501Z perl-Text-Tabs+Wrap-2021.0726-1.amzn2023.0.1.noarch 2025-05-07T19:42:55.2013075Z perl-Time-Local-2:1.300-5.amzn2023.0.2.noarch 2025-05-07T19:42:55.2013616Z perl-URI-5.09-1.amzn2023.0.2.noarch 2025-05-07T19:42:55.2014143Z perl-base-2.27-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2014703Z perl-constant-1.33-459.amzn2023.0.2.noarch 2025-05-07T19:42:55.2015353Z perl-if-0.60.800-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2016026Z perl-interpreter-4:5.32.1-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2016559Z perl-lib-0.65-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2017076Z perl-libnet-3.13-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.2017603Z perl-libs-4:5.32.1-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2018105Z perl-mro-1.23-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2018663Z 
perl-overload-1.31-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2019235Z perl-overloading-0.02-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2019778Z perl-parent-1:0.238-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.2020322Z perl-podlators-1:4.14-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.2020858Z perl-subs-1.03-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2021398Z perl-vars-1.05-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2021915Z shadow-utils-2:4.9-12.amzn2023.0.4.x86_64 2025-05-07T19:42:55.2022427Z sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 2025-05-07T19:42:55.2022961Z sudo-python-plugin-1.9.15-1.p5.amzn2023.0.1.x86_64 2025-05-07T19:42:55.2023502Z systemd-libs-252.23-3.amzn2023.x86_64 2025-05-07T19:42:55.2024006Z tar-2:1.34-1.amzn2023.0.4.x86_64 2025-05-07T19:42:55.2024491Z util-linux-2.37.4-1.amzn2023.0.4.x86_64 2025-05-07T19:42:55.2025066Z util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 2025-05-07T19:42:55.2025574Z wget-1.21.3-1.amzn2023.0.4.x86_64 2025-05-07T19:42:55.2026068Z which-2.21-26.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2026367Z 2025-05-07T19:42:55.2026469Z Complete! 
2025-05-07T19:42:55.2787390Z ##[group]Run actions/checkout@v4 2025-05-07T19:42:55.2787739Z with: 2025-05-07T19:42:55.2787983Z submodules: true 2025-05-07T19:42:55.2788235Z repository: pytorch/FBGEMM 2025-05-07T19:42:55.2788738Z token: *** 2025-05-07T19:42:55.2788964Z ssh-strict: true 2025-05-07T19:42:55.2789221Z ssh-user: git 2025-05-07T19:42:55.2789457Z persist-credentials: true 2025-05-07T19:42:55.2789744Z clean: true 2025-05-07T19:42:55.2790008Z sparse-checkout-cone-mode: true 2025-05-07T19:42:55.2790293Z fetch-depth: 1 2025-05-07T19:42:55.2790546Z fetch-tags: false 2025-05-07T19:42:55.2790777Z show-progress: true 2025-05-07T19:42:55.2791035Z lfs: false 2025-05-07T19:42:55.2791261Z set-safe-directory: true 2025-05-07T19:42:55.2791743Z env: 2025-05-07T19:42:55.2791972Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:42:55.2792323Z BUILD_ENV: build_binary 2025-05-07T19:42:55.2792581Z BUILD_TARGET: genai 2025-05-07T19:42:55.2792935Z BUILD_VARIANT: cuda 2025-05-07T19:42:55.2793267Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:42:55.2793520Z ##[endgroup] 2025-05-07T19:42:55.2840053Z ##[command]/usr/bin/docker exec 2aa0e203fee372054944a8575a7288d464ba67ebdd0a95ce94bb1f4aeb4f06a9 sh -c "cat /etc/*release | grep ^ID" 2025-05-07T19:42:55.6019291Z Syncing repository: pytorch/FBGEMM 2025-05-07T19:42:55.6020659Z ##[group]Getting Git version info 2025-05-07T19:42:55.6020990Z Working directory is '/__w/FBGEMM/FBGEMM' 2025-05-07T19:42:55.6021522Z [command]/usr/bin/git version 2025-05-07T19:42:55.6021788Z git version 2.47.1 2025-05-07T19:42:55.6022727Z ##[endgroup] 2025-05-07T19:42:55.6026680Z Temporarily overriding HOME='/__w/_temp/73c09e45-e534-4f6d-b7a9-6686e501884e' before making global git config changes 2025-05-07T19:42:55.6027508Z Adding repository directory to the temporary git global config as a safe directory 2025-05-07T19:42:55.6030639Z [command]/usr/bin/git config --global --add safe.directory /__w/FBGEMM/FBGEMM 2025-05-07T19:42:55.6070083Z [command]/usr/bin/git 
config --local --get remote.origin.url 2025-05-07T19:42:55.6088540Z https://github.com/pytorch/FBGEMM 2025-05-07T19:42:55.6101536Z ##[group]Removing previously created refs, to avoid conflicts 2025-05-07T19:42:55.6104584Z [command]/usr/bin/git rev-parse --symbolic-full-name --verify --quiet HEAD 2025-05-07T19:42:55.6124651Z HEAD 2025-05-07T19:42:55.6156488Z ##[endgroup] 2025-05-07T19:42:55.6157005Z [command]/usr/bin/git submodule status 2025-05-07T19:42:55.6491904Z e5d7c0bd5d9aec44d68830187138149e6a8c4e32 external/asmjit (e5d7c0b) 2025-05-07T19:42:55.6553248Z 4a61bdd4bd4ed730e078aebc7c0fcf046ff29406 external/composable_kernel (4a61bdd) 2025-05-07T19:42:55.6627171Z 6543fec09b2f04ac4a666882998b534afc9c1349 external/cpuinfo (6543fec) 2025-05-07T19:42:55.6688786Z 3ed8d2ec4ba35ef5d9d8353826209b6f868f63d3 external/cutlass (3ed8d2e) 2025-05-07T19:42:55.6765651Z f8d7d77c06936315286eb55f8de22cd23c188571 external/googletest (f8d7d77) 2025-05-07T19:42:55.6837808Z a4337c69fe0e2552a7b7b0669178926beeed828c external/hipify_torch (heads/master) 2025-05-07T19:42:55.6912706Z 9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03 external/json (9cca280) 2025-05-07T19:42:55.6918739Z ##[group]Cleaning the repository 2025-05-07T19:42:55.6920243Z [command]/usr/bin/git clean -ffdx 2025-05-07T19:42:57.1028028Z Removing build_only/ 2025-05-07T19:42:57.1028349Z Removing collect_env.py 2025-05-07T19:42:57.1028632Z Removing fbgemm_gpu/_skbuild/ 2025-05-07T19:42:57.1029066Z Removing fbgemm_gpu/codegen/genscript/__pycache__/ 2025-05-07T19:42:57.1029664Z Removing fbgemm_gpu/dist/ 2025-05-07T19:42:57.1029986Z Removing fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:42:57.1030369Z Removing fbgemm_gpu/fbgemm_gpu_nightly.egg-info/ 2025-05-07T19:42:57.1037975Z [command]/usr/bin/git reset --hard HEAD 2025-05-07T19:42:57.2061617Z HEAD is now at 16f8549 Merge 57b27f098bbd767561f043c11657f32c2b505ef0 into 7a44073f73d1a274e4abb4e8f508434232380135 2025-05-07T19:42:57.2065430Z ##[endgroup] 2025-05-07T19:42:57.2066684Z 
##[group]Disabling automatic garbage collection 2025-05-07T19:42:57.2071034Z [command]/usr/bin/git config --local gc.auto 0 2025-05-07T19:42:57.2101658Z ##[endgroup] 2025-05-07T19:42:57.2102739Z ##[group]Setting up auth 2025-05-07T19:42:57.2107866Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-05-07T19:42:57.2131326Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-05-07T19:42:57.2405785Z Entering 'external/asmjit' 2025-05-07T19:42:57.2483261Z Entering 'external/composable_kernel' 2025-05-07T19:42:57.2549879Z Entering 'external/cpuinfo' 2025-05-07T19:42:57.2605425Z Entering 'external/cutlass' 2025-05-07T19:42:57.2678854Z Entering 'external/googletest' 2025-05-07T19:42:57.2733049Z Entering 'external/hipify_torch' 2025-05-07T19:42:57.2797262Z Entering 'external/json' 2025-05-07T19:42:57.2874476Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-05-07T19:42:57.2898478Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-05-07T19:42:57.3225733Z Entering 'external/asmjit' 2025-05-07T19:42:57.3277071Z Entering 'external/composable_kernel' 2025-05-07T19:42:57.3354171Z Entering 'external/cpuinfo' 2025-05-07T19:42:57.3417303Z Entering 'external/cutlass' 2025-05-07T19:42:57.3487084Z Entering 'external/googletest' 2025-05-07T19:42:57.3543453Z Entering 'external/hipify_torch' 2025-05-07T19:42:57.3600384Z Entering 'external/json' 2025-05-07T19:42:57.3685149Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-05-07T19:42:57.3744520Z ##[endgroup] 2025-05-07T19:42:57.3745048Z ##[group]Fetching the repository 
2025-05-07T19:42:57.3755023Z [command]/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +a2f4c52051596e74bc8c16e3d2867a4ecdd271e0:refs/remotes/pull/4066/merge 2025-05-07T19:42:57.5863152Z From https://github.com/pytorch/FBGEMM 2025-05-07T19:42:57.5863734Z * [new ref] a2f4c52051596e74bc8c16e3d2867a4ecdd271e0 -> pull/4066/merge 2025-05-07T19:42:57.5880072Z ##[endgroup] 2025-05-07T19:42:57.5880561Z ##[group]Determining the checkout info 2025-05-07T19:42:57.5881083Z ##[endgroup] 2025-05-07T19:42:57.5884131Z [command]/usr/bin/git sparse-checkout disable 2025-05-07T19:42:57.6418065Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-05-07T19:42:57.6419773Z ##[group]Checking out the ref 2025-05-07T19:42:57.6420257Z [command]/usr/bin/git checkout --progress --force refs/remotes/pull/4066/merge 2025-05-07T19:42:57.7385873Z Previous HEAD position was 16f8549 Merge 57b27f098bbd767561f043c11657f32c2b505ef0 into 7a44073f73d1a274e4abb4e8f508434232380135 2025-05-07T19:42:57.7388866Z HEAD is now at a2f4c52 Merge 6060cd4b5f971680caecdcc657faccb5720d1c3e into fd4df5f456e0cca514bacd98a39efb72990fd9f4 2025-05-07T19:42:57.7390460Z ##[endgroup] 2025-05-07T19:42:57.7390943Z ##[group]Setting up auth for fetching submodules 2025-05-07T19:42:57.7393182Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-05-07T19:42:57.7435185Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-05-07T19:42:57.7455185Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-05-07T19:42:57.7480104Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-05-07T19:42:57.7502946Z ##[endgroup] 2025-05-07T19:42:57.7504044Z ##[group]Fetching submodules 2025-05-07T19:42:57.7504914Z [command]/usr/bin/git submodule sync 2025-05-07T19:42:57.7805479Z 
Synchronizing submodule url for 'external/asmjit' 2025-05-07T19:42:57.7805989Z Synchronizing submodule url for 'external/composable_kernel' 2025-05-07T19:42:57.7806945Z Synchronizing submodule url for 'external/cpuinfo' 2025-05-07T19:42:57.7807387Z Synchronizing submodule url for 'external/cutlass' 2025-05-07T19:42:57.7807792Z Synchronizing submodule url for 'external/googletest' 2025-05-07T19:42:57.7808246Z Synchronizing submodule url for 'external/hipify_torch' 2025-05-07T19:42:57.7808645Z Synchronizing submodule url for 'external/json' 2025-05-07T19:42:57.7811562Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --depth=1 2025-05-07T19:42:57.8523830Z Submodule path 'external/asmjit': checked out 'e5d7c0bd5d9aec44d68830187138149e6a8c4e32' 2025-05-07T19:42:58.1047345Z Submodule path 'external/composable_kernel': checked out '4a61bdd4bd4ed730e078aebc7c0fcf046ff29406' 2025-05-07T19:42:58.1952407Z Submodule path 'external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-05-07T19:42:58.8614890Z Submodule path 'external/cutlass': checked out '3ed8d2ec4ba35ef5d9d8353826209b6f868f63d3' 2025-05-07T19:42:58.9005582Z Submodule path 'external/googletest': checked out 'f8d7d77c06936315286eb55f8de22cd23c188571' 2025-05-07T19:42:59.4809305Z From https://github.com/ROCmSoftwarePlatform/hipify_torch 2025-05-07T19:42:59.4809872Z * branch 420084499c7c1e1c2d801922f40df202eac5f3a0 -> FETCH_HEAD 2025-05-07T19:42:59.4898645Z Submodule path 'external/hipify_torch': checked out '420084499c7c1e1c2d801922f40df202eac5f3a0' 2025-05-07T19:42:59.5952003Z Submodule path 'external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-05-07T19:42:59.5962397Z [command]/usr/bin/git submodule foreach git config --local gc.auto 0 2025-05-07T19:42:59.6245229Z Entering 'external/asmjit' 2025-05-07T19:42:59.6280805Z Entering 'external/composable_kernel' 2025-05-07T19:42:59.6309729Z Entering 'external/cpuinfo' 2025-05-07T19:42:59.6346524Z 
Entering 'external/cutlass' 2025-05-07T19:42:59.6373643Z Entering 'external/googletest' 2025-05-07T19:42:59.6406011Z Entering 'external/hipify_torch' 2025-05-07T19:42:59.6439633Z Entering 'external/json' 2025-05-07T19:42:59.6475278Z ##[endgroup] 2025-05-07T19:42:59.6476429Z ##[group]Persisting credentials for submodules 2025-05-07T19:42:59.6478198Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-05-07T19:42:59.6746078Z Entering 'external/asmjit' 2025-05-07T19:42:59.6788130Z url.https://github.com/.insteadof 2025-05-07T19:42:59.6788618Z url.https://github.com/.insteadof 2025-05-07T19:42:59.6823187Z Entering 'external/composable_kernel' 2025-05-07T19:42:59.6854083Z url.https://github.com/.insteadof 2025-05-07T19:42:59.6855109Z url.https://github.com/.insteadof 2025-05-07T19:42:59.6888934Z Entering 'external/cpuinfo' 2025-05-07T19:42:59.6932614Z url.https://github.com/.insteadof 2025-05-07T19:42:59.6932989Z url.https://github.com/.insteadof 2025-05-07T19:42:59.6968211Z Entering 'external/cutlass' 2025-05-07T19:42:59.7006764Z url.https://github.com/.insteadof 2025-05-07T19:42:59.7007232Z url.https://github.com/.insteadof 2025-05-07T19:42:59.7051720Z Entering 'external/googletest' 2025-05-07T19:42:59.7103982Z url.https://github.com/.insteadof 2025-05-07T19:42:59.7105250Z url.https://github.com/.insteadof 2025-05-07T19:42:59.7143046Z Entering 'external/hipify_torch' 2025-05-07T19:42:59.7187609Z url.https://github.com/.insteadof 2025-05-07T19:42:59.7188030Z url.https://github.com/.insteadof 2025-05-07T19:42:59.7223523Z Entering 'external/json' 2025-05-07T19:42:59.7254009Z url.https://github.com/.insteadof 2025-05-07T19:42:59.7255041Z url.https://github.com/.insteadof 2025-05-07T19:42:59.7295954Z [command]/usr/bin/git submodule foreach sh -c "git config --local 
'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-05-07T19:42:59.7561314Z Entering 'external/asmjit' 2025-05-07T19:42:59.7606165Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/asmjit/config remote.origin.url 2025-05-07T19:42:59.7606837Z Entering 'external/composable_kernel' 2025-05-07T19:42:59.7646891Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/composable_kernel/config remote.origin.url 2025-05-07T19:42:59.7648480Z Entering 'external/cpuinfo' 2025-05-07T19:42:59.7696819Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/cpuinfo/config remote.origin.url 2025-05-07T19:42:59.7697470Z Entering 'external/cutlass' 2025-05-07T19:42:59.7755155Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/cutlass/config remote.origin.url 2025-05-07T19:42:59.7757888Z Entering 'external/googletest' 2025-05-07T19:42:59.7812973Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/googletest/config remote.origin.url 2025-05-07T19:42:59.7816487Z Entering 'external/hipify_torch' 2025-05-07T19:42:59.7866208Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/hipify_torch/config remote.origin.url 2025-05-07T19:42:59.7868139Z Entering 'external/json' 2025-05-07T19:42:59.7926153Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/json/config remote.origin.url 2025-05-07T19:42:59.8044949Z [command]/usr/bin/git submodule foreach git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-05-07T19:42:59.8357949Z Entering 'external/asmjit' 2025-05-07T19:42:59.8384665Z Entering 'external/composable_kernel' 2025-05-07T19:42:59.8406244Z Entering 'external/cpuinfo' 2025-05-07T19:42:59.8434151Z Entering 'external/cutlass' 2025-05-07T19:42:59.8465457Z Entering 'external/googletest' 2025-05-07T19:42:59.8499588Z Entering 'external/hipify_torch' 2025-05-07T19:42:59.8532461Z Entering 'external/json' 2025-05-07T19:42:59.8568415Z [command]/usr/bin/git submodule foreach git config --local --add 
'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-05-07T19:42:59.8862189Z Entering 'external/asmjit' 2025-05-07T19:42:59.8885495Z Entering 'external/composable_kernel' 2025-05-07T19:42:59.8912163Z Entering 'external/cpuinfo' 2025-05-07T19:42:59.8932428Z Entering 'external/cutlass' 2025-05-07T19:42:59.8960460Z Entering 'external/googletest' 2025-05-07T19:42:59.8994395Z Entering 'external/hipify_torch' 2025-05-07T19:42:59.9026683Z Entering 'external/json' 2025-05-07T19:42:59.9070523Z ##[endgroup] 2025-05-07T19:42:59.9102176Z [command]/usr/bin/git log -1 --format=%H 2025-05-07T19:42:59.9124548Z a2f4c52051596e74bc8c16e3d2867a4ecdd271e0 2025-05-07T19:42:59.9278638Z ##[group]Run . $PRELUDE; print_system_info 2025-05-07T19:42:59.9279085Z . $PRELUDE; print_system_info 2025-05-07T19:42:59.9279726Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:42:59.9280084Z env: 2025-05-07T19:42:59.9280346Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:42:59.9280659Z BUILD_ENV: build_binary 2025-05-07T19:42:59.9280936Z BUILD_TARGET: genai 2025-05-07T19:42:59.9281192Z BUILD_VARIANT: cuda 2025-05-07T19:42:59.9281456Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:42:59.9281706Z ##[endgroup] 2025-05-07T19:43:00.4090551Z ################################################################################ 2025-05-07T19:43:00.4090937Z # Print System Info 2025-05-07T19:43:00.4091189Z # 2025-05-07T19:43:00.4108476Z # [2025-05-07T19:43:00.410Z] + print_system_info 2025-05-07T19:43:00.4108882Z ################################################################################ 2025-05-07T19:43:00.4109118Z 2025-05-07T19:43:00.4109315Z ################################################################################ 2025-05-07T19:43:00.4109661Z [INFO] Printing environment variables ... 
2025-05-07T19:43:00.4110118Z + printenv 2025-05-07T19:43:00.4110311Z 2025-05-07T19:43:00.4125601Z GITHUB_WORKSPACE=/__w/FBGEMM/FBGEMM 2025-05-07T19:43:00.4126011Z BUILD_VARIANT=cuda 2025-05-07T19:43:00.4126302Z HOSTNAME=2aa0e203fee3 2025-05-07T19:43:00.4126762Z GITHUB_PATH=/__w/_temp/_runner_file_commands/add_path_229091fc-9de9-4adc-b6e1-6a9573d6edfe 2025-05-07T19:43:00.4127318Z GITHUB_ACTION=__run_2 2025-05-07T19:43:00.4127575Z GITHUB_RUN_NUMBER=10601 2025-05-07T19:43:00.4127862Z RUNNER_NAME=i-0405906171cd7041e 2025-05-07T19:43:00.4128161Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-05-07T19:43:00.4128558Z PLATFORM_NAME_LC=linux-x86_64 2025-05-07T19:43:00.4128848Z MACHINE_NAME_LC=x86_64 2025-05-07T19:43:00.4129141Z GITHUB_TRIGGERING_ACTOR=q10 2025-05-07T19:43:00.4129482Z PRELUDE=.github/scripts/setup_env.bash 2025-05-07T19:43:00.4129812Z GITHUB_REF_TYPE=branch 2025-05-07T19:43:00.4130451Z *** 2025-05-07T19:43:00.4130675Z GITHUB_REPOSITORY_ID=150154628 2025-05-07T19:43:00.4130988Z GITHUB_ACTIONS=true 2025-05-07T19:43:00.4131279Z GITHUB_SHA=a2f4c52051596e74bc8c16e3d2867a4ecdd271e0 2025-05-07T19:43:00.4131890Z GITHUB_WORKFLOW_REF=pytorch/FBGEMM/.github/workflows/fbgemm_gpu_ci_cuda.yml@refs/pull/4066/merge 2025-05-07T19:43:00.4132511Z RUNNER_ENVIRONMENT=self-hosted 2025-05-07T19:43:00.4132798Z GITHUB_REF=refs/pull/4066/merge 2025-05-07T19:43:00.4133439Z RUNNER_OS=Linux 2025-05-07T19:43:00.4133674Z GITHUB_REF_PROTECTED=false 2025-05-07T19:43:00.4133952Z HOME=/github/home 2025-05-07T19:43:00.4134207Z GITHUB_API_URL=https://api.github.com 2025-05-07T19:43:00.4134526Z RUNNER_ARCH=X64 2025-05-07T19:43:00.4134745Z RUNNER_TEMP=/__w/_temp 2025-05-07T19:43:00.4134998Z BUILD_TARGET=genai 2025-05-07T19:43:00.4135423Z GITHUB_STATE=/__w/_temp/_runner_file_commands/save_state_229091fc-9de9-4adc-b6e1-6a9573d6edfe 2025-05-07T19:43:00.4136045Z GITHUB_ENV=/__w/_temp/_runner_file_commands/set_env_229091fc-9de9-4adc-b6e1-6a9573d6edfe 2025-05-07T19:43:00.4136550Z 
GITHUB_EVENT_PATH=/github/workflow/event.json 2025-05-07T19:43:00.4136877Z GITHUB_EVENT_NAME=pull_request 2025-05-07T19:43:00.4137165Z GITHUB_RUN_ID=14891846252 2025-05-07T19:43:00.4137623Z GITHUB_STEP_SUMMARY=/__w/_temp/_runner_file_commands/step_summary_229091fc-9de9-4adc-b6e1-6a9573d6edfe 2025-05-07T19:43:00.4138145Z BUILD_ENV=build_binary 2025-05-07T19:43:00.4138385Z GITHUB_ACTOR=q10 2025-05-07T19:43:00.4138633Z GITHUB_RUN_ATTEMPT=1 2025-05-07T19:43:00.4138886Z KERN_NAME_LC=linux 2025-05-07T19:43:00.4139112Z BUILD_CUDA_VERSION=12.8.0 2025-05-07T19:43:00.4139433Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-05-07T19:43:00.4139779Z PLATFORM_NAME=Linux-x86_64 2025-05-07T19:43:00.4140079Z GITHUB_SERVER_URL=https://github.com 2025-05-07T19:43:00.4140359Z SHLVL=1 2025-05-07T19:43:00.4140584Z GITHUB_ACTOR_ID=255046 2025-05-07T19:43:00.4140826Z RUNNER_TOOL_CACHE=/__w/_tool 2025-05-07T19:43:00.4141338Z GITHUB_WORKFLOW_SHA=6060cd4b5f971680caecdcc657faccb5720d1c3e 2025-05-07T19:43:00.4141706Z GITHUB_REF_NAME=4066/merge 2025-05-07T19:43:00.4141970Z KERN_NAME=Linux 2025-05-07T19:43:00.4142195Z GITHUB_JOB=build_artifact 2025-05-07T19:43:00.4142481Z GITHUB_REPOSITORY=pytorch/FBGEMM 2025-05-07T19:43:00.4142785Z GITHUB_RETENTION_DAYS=90 2025-05-07T19:43:00.4143214Z RUNNER_WORKSPACE=/__w/FBGEMM 2025-05-07T19:43:00.4143522Z GITHUB_ACTION_REPOSITORY= 2025-05-07T19:43:00.4143894Z PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-05-07T19:43:00.4144317Z GITHUB_BASE_REF=main 2025-05-07T19:43:00.4144553Z CI=true 2025-05-07T19:43:00.4144793Z GITHUB_REPOSITORY_OWNER=pytorch 2025-05-07T19:43:00.4145084Z GITHUB_HEAD_REF=bm/genai-rocm-oss-6 2025-05-07T19:43:00.4145401Z GITHUB_ACTION_REF= 2025-05-07T19:43:00.4145659Z GITHUB_WORKFLOW=FBGEMM GPU/GenAI CUDA CI 2025-05-07T19:43:00.4146186Z GITHUB_OUTPUT=/__w/_temp/_runner_file_commands/set_output_229091fc-9de9-4adc-b6e1-6a9573d6edfe 2025-05-07T19:43:00.4146706Z MACHINE_NAME=x86_64 2025-05-07T19:43:00.4146950Z 
_=/usr/bin/printenv 2025-05-07T19:43:00.4147092Z 2025-05-07T19:43:00.4147239Z ################################################################################ 2025-05-07T19:43:00.4147575Z [INFO] Print ldd version ... 2025-05-07T19:43:00.4147877Z + ldd --version 2025-05-07T19:43:00.4148015Z 2025-05-07T19:43:00.4148128Z ldd (GNU libc) 2.34 2025-05-07T19:43:00.4148444Z Copyright (C) 2021 Free Software Foundation, Inc. 2025-05-07T19:43:00.4148934Z This is free software; see the source for copying conditions. There is NO 2025-05-07T19:43:00.4149495Z warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 2025-05-07T19:43:00.4149993Z Written by Roland McGrath and Ulrich Drepper. 2025-05-07T19:43:00.4150227Z 2025-05-07T19:43:00.4150350Z ################################################################################ 2025-05-07T19:43:00.4150707Z [INFO] Print CPU info ... 2025-05-07T19:43:00.4150980Z + nproc 2025-05-07T19:43:00.4151095Z 2025-05-07T19:43:00.4160938Z 96 2025-05-07T19:43:00.4161620Z 2025-05-07T19:43:00.4162123Z + lscpu 2025-05-07T19:43:00.4162507Z 2025-05-07T19:43:00.4436700Z Architecture: x86_64 2025-05-07T19:43:00.4437186Z CPU op-mode(s): 32-bit, 64-bit 2025-05-07T19:43:00.4437640Z Address sizes: 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4438069Z Byte Order: Little Endian 2025-05-07T19:43:00.4438653Z CPU(s): 96 2025-05-07T19:43:00.4438975Z On-line CPU(s) list: 0-95 2025-05-07T19:43:00.4439303Z Vendor ID: GenuineIntel 2025-05-07T19:43:00.4439727Z Model name: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4440121Z CPU family: 6 2025-05-07T19:43:00.4440418Z Model: 85 2025-05-07T19:43:00.4440723Z Thread(s) per core: 2 2025-05-07T19:43:00.4441025Z Core(s) per socket: 24 2025-05-07T19:43:00.4441326Z Socket(s): 2 2025-05-07T19:43:00.4441603Z Stepping: 7 2025-05-07T19:43:00.4441917Z BogoMIPS: 6000.01 2025-05-07T19:43:00.4444458Z Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush 
mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:43:00.4446923Z Hypervisor vendor: KVM 2025-05-07T19:43:00.4447407Z Virtualization type: full 2025-05-07T19:43:00.4447771Z L1d cache: 1.5 MiB (48 instances) 2025-05-07T19:43:00.4448153Z L1i cache: 1.5 MiB (48 instances) 2025-05-07T19:43:00.4448544Z L2 cache: 48 MiB (48 instances) 2025-05-07T19:43:00.4448909Z L3 cache: 71.5 MiB (2 instances) 2025-05-07T19:43:00.4449256Z NUMA node(s): 2 2025-05-07T19:43:00.4449568Z NUMA node0 CPU(s): 0-23,48-71 2025-05-07T19:43:00.4449917Z NUMA node1 CPU(s): 24-47,72-95 2025-05-07T19:43:00.4450390Z Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status 2025-05-07T19:43:00.4450955Z Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported 2025-05-07T19:43:00.4451480Z Vulnerability L1tf: Mitigation; PTE Inversion 2025-05-07T19:43:00.4452101Z Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:43:00.4452697Z Vulnerability Meltdown: Mitigation; PTI 2025-05-07T19:43:00.4453312Z Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:43:00.4454075Z Vulnerability Reg file data sampling: Not affected 2025-05-07T19:43:00.4454446Z Vulnerability Retbleed: Vulnerable 2025-05-07T19:43:00.4454849Z Vulnerability Spec rstack overflow: Not affected 2025-05-07T19:43:00.4455235Z Vulnerability Spec store bypass: Vulnerable 2025-05-07T19:43:00.4455822Z Vulnerability Spectre v1: 
Mitigation; usercopy/swapgs barriers and __user pointer sanitization 2025-05-07T19:43:00.4456700Z Vulnerability Spectre v2: Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline 2025-05-07T19:43:00.4457372Z Vulnerability Srbds: Not affected 2025-05-07T19:43:00.4457755Z Vulnerability Tsx async abort: Not affected 2025-05-07T19:43:00.4458008Z 2025-05-07T19:43:00.4458098Z + cat /proc/cpuinfo 2025-05-07T19:43:00.4458250Z 2025-05-07T19:43:00.4458336Z processor : 0 2025-05-07T19:43:00.4458556Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4458805Z cpu family : 6 2025-05-07T19:43:00.4459008Z model : 85 2025-05-07T19:43:00.4459317Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4459690Z stepping : 7 2025-05-07T19:43:00.4459965Z microcode : 0x5003901 2025-05-07T19:43:00.4460241Z cpu MHz : 3481.550 2025-05-07T19:43:00.4460527Z cache size : 36608 KB 2025-05-07T19:43:00.4460777Z physical id : 0 2025-05-07T19:43:00.4460991Z siblings : 48 2025-05-07T19:43:00.4461217Z core id : 0 2025-05-07T19:43:00.4461419Z cpu cores : 24 2025-05-07T19:43:00.4461646Z apicid : 0 2025-05-07T19:43:00.4461846Z initial apicid : 0 2025-05-07T19:43:00.4462079Z fpu : yes 2025-05-07T19:43:00.4462290Z fpu_exception : yes 2025-05-07T19:43:00.4462540Z cpuid level : 13 2025-05-07T19:43:00.4462768Z wp : yes 2025-05-07T19:43:00.4465154Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4468020Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4468636Z bogomips : 6000.01 2025-05-07T19:43:00.4468860Z clflush size : 64 2025-05-07T19:43:00.4469097Z cache_alignment : 64 2025-05-07T19:43:00.4469380Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4469814Z power management: 2025-05-07T19:43:00.4469958Z 2025-05-07T19:43:00.4470042Z processor : 1 2025-05-07T19:43:00.4470271Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4470520Z cpu family : 6 2025-05-07T19:43:00.4470736Z model : 85 2025-05-07T19:43:00.4471031Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4471381Z stepping : 7 2025-05-07T19:43:00.4471609Z microcode : 0x5003901 2025-05-07T19:43:00.4471837Z cpu MHz : 3215.058 2025-05-07T19:43:00.4472073Z cache size : 36608 KB 2025-05-07T19:43:00.4472297Z physical id : 0 2025-05-07T19:43:00.4472526Z siblings : 48 2025-05-07T19:43:00.4472733Z core id : 1 2025-05-07T19:43:00.4472943Z cpu cores : 24 2025-05-07T19:43:00.4473158Z apicid : 2 2025-05-07T19:43:00.4473370Z initial apicid : 2 2025-05-07T19:43:00.4473581Z fpu : yes 2025-05-07T19:43:00.4473799Z fpu_exception : yes 2025-05-07T19:43:00.4474029Z cpuid level : 13 2025-05-07T19:43:00.4474242Z wp : yes 2025-05-07T19:43:00.4476566Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4479275Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4479867Z bogomips : 6000.01 2025-05-07T19:43:00.4480256Z clflush size : 64 2025-05-07T19:43:00.4480473Z cache_alignment : 64 2025-05-07T19:43:00.4480755Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4481085Z power management: 2025-05-07T19:43:00.4481234Z 2025-05-07T19:43:00.4481324Z processor : 2 2025-05-07T19:43:00.4481545Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4481798Z cpu family : 6 2025-05-07T19:43:00.4482027Z model : 85 2025-05-07T19:43:00.4482361Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4482745Z stepping : 7 2025-05-07T19:43:00.4482952Z microcode : 0x5003901 2025-05-07T19:43:00.4483200Z cpu MHz : 3000.006 2025-05-07T19:43:00.4483514Z cache size : 36608 KB 2025-05-07T19:43:00.4483762Z physical id : 0 2025-05-07T19:43:00.4483966Z siblings : 48 2025-05-07T19:43:00.4484187Z core id : 2 2025-05-07T19:43:00.4484383Z cpu cores : 24 2025-05-07T19:43:00.4484601Z apicid : 4 2025-05-07T19:43:00.4484795Z initial apicid : 4 2025-05-07T19:43:00.4485018Z fpu : yes 2025-05-07T19:43:00.4485226Z fpu_exception : yes 2025-05-07T19:43:00.4485443Z cpuid level : 13 2025-05-07T19:43:00.4485661Z wp : yes 2025-05-07T19:43:00.4487958Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4490738Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4491322Z bogomips : 6000.01 2025-05-07T19:43:00.4491533Z clflush size : 64 2025-05-07T19:43:00.4491754Z cache_alignment : 64 2025-05-07T19:43:00.4492021Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4492351Z power management: 2025-05-07T19:43:00.4492485Z 2025-05-07T19:43:00.4492568Z processor : 3 2025-05-07T19:43:00.4492848Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4493095Z cpu family : 6 2025-05-07T19:43:00.4493289Z model : 85 2025-05-07T19:43:00.4493567Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4493908Z stepping : 7 2025-05-07T19:43:00.4494127Z microcode : 0x5003901 2025-05-07T19:43:00.4494346Z cpu MHz : 3000.006 2025-05-07T19:43:00.4494575Z cache size : 36608 KB 2025-05-07T19:43:00.4494796Z physical id : 0 2025-05-07T19:43:00.4495024Z siblings : 48 2025-05-07T19:43:00.4495219Z core id : 3 2025-05-07T19:43:00.4495429Z cpu cores : 24 2025-05-07T19:43:00.4495627Z apicid : 6 2025-05-07T19:43:00.4495839Z initial apicid : 6 2025-05-07T19:43:00.4496060Z fpu : yes 2025-05-07T19:43:00.4496255Z fpu_exception : yes 2025-05-07T19:43:00.4496482Z cpuid level : 13 2025-05-07T19:43:00.4496685Z wp : yes 2025-05-07T19:43:00.4498943Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4501537Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4502113Z bogomips : 6000.01 2025-05-07T19:43:00.4502340Z clflush size : 64 2025-05-07T19:43:00.4502553Z cache_alignment : 64 2025-05-07T19:43:00.4502834Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4503152Z power management: 2025-05-07T19:43:00.4503296Z 2025-05-07T19:43:00.4503382Z processor : 4 2025-05-07T19:43:00.4503604Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4504006Z cpu family : 6 2025-05-07T19:43:00.4504222Z model : 85 2025-05-07T19:43:00.4504493Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4504859Z stepping : 7 2025-05-07T19:43:00.4505065Z microcode : 0x5003901 2025-05-07T19:43:00.4505305Z cpu MHz : 3201.903 2025-05-07T19:43:00.4505518Z cache size : 36608 KB 2025-05-07T19:43:00.4505752Z physical id : 0 2025-05-07T19:43:00.4505958Z siblings : 48 2025-05-07T19:43:00.4506228Z core id : 4 2025-05-07T19:43:00.4506423Z cpu cores : 24 2025-05-07T19:43:00.4506639Z apicid : 8 2025-05-07T19:43:00.4506832Z initial apicid : 8 2025-05-07T19:43:00.4507057Z fpu : yes 2025-05-07T19:43:00.4507267Z fpu_exception : yes 2025-05-07T19:43:00.4507482Z cpuid level : 13 2025-05-07T19:43:00.4507698Z wp : yes 2025-05-07T19:43:00.4510095Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4512701Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4513290Z bogomips : 6000.01 2025-05-07T19:43:00.4513501Z clflush size : 64 2025-05-07T19:43:00.4513728Z cache_alignment : 64 2025-05-07T19:43:00.4513992Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4514328Z power management: 2025-05-07T19:43:00.4514455Z 2025-05-07T19:43:00.4514537Z processor : 5 2025-05-07T19:43:00.4514756Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4514997Z cpu family : 6 2025-05-07T19:43:00.4515441Z model : 85 2025-05-07T19:43:00.4515729Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4516078Z stepping : 7 2025-05-07T19:43:00.4516299Z microcode : 0x5003901 2025-05-07T19:43:00.4516617Z cpu MHz : 3316.145 2025-05-07T19:43:00.4516851Z cache size : 36608 KB 2025-05-07T19:43:00.4517077Z physical id : 0 2025-05-07T19:43:00.4517301Z siblings : 48 2025-05-07T19:43:00.4517505Z core id : 5 2025-05-07T19:43:00.4517723Z cpu cores : 24 2025-05-07T19:43:00.4517927Z apicid : 10 2025-05-07T19:43:00.4518143Z initial apicid : 10 2025-05-07T19:43:00.4518365Z fpu : yes 2025-05-07T19:43:00.4518560Z fpu_exception : yes 2025-05-07T19:43:00.4518787Z cpuid level : 13 2025-05-07T19:43:00.4518993Z wp : yes 2025-05-07T19:43:00.4521277Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4524027Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4524621Z bogomips : 6000.01 2025-05-07T19:43:00.4524851Z clflush size : 64 2025-05-07T19:43:00.4525065Z cache_alignment : 64 2025-05-07T19:43:00.4525358Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4525682Z power management: 2025-05-07T19:43:00.4525830Z 2025-05-07T19:43:00.4525916Z processor : 6 2025-05-07T19:43:00.4526141Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4526378Z cpu family : 6 2025-05-07T19:43:00.4526591Z model : 85 2025-05-07T19:43:00.4526867Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4527228Z stepping : 7 2025-05-07T19:43:00.4527432Z microcode : 0x5003901 2025-05-07T19:43:00.4527665Z cpu MHz : 3000.006 2025-05-07T19:43:00.4527876Z cache size : 36608 KB 2025-05-07T19:43:00.4528110Z physical id : 0 2025-05-07T19:43:00.4528316Z siblings : 48 2025-05-07T19:43:00.4528525Z core id : 6 2025-05-07T19:43:00.4528720Z cpu cores : 24 2025-05-07T19:43:00.4528991Z apicid : 12 2025-05-07T19:43:00.4529205Z initial apicid : 12 2025-05-07T19:43:00.4529415Z fpu : yes 2025-05-07T19:43:00.4529626Z fpu_exception : yes 2025-05-07T19:43:00.4529841Z cpuid level : 13 2025-05-07T19:43:00.4530059Z wp : yes 2025-05-07T19:43:00.4532356Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4535143Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4535732Z bogomips : 6000.01 2025-05-07T19:43:00.4535944Z clflush size : 64 2025-05-07T19:43:00.4536167Z cache_alignment : 64 2025-05-07T19:43:00.4536428Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4536754Z power management: 2025-05-07T19:43:00.4536882Z 2025-05-07T19:43:00.4536978Z processor : 7 2025-05-07T19:43:00.4537183Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4537431Z cpu family : 6 2025-05-07T19:43:00.4537635Z model : 85 2025-05-07T19:43:00.4537916Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4538321Z stepping : 7 2025-05-07T19:43:00.4538538Z microcode : 0x5003901 2025-05-07T19:43:00.4538763Z cpu MHz : 3000.006 2025-05-07T19:43:00.4538989Z cache size : 36608 KB 2025-05-07T19:43:00.4539210Z physical id : 0 2025-05-07T19:43:00.4539435Z siblings : 48 2025-05-07T19:43:00.4539632Z core id : 7 2025-05-07T19:43:00.4539843Z cpu cores : 24 2025-05-07T19:43:00.4540043Z apicid : 14 2025-05-07T19:43:00.4540259Z initial apicid : 14 2025-05-07T19:43:00.4540489Z fpu : yes 2025-05-07T19:43:00.4540686Z fpu_exception : yes 2025-05-07T19:43:00.4540914Z cpuid level : 13 2025-05-07T19:43:00.4541118Z wp : yes 2025-05-07T19:43:00.4543363Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4545953Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4546527Z bogomips : 6000.01 2025-05-07T19:43:00.4546757Z clflush size : 64 2025-05-07T19:43:00.4546971Z cache_alignment : 64 2025-05-07T19:43:00.4547255Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4547573Z power management: 2025-05-07T19:43:00.4547714Z 2025-05-07T19:43:00.4547797Z processor : 8 2025-05-07T19:43:00.4548015Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4548245Z cpu family : 6 2025-05-07T19:43:00.4548451Z model : 85 2025-05-07T19:43:00.4548713Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4549064Z stepping : 7 2025-05-07T19:43:00.4549265Z microcode : 0x5003901 2025-05-07T19:43:00.4549493Z cpu MHz : 3155.998 2025-05-07T19:43:00.4549701Z cache size : 36608 KB 2025-05-07T19:43:00.4549928Z physical id : 0 2025-05-07T19:43:00.4550130Z siblings : 48 2025-05-07T19:43:00.4550336Z core id : 8 2025-05-07T19:43:00.4550526Z cpu cores : 24 2025-05-07T19:43:00.4550731Z apicid : 16 2025-05-07T19:43:00.4550942Z initial apicid : 16 2025-05-07T19:43:00.4551149Z fpu : yes 2025-05-07T19:43:00.4551407Z fpu_exception : yes 2025-05-07T19:43:00.4551622Z cpuid level : 13 2025-05-07T19:43:00.4551841Z wp : yes 2025-05-07T19:43:00.4554653Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4557251Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4557835Z bogomips : 6000.01 2025-05-07T19:43:00.4558044Z clflush size : 64 2025-05-07T19:43:00.4558272Z cache_alignment : 64 2025-05-07T19:43:00.4558540Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4558869Z power management: 2025-05-07T19:43:00.4558998Z 2025-05-07T19:43:00.4559094Z processor : 9 2025-05-07T19:43:00.4559301Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4559548Z cpu family : 6 2025-05-07T19:43:00.4559748Z model : 85 2025-05-07T19:43:00.4560024Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4560367Z stepping : 7 2025-05-07T19:43:00.4560579Z microcode : 0x5003901 2025-05-07T19:43:00.4560855Z cpu MHz : 3167.984 2025-05-07T19:43:00.4561078Z cache size : 36608 KB 2025-05-07T19:43:00.4561295Z physical id : 0 2025-05-07T19:43:00.4561514Z siblings : 48 2025-05-07T19:43:00.4561708Z core id : 9 2025-05-07T19:43:00.4561916Z cpu cores : 24 2025-05-07T19:43:00.4562127Z apicid : 18 2025-05-07T19:43:00.4562391Z initial apicid : 18 2025-05-07T19:43:00.4562614Z fpu : yes 2025-05-07T19:43:00.4562980Z fpu_exception : yes 2025-05-07T19:43:00.4563217Z cpuid level : 13 2025-05-07T19:43:00.4563474Z wp : yes 2025-05-07T19:43:00.4565810Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4568636Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4569220Z bogomips : 6000.01 2025-05-07T19:43:00.4569453Z clflush size : 64 2025-05-07T19:43:00.4569665Z cache_alignment : 64 2025-05-07T19:43:00.4569955Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4570278Z power management: 2025-05-07T19:43:00.4570426Z 2025-05-07T19:43:00.4570508Z processor : 10 2025-05-07T19:43:00.4570735Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4570970Z cpu family : 6 2025-05-07T19:43:00.4571182Z model : 85 2025-05-07T19:43:00.4571453Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4571813Z stepping : 7 2025-05-07T19:43:00.4572016Z microcode : 0x5003901 2025-05-07T19:43:00.4572249Z cpu MHz : 3208.251 2025-05-07T19:43:00.4572462Z cache size : 36608 KB 2025-05-07T19:43:00.4572697Z physical id : 0 2025-05-07T19:43:00.4572903Z siblings : 48 2025-05-07T19:43:00.4573126Z core id : 10 2025-05-07T19:43:00.4573345Z cpu cores : 24 2025-05-07T19:43:00.4573548Z apicid : 20 2025-05-07T19:43:00.4573765Z initial apicid : 20 2025-05-07T19:43:00.4573975Z fpu : yes 2025-05-07T19:43:00.4574182Z fpu_exception : yes 2025-05-07T19:43:00.4574395Z cpuid level : 13 2025-05-07T19:43:00.4574713Z wp : yes 2025-05-07T19:43:00.4577008Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4579765Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4580348Z bogomips : 6000.01 2025-05-07T19:43:00.4580564Z clflush size : 64 2025-05-07T19:43:00.4580790Z cache_alignment : 64 2025-05-07T19:43:00.4581055Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4581394Z power management: 2025-05-07T19:43:00.4581525Z 2025-05-07T19:43:00.4581791Z processor : 11 2025-05-07T19:43:00.4582064Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4582322Z cpu family : 6 2025-05-07T19:43:00.4582535Z model : 85 2025-05-07T19:43:00.4582860Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4583233Z stepping : 7 2025-05-07T19:43:00.4583485Z microcode : 0x5003901 2025-05-07T19:43:00.4583732Z cpu MHz : 3000.006 2025-05-07T19:43:00.4583995Z cache size : 36608 KB 2025-05-07T19:43:00.4584328Z physical id : 0 2025-05-07T19:43:00.4584592Z siblings : 48 2025-05-07T19:43:00.4584820Z core id : 11 2025-05-07T19:43:00.4585076Z cpu cores : 24 2025-05-07T19:43:00.4585329Z apicid : 22 2025-05-07T19:43:00.4585556Z initial apicid : 22 2025-05-07T19:43:00.4585825Z fpu : yes 2025-05-07T19:43:00.4586047Z fpu_exception : yes 2025-05-07T19:43:00.4586310Z cpuid level : 13 2025-05-07T19:43:00.4586539Z wp : yes 2025-05-07T19:43:00.4588896Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4591602Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4592207Z bogomips : 6000.01 2025-05-07T19:43:00.4592462Z clflush size : 64 2025-05-07T19:43:00.4592743Z cache_alignment : 64 2025-05-07T19:43:00.4593056Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4593398Z power management: 2025-05-07T19:43:00.4593568Z 2025-05-07T19:43:00.4593663Z processor : 12 2025-05-07T19:43:00.4593925Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4594180Z cpu family : 6 2025-05-07T19:43:00.4594423Z model : 85 2025-05-07T19:43:00.4594712Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4595104Z stepping : 7 2025-05-07T19:43:00.4595314Z microcode : 0x5003901 2025-05-07T19:43:00.4595550Z cpu MHz : 3210.633 2025-05-07T19:43:00.4595763Z cache size : 36608 KB 2025-05-07T19:43:00.4596002Z physical id : 0 2025-05-07T19:43:00.4596215Z siblings : 48 2025-05-07T19:43:00.4596429Z core id : 12 2025-05-07T19:43:00.4596644Z cpu cores : 24 2025-05-07T19:43:00.4596842Z apicid : 24 2025-05-07T19:43:00.4597056Z initial apicid : 24 2025-05-07T19:43:00.4597270Z fpu : yes 2025-05-07T19:43:00.4597481Z fpu_exception : yes 2025-05-07T19:43:00.4597695Z cpuid level : 13 2025-05-07T19:43:00.4597912Z wp : yes 2025-05-07T19:43:00.4600205Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4603021Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4603627Z bogomips : 6000.01 2025-05-07T19:43:00.4603846Z clflush size : 64 2025-05-07T19:43:00.4604079Z cache_alignment : 64 2025-05-07T19:43:00.4604353Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4604697Z power management: 2025-05-07T19:43:00.4604828Z 2025-05-07T19:43:00.4604928Z processor : 13 2025-05-07T19:43:00.4605142Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4605391Z cpu family : 6 2025-05-07T19:43:00.4605593Z model : 85 2025-05-07T19:43:00.4605877Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4606223Z stepping : 7 2025-05-07T19:43:00.4606437Z microcode : 0x5003901 2025-05-07T19:43:00.4606660Z cpu MHz : 3000.006 2025-05-07T19:43:00.4606886Z cache size : 36608 KB 2025-05-07T19:43:00.4607108Z physical id : 0 2025-05-07T19:43:00.4607328Z siblings : 48 2025-05-07T19:43:00.4607694Z core id : 13 2025-05-07T19:43:00.4607974Z cpu cores : 24 2025-05-07T19:43:00.4608191Z apicid : 26 2025-05-07T19:43:00.4608391Z initial apicid : 26 2025-05-07T19:43:00.4608616Z fpu : yes 2025-05-07T19:43:00.4608811Z fpu_exception : yes 2025-05-07T19:43:00.4609045Z cpuid level : 13 2025-05-07T19:43:00.4609249Z wp : yes 2025-05-07T19:43:00.4611553Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4614238Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4614820Z bogomips : 6000.01 2025-05-07T19:43:00.4615046Z clflush size : 64 2025-05-07T19:43:00.4615258Z cache_alignment : 64 2025-05-07T19:43:00.4615538Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4615861Z power management: 2025-05-07T19:43:00.4616004Z 2025-05-07T19:43:00.4616087Z processor : 14 2025-05-07T19:43:00.4616318Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4616552Z cpu family : 6 2025-05-07T19:43:00.4616763Z model : 85 2025-05-07T19:43:00.4617033Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4617395Z stepping : 7 2025-05-07T19:43:00.4617602Z microcode : 0x5003901 2025-05-07T19:43:00.4617839Z cpu MHz : 3000.006 2025-05-07T19:43:00.4618055Z cache size : 36608 KB 2025-05-07T19:43:00.4618291Z physical id : 0 2025-05-07T19:43:00.4618498Z siblings : 48 2025-05-07T19:43:00.4618713Z core id : 14 2025-05-07T19:43:00.4618923Z cpu cores : 24 2025-05-07T19:43:00.4619124Z apicid : 28 2025-05-07T19:43:00.4619444Z initial apicid : 28 2025-05-07T19:43:00.4619648Z fpu : yes 2025-05-07T19:43:00.4619852Z fpu_exception : yes 2025-05-07T19:43:00.4620067Z cpuid level : 13 2025-05-07T19:43:00.4620387Z wp : yes 2025-05-07T19:43:00.4622508Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4625451Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4626058Z bogomips : 6000.01 2025-05-07T19:43:00.4626278Z clflush size : 64 2025-05-07T19:43:00.4626590Z cache_alignment : 64 2025-05-07T19:43:00.4626861Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4627208Z power management: 2025-05-07T19:43:00.4627342Z 2025-05-07T19:43:00.4627445Z processor : 15 2025-05-07T19:43:00.4627667Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4627925Z cpu family : 6 2025-05-07T19:43:00.4628131Z model : 85 2025-05-07T19:43:00.4628422Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4628772Z stepping : 7 2025-05-07T19:43:00.4628997Z microcode : 0x5003901 2025-05-07T19:43:00.4629223Z cpu MHz : 3154.398 2025-05-07T19:43:00.4629458Z cache size : 36608 KB 2025-05-07T19:43:00.4629681Z physical id : 0 2025-05-07T19:43:00.4629910Z siblings : 48 2025-05-07T19:43:00.4630112Z core id : 15 2025-05-07T19:43:00.4630329Z cpu cores : 24 2025-05-07T19:43:00.4630547Z apicid : 30 2025-05-07T19:43:00.4630802Z initial apicid : 30 2025-05-07T19:43:00.4631024Z fpu : yes 2025-05-07T19:43:00.4631218Z fpu_exception : yes 2025-05-07T19:43:00.4631445Z cpuid level : 13 2025-05-07T19:43:00.4631650Z wp : yes 2025-05-07T19:43:00.4633985Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4636721Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4637332Z bogomips : 6000.01 2025-05-07T19:43:00.4637557Z clflush size : 64 2025-05-07T19:43:00.4637770Z cache_alignment : 64 2025-05-07T19:43:00.4638051Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4638386Z power management: 2025-05-07T19:43:00.4638625Z 2025-05-07T19:43:00.4638708Z processor : 16 2025-05-07T19:43:00.4638932Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4639162Z cpu family : 6 2025-05-07T19:43:00.4639480Z model : 85 2025-05-07T19:43:00.4640057Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4640400Z stepping : 7 2025-05-07T19:43:00.4640755Z microcode : 0x5003901 2025-05-07T19:43:00.4641045Z cpu MHz : 3254.400 2025-05-07T19:43:00.4641250Z cache size : 36608 KB 2025-05-07T19:43:00.4641658Z physical id : 0 2025-05-07T19:43:00.4641901Z siblings : 48 2025-05-07T19:43:00.4642128Z core id : 16 2025-05-07T19:43:00.4642409Z cpu cores : 24 2025-05-07T19:43:00.4642609Z apicid : 32 2025-05-07T19:43:00.4642891Z initial apicid : 32 2025-05-07T19:43:00.4643230Z fpu : yes 2025-05-07T19:43:00.4643445Z fpu_exception : yes 2025-05-07T19:43:00.4643665Z cpuid level : 13 2025-05-07T19:43:00.4643888Z wp : yes 2025-05-07T19:43:00.4646185Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4648914Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4649516Z bogomips : 6000.01 2025-05-07T19:43:00.4649737Z clflush size : 64 2025-05-07T19:43:00.4649970Z cache_alignment : 64 2025-05-07T19:43:00.4650243Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4650580Z power management: 2025-05-07T19:43:00.4650712Z 2025-05-07T19:43:00.4650814Z processor : 17 2025-05-07T19:43:00.4651026Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4651273Z cpu family : 6 2025-05-07T19:43:00.4651471Z model : 85 2025-05-07T19:43:00.4651753Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4652108Z stepping : 7 2025-05-07T19:43:00.4652325Z microcode : 0x5003901 2025-05-07T19:43:00.4652548Z cpu MHz : 3000.006 2025-05-07T19:43:00.4652773Z cache size : 36608 KB 2025-05-07T19:43:00.4652994Z physical id : 0 2025-05-07T19:43:00.4653213Z siblings : 48 2025-05-07T19:43:00.4653423Z core id : 17 2025-05-07T19:43:00.4653620Z cpu cores : 24 2025-05-07T19:43:00.4653831Z apicid : 34 2025-05-07T19:43:00.4654030Z initial apicid : 34 2025-05-07T19:43:00.4654251Z fpu : yes 2025-05-07T19:43:00.4657649Z fpu_exception : yes 2025-05-07T19:43:00.4657929Z cpuid level : 13 2025-05-07T19:43:00.4658151Z wp : yes 2025-05-07T19:43:00.4660488Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4663176Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4663765Z bogomips : 6000.01 2025-05-07T19:43:00.4663993Z clflush size : 64 2025-05-07T19:43:00.4664212Z cache_alignment : 64 2025-05-07T19:43:00.4664496Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4664834Z power management: 2025-05-07T19:43:00.4664968Z 2025-05-07T19:43:00.4665055Z processor : 18 2025-05-07T19:43:00.4665282Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4665523Z cpu family : 6 2025-05-07T19:43:00.4665744Z model : 85 2025-05-07T19:43:00.4666015Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4666384Z stepping : 7 2025-05-07T19:43:00.4666590Z microcode : 0x5003901 2025-05-07T19:43:00.4666849Z cpu MHz : 3231.599 2025-05-07T19:43:00.4667645Z cache size : 36608 KB 2025-05-07T19:43:00.4668014Z physical id : 0 2025-05-07T19:43:00.4668301Z siblings : 48 2025-05-07T19:43:00.4668547Z core id : 18 2025-05-07T19:43:00.4668799Z cpu cores : 24 2025-05-07T19:43:00.4669030Z apicid : 36 2025-05-07T19:43:00.4669286Z initial apicid : 36 2025-05-07T19:43:00.4669527Z fpu : yes 2025-05-07T19:43:00.4669770Z fpu_exception : yes 2025-05-07T19:43:00.4670012Z cpuid level : 13 2025-05-07T19:43:00.4670269Z wp : yes 2025-05-07T19:43:00.4672584Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4675389Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4676035Z bogomips : 6000.01 2025-05-07T19:43:00.4676280Z clflush size : 64 2025-05-07T19:43:00.4676552Z cache_alignment : 64 2025-05-07T19:43:00.4676852Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4677237Z power management: 2025-05-07T19:43:00.4677386Z 2025-05-07T19:43:00.4677514Z processor : 19 2025-05-07T19:43:00.4677756Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4678059Z cpu family : 6 2025-05-07T19:43:00.4678290Z model : 85 2025-05-07T19:43:00.4678623Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4678992Z stepping : 7 2025-05-07T19:43:00.4701235Z microcode : 0x5003901 2025-05-07T19:43:00.4701498Z cpu MHz : 3000.006 2025-05-07T19:43:00.4701740Z cache size : 36608 KB 2025-05-07T19:43:00.4701972Z physical id : 0 2025-05-07T19:43:00.4702421Z siblings : 48 2025-05-07T19:43:00.4702662Z core id : 19 2025-05-07T19:43:00.4702891Z cpu cores : 24 2025-05-07T19:43:00.4703091Z apicid : 38 2025-05-07T19:43:00.4703308Z initial apicid : 38 2025-05-07T19:43:00.4703684Z fpu : yes 2025-05-07T19:43:00.4703899Z fpu_exception : yes 2025-05-07T19:43:00.4704118Z cpuid level : 13 2025-05-07T19:43:00.4704338Z wp : yes 2025-05-07T19:43:00.4706808Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4709495Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4710102Z bogomips : 6000.01 2025-05-07T19:43:00.4710335Z clflush size : 64 2025-05-07T19:43:00.4710552Z cache_alignment : 64 2025-05-07T19:43:00.4710829Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4711161Z power management: 2025-05-07T19:43:00.4711313Z 2025-05-07T19:43:00.4711401Z processor : 20 2025-05-07T19:43:00.4711619Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4711873Z cpu family : 6 2025-05-07T19:43:00.4712076Z model : 85 2025-05-07T19:43:00.4712368Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4712735Z stepping : 7 2025-05-07T19:43:00.4712942Z microcode : 0x5003901 2025-05-07T19:43:00.4713188Z cpu MHz : 3170.322 2025-05-07T19:43:00.4713401Z cache size : 36608 KB 2025-05-07T19:43:00.4713643Z physical id : 0 2025-05-07T19:43:00.4713850Z siblings : 48 2025-05-07T19:43:00.4714060Z core id : 20 2025-05-07T19:43:00.4714256Z cpu cores : 24 2025-05-07T19:43:00.4714459Z apicid : 40 2025-05-07T19:43:00.4714658Z initial apicid : 40 2025-05-07T19:43:00.4714883Z fpu : yes 2025-05-07T19:43:00.4715075Z fpu_exception : yes 2025-05-07T19:43:00.4715403Z cpuid level : 13 2025-05-07T19:43:00.4715604Z wp : yes 2025-05-07T19:43:00.4717856Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4720553Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4721137Z bogomips : 6000.01 2025-05-07T19:43:00.4721350Z clflush size : 64 2025-05-07T19:43:00.4721576Z cache_alignment : 64 2025-05-07T19:43:00.4721841Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4722162Z power management: 2025-05-07T19:43:00.4722409Z 2025-05-07T19:43:00.4722495Z processor : 21 2025-05-07T19:43:00.4722873Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4723119Z cpu family : 6 2025-05-07T19:43:00.4723415Z model : 85 2025-05-07T19:43:00.4723686Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4724044Z stepping : 7 2025-05-07T19:43:00.4724244Z microcode : 0x5003901 2025-05-07T19:43:00.4724470Z cpu MHz : 3810.769 2025-05-07T19:43:00.4724683Z cache size : 36608 KB 2025-05-07T19:43:00.4724913Z physical id : 0 2025-05-07T19:43:00.4725121Z siblings : 48 2025-05-07T19:43:00.4725331Z core id : 21 2025-05-07T19:43:00.4725520Z cpu cores : 24 2025-05-07T19:43:00.4725722Z apicid : 42 2025-05-07T19:43:00.4725923Z initial apicid : 42 2025-05-07T19:43:00.4726139Z fpu : yes 2025-05-07T19:43:00.4726336Z fpu_exception : yes 2025-05-07T19:43:00.4726546Z cpuid level : 13 2025-05-07T19:43:00.4726750Z wp : yes 2025-05-07T19:43:00.4729111Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4731789Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4732381Z bogomips : 6000.01 2025-05-07T19:43:00.4732594Z clflush size : 64 2025-05-07T19:43:00.4732888Z cache_alignment : 64 2025-05-07T19:43:00.4733156Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4733489Z power management: 2025-05-07T19:43:00.4733621Z 2025-05-07T19:43:00.4733712Z processor : 22 2025-05-07T19:43:00.4733934Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4734180Z cpu family : 6 2025-05-07T19:43:00.4734379Z model : 85 2025-05-07T19:43:00.4734656Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4735087Z stepping : 7 2025-05-07T19:43:00.4735284Z microcode : 0x5003901 2025-05-07T19:43:00.4735487Z cpu MHz : 3184.251 2025-05-07T19:43:00.4735691Z cache size : 36608 KB 2025-05-07T19:43:00.4735910Z physical id : 0 2025-05-07T19:43:00.4736114Z siblings : 48 2025-05-07T19:43:00.4736303Z core id : 22 2025-05-07T19:43:00.4736515Z cpu cores : 24 2025-05-07T19:43:00.4736711Z apicid : 44 2025-05-07T19:43:00.4736923Z initial apicid : 44 2025-05-07T19:43:00.4737123Z fpu : yes 2025-05-07T19:43:00.4737330Z fpu_exception : yes 2025-05-07T19:43:00.4737554Z cpuid level : 13 2025-05-07T19:43:00.4737753Z wp : yes 2025-05-07T19:43:00.4739900Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4742439Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4742989Z bogomips : 6000.01 2025-05-07T19:43:00.4743209Z clflush size : 64 2025-05-07T19:43:00.4743415Z cache_alignment : 64 2025-05-07T19:43:00.4743693Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4744001Z power management: 2025-05-07T19:43:00.4744147Z 2025-05-07T19:43:00.4744231Z processor : 23 2025-05-07T19:43:00.4744442Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4744687Z cpu family : 6 2025-05-07T19:43:00.4744900Z model : 85 2025-05-07T19:43:00.4745161Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4745518Z stepping : 7 2025-05-07T19:43:00.4745716Z microcode : 0x5003901 2025-05-07T19:43:00.4745943Z cpu MHz : 3249.037 2025-05-07T19:43:00.4746148Z cache size : 36608 KB 2025-05-07T19:43:00.4746378Z physical id : 0 2025-05-07T19:43:00.4746579Z siblings : 48 2025-05-07T19:43:00.4746782Z core id : 23 2025-05-07T19:43:00.4746974Z cpu cores : 24 2025-05-07T19:43:00.4747184Z apicid : 46 2025-05-07T19:43:00.4747377Z initial apicid : 46 2025-05-07T19:43:00.4747600Z fpu : yes 2025-05-07T19:43:00.4747808Z fpu_exception : yes 2025-05-07T19:43:00.4748014Z cpuid level : 13 2025-05-07T19:43:00.4748230Z wp : yes 2025-05-07T19:43:00.4750398Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4753171Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4753768Z bogomips : 6000.01 2025-05-07T19:43:00.4754006Z clflush size : 64 2025-05-07T19:43:00.4754237Z cache_alignment : 64 2025-05-07T19:43:00.4754503Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4754843Z power management: 2025-05-07T19:43:00.4754977Z 2025-05-07T19:43:00.4755063Z processor : 24 2025-05-07T19:43:00.4755299Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4755563Z cpu family : 6 2025-05-07T19:43:00.4755773Z model : 85 2025-05-07T19:43:00.4756069Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4756432Z stepping : 7 2025-05-07T19:43:00.4756660Z microcode : 0x5003901 2025-05-07T19:43:00.4756887Z cpu MHz : 1201.731 2025-05-07T19:43:00.4757143Z cache size : 36608 KB 2025-05-07T19:43:00.4757369Z physical id : 1 2025-05-07T19:43:00.4757601Z siblings : 48 2025-05-07T19:43:00.4757800Z core id : 0 2025-05-07T19:43:00.4758019Z cpu cores : 24 2025-05-07T19:43:00.4758220Z apicid : 64 2025-05-07T19:43:00.4758440Z initial apicid : 64 2025-05-07T19:43:00.4758653Z fpu : yes 2025-05-07T19:43:00.4758866Z fpu_exception : yes 2025-05-07T19:43:00.4759103Z cpuid level : 13 2025-05-07T19:43:00.4759310Z wp : yes 2025-05-07T19:43:00.4761569Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4764470Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4765122Z bogomips : 6000.01 2025-05-07T19:43:00.4765362Z clflush size : 64 2025-05-07T19:43:00.4765582Z cache_alignment : 64 2025-05-07T19:43:00.4765878Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4766268Z power management: 2025-05-07T19:43:00.4766414Z 2025-05-07T19:43:00.4766495Z processor : 25 2025-05-07T19:43:00.4766700Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4766941Z cpu family : 6 2025-05-07T19:43:00.4767276Z model : 85 2025-05-07T19:43:00.4767552Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4767901Z stepping : 7 2025-05-07T19:43:00.4768094Z microcode : 0x5003901 2025-05-07T19:43:00.4768316Z cpu MHz : 1203.399 2025-05-07T19:43:00.4768518Z cache size : 36608 KB 2025-05-07T19:43:00.4768734Z physical id : 1 2025-05-07T19:43:00.4768933Z siblings : 48 2025-05-07T19:43:00.4769127Z core id : 1 2025-05-07T19:43:00.4769315Z cpu cores : 24 2025-05-07T19:43:00.4769514Z apicid : 66 2025-05-07T19:43:00.4769707Z initial apicid : 66 2025-05-07T19:43:00.4769918Z fpu : yes 2025-05-07T19:43:00.4770106Z fpu_exception : yes 2025-05-07T19:43:00.4770321Z cpuid level : 13 2025-05-07T19:43:00.4770545Z wp : yes 2025-05-07T19:43:00.4772952Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4775612Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4776199Z bogomips : 6000.01 2025-05-07T19:43:00.4776403Z clflush size : 64 2025-05-07T19:43:00.4776618Z cache_alignment : 64 2025-05-07T19:43:00.4776873Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4777190Z power management: 2025-05-07T19:43:00.4777316Z 2025-05-07T19:43:00.4777393Z processor : 26 2025-05-07T19:43:00.4777598Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4777821Z cpu family : 6 2025-05-07T19:43:00.4778028Z model : 85 2025-05-07T19:43:00.4778306Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4778647Z stepping : 7 2025-05-07T19:43:00.4778848Z microcode : 0x5003901 2025-05-07T19:43:00.4779172Z cpu MHz : 3000.006 2025-05-07T19:43:00.4779479Z cache size : 36608 KB 2025-05-07T19:43:00.4779673Z physical id : 1 2025-05-07T19:43:00.4779861Z siblings : 48 2025-05-07T19:43:00.4780038Z core id : 2 2025-05-07T19:43:00.4780221Z cpu cores : 24 2025-05-07T19:43:00.4780396Z apicid : 68 2025-05-07T19:43:00.4780581Z initial apicid : 68 2025-05-07T19:43:00.4780767Z fpu : yes 2025-05-07T19:43:00.4780946Z fpu_exception : yes 2025-05-07T19:43:00.4781138Z cpuid level : 13 2025-05-07T19:43:00.4781336Z wp : yes 2025-05-07T19:43:00.4783451Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4785900Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4786490Z bogomips : 6000.01 2025-05-07T19:43:00.4786685Z clflush size : 64 2025-05-07T19:43:00.4786877Z cache_alignment : 64 2025-05-07T19:43:00.4787131Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4787421Z power management: 2025-05-07T19:43:00.4787546Z 2025-05-07T19:43:00.4787621Z processor : 27 2025-05-07T19:43:00.4787808Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4788027Z cpu family : 6 2025-05-07T19:43:00.4788204Z model : 85 2025-05-07T19:43:00.4788450Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4788769Z stepping : 7 2025-05-07T19:43:00.4788948Z microcode : 0x5003901 2025-05-07T19:43:00.4789151Z cpu MHz : 1201.882 2025-05-07T19:43:00.4789341Z cache size : 36608 KB 2025-05-07T19:43:00.4789544Z physical id : 1 2025-05-07T19:43:00.4789728Z siblings : 48 2025-05-07T19:43:00.4789906Z core id : 3 2025-05-07T19:43:00.4790081Z cpu cores : 24 2025-05-07T19:43:00.4790271Z apicid : 70 2025-05-07T19:43:00.4790449Z initial apicid : 70 2025-05-07T19:43:00.4790644Z fpu : yes 2025-05-07T19:43:00.4790818Z fpu_exception : yes 2025-05-07T19:43:00.4791016Z cpuid level : 13 2025-05-07T19:43:00.4791208Z wp : yes 2025-05-07T19:43:00.4793381Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4795861Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4796403Z bogomips : 6000.01 2025-05-07T19:43:00.4796597Z clflush size : 64 2025-05-07T19:43:00.4796797Z cache_alignment : 64 2025-05-07T19:43:00.4797038Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4797332Z power management: 2025-05-07T19:43:00.4797448Z 2025-05-07T19:43:00.4797518Z processor : 28 2025-05-07T19:43:00.4797715Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4797930Z cpu family : 6 2025-05-07T19:43:00.4798107Z model : 85 2025-05-07T19:43:00.4798353Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4798661Z stepping : 7 2025-05-07T19:43:00.4798850Z microcode : 0x5003901 2025-05-07T19:43:00.4799047Z cpu MHz : 1200.039 2025-05-07T19:43:00.4799240Z cache size : 36608 KB 2025-05-07T19:43:00.4799435Z physical id : 1 2025-05-07T19:43:00.4799626Z siblings : 48 2025-05-07T19:43:00.4799798Z core id : 4 2025-05-07T19:43:00.4799976Z cpu cores : 24 2025-05-07T19:43:00.4800152Z apicid : 72 2025-05-07T19:43:00.4800334Z initial apicid : 72 2025-05-07T19:43:00.4800524Z fpu : yes 2025-05-07T19:43:00.4800713Z fpu_exception : yes 2025-05-07T19:43:00.4800901Z cpuid level : 13 2025-05-07T19:43:00.4801095Z wp : yes 2025-05-07T19:43:00.4803495Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4806188Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4806772Z bogomips : 6000.01 2025-05-07T19:43:00.4807002Z clflush size : 64 2025-05-07T19:43:00.4807213Z cache_alignment : 64 2025-05-07T19:43:00.4807557Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4807879Z power management: 2025-05-07T19:43:00.4808020Z 2025-05-07T19:43:00.4808105Z processor : 29 2025-05-07T19:43:00.4808321Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4808562Z cpu family : 6 2025-05-07T19:43:00.4808754Z model : 85 2025-05-07T19:43:00.4809032Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4809385Z stepping : 7 2025-05-07T19:43:00.4809586Z microcode : 0x5003901 2025-05-07T19:43:00.4809814Z cpu MHz : 1203.245 2025-05-07T19:43:00.4810023Z cache size : 36608 KB 2025-05-07T19:43:00.4810252Z physical id : 1 2025-05-07T19:43:00.4810462Z siblings : 48 2025-05-07T19:43:00.4810675Z core id : 5 2025-05-07T19:43:00.4810875Z cpu cores : 24 2025-05-07T19:43:00.4811082Z apicid : 74 2025-05-07T19:43:00.4811295Z initial apicid : 74 2025-05-07T19:43:00.4811560Z fpu : yes 2025-05-07T19:43:00.4811782Z fpu_exception : yes 2025-05-07T19:43:00.4812056Z cpuid level : 13 2025-05-07T19:43:00.4812309Z wp : yes 2025-05-07T19:43:00.4814674Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4817312Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4817902Z bogomips : 6000.01 2025-05-07T19:43:00.4818121Z clflush size : 64 2025-05-07T19:43:00.4818365Z cache_alignment : 64 2025-05-07T19:43:00.4818637Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4818983Z power management: 2025-05-07T19:43:00.4819118Z 2025-05-07T19:43:00.4819206Z processor : 30 2025-05-07T19:43:00.4819443Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4819670Z cpu family : 6 2025-05-07T19:43:00.4819893Z model : 85 2025-05-07T19:43:00.4820190Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4820536Z stepping : 7 2025-05-07T19:43:00.4820770Z microcode : 0x5003901 2025-05-07T19:43:00.4821000Z cpu MHz : 1195.326 2025-05-07T19:43:00.4821246Z cache size : 36608 KB 2025-05-07T19:43:00.4821456Z physical id : 1 2025-05-07T19:43:00.4821663Z siblings : 48 2025-05-07T19:43:00.4821849Z core id : 6 2025-05-07T19:43:00.4822041Z cpu cores : 24 2025-05-07T19:43:00.4822228Z apicid : 76 2025-05-07T19:43:00.4822432Z initial apicid : 76 2025-05-07T19:43:00.4822635Z fpu : yes 2025-05-07T19:43:00.4822831Z fpu_exception : yes 2025-05-07T19:43:00.4823030Z cpuid level : 13 2025-05-07T19:43:00.4823235Z wp : yes 2025-05-07T19:43:00.4825369Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4827840Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4828381Z bogomips : 6000.01 2025-05-07T19:43:00.4828594Z clflush size : 64 2025-05-07T19:43:00.4828792Z cache_alignment : 64 2025-05-07T19:43:00.4829056Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4829407Z power management: 2025-05-07T19:43:00.4829541Z 2025-05-07T19:43:00.4829619Z processor : 31 2025-05-07T19:43:00.4829818Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4830055Z cpu family : 6 2025-05-07T19:43:00.4830244Z model : 85 2025-05-07T19:43:00.4830513Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4830848Z stepping : 7 2025-05-07T19:43:00.4831038Z microcode : 0x5003901 2025-05-07T19:43:00.4831256Z cpu MHz : 1195.824 2025-05-07T19:43:00.4831453Z cache size : 36608 KB 2025-05-07T19:43:00.4831671Z physical id : 1 2025-05-07T19:43:00.4831865Z siblings : 48 2025-05-07T19:43:00.4832062Z core id : 7 2025-05-07T19:43:00.4832244Z cpu cores : 24 2025-05-07T19:43:00.4832442Z apicid : 78 2025-05-07T19:43:00.4832628Z initial apicid : 78 2025-05-07T19:43:00.4832836Z fpu : yes 2025-05-07T19:43:00.4833017Z fpu_exception : yes 2025-05-07T19:43:00.4833228Z cpuid level : 13 2025-05-07T19:43:00.4833428Z wp : yes 2025-05-07T19:43:00.4835570Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4838008Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4838561Z bogomips : 6000.01 2025-05-07T19:43:00.4838759Z clflush size : 64 2025-05-07T19:43:00.4838971Z cache_alignment : 64 2025-05-07T19:43:00.4839220Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4839534Z power management: 2025-05-07T19:43:00.4839659Z 2025-05-07T19:43:00.4839737Z processor : 32 2025-05-07T19:43:00.4839948Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4840172Z cpu family : 6 2025-05-07T19:43:00.4840369Z model : 85 2025-05-07T19:43:00.4840635Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4840963Z stepping : 7 2025-05-07T19:43:00.4841174Z microcode : 0x5003901 2025-05-07T19:43:00.4841387Z cpu MHz : 3000.006 2025-05-07T19:43:00.4841597Z cache size : 36608 KB 2025-05-07T19:43:00.4841804Z physical id : 1 2025-05-07T19:43:00.4842008Z siblings : 48 2025-05-07T19:43:00.4842194Z core id : 8 2025-05-07T19:43:00.4842462Z cpu cores : 24 2025-05-07T19:43:00.4842649Z apicid : 80 2025-05-07T19:43:00.4843028Z initial apicid : 80 2025-05-07T19:43:00.4843240Z fpu : yes 2025-05-07T19:43:00.4843452Z fpu_exception : yes 2025-05-07T19:43:00.4843712Z cpuid level : 13 2025-05-07T19:43:00.4843918Z wp : yes 2025-05-07T19:43:00.4846218Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4848901Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4849484Z bogomips : 6000.01 2025-05-07T19:43:00.4849695Z clflush size : 64 2025-05-07T19:43:00.4849903Z cache_alignment : 64 2025-05-07T19:43:00.4850174Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4850487Z power management: 2025-05-07T19:43:00.4850629Z 2025-05-07T19:43:00.4850711Z processor : 33 2025-05-07T19:43:00.4850974Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4851212Z cpu family : 6 2025-05-07T19:43:00.4851408Z model : 85 2025-05-07T19:43:00.4851685Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4852038Z stepping : 7 2025-05-07T19:43:00.4852235Z microcode : 0x5003901 2025-05-07T19:43:00.4852460Z cpu MHz : 1196.730 2025-05-07T19:43:00.4852664Z cache size : 36608 KB 2025-05-07T19:43:00.4852891Z physical id : 1 2025-05-07T19:43:00.4853087Z siblings : 48 2025-05-07T19:43:00.4853287Z core id : 9 2025-05-07T19:43:00.4853480Z cpu cores : 24 2025-05-07T19:43:00.4853686Z apicid : 82 2025-05-07T19:43:00.4853878Z initial apicid : 82 2025-05-07T19:43:00.4854095Z fpu : yes 2025-05-07T19:43:00.4854290Z fpu_exception : yes 2025-05-07T19:43:00.4854509Z cpuid level : 13 2025-05-07T19:43:00.4854712Z wp : yes 2025-05-07T19:43:00.4856938Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4859450Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4859994Z bogomips : 6000.01 2025-05-07T19:43:00.4860192Z clflush size : 64 2025-05-07T19:43:00.4860393Z cache_alignment : 64 2025-05-07T19:43:00.4860634Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4860931Z power management: 2025-05-07T19:43:00.4861050Z 2025-05-07T19:43:00.4861122Z processor : 34 2025-05-07T19:43:00.4861320Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4861533Z cpu family : 6 2025-05-07T19:43:00.4861710Z model : 85 2025-05-07T19:43:00.4861959Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4862277Z stepping : 7 2025-05-07T19:43:00.4862464Z microcode : 0x5003901 2025-05-07T19:43:00.4862656Z cpu MHz : 1204.494 2025-05-07T19:43:00.4862844Z cache size : 36608 KB 2025-05-07T19:43:00.4863031Z physical id : 1 2025-05-07T19:43:00.4863218Z siblings : 48 2025-05-07T19:43:00.4863394Z core id : 10 2025-05-07T19:43:00.4863581Z cpu cores : 24 2025-05-07T19:43:00.4863758Z apicid : 84 2025-05-07T19:43:00.4863941Z initial apicid : 84 2025-05-07T19:43:00.4864131Z fpu : yes 2025-05-07T19:43:00.4864312Z fpu_exception : yes 2025-05-07T19:43:00.4864508Z cpuid level : 13 2025-05-07T19:43:00.4864698Z wp : yes 2025-05-07T19:43:00.4866794Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4869721Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4869810Z bogomips : 6000.01 2025-05-07T19:43:00.4869891Z clflush size : 64 2025-05-07T19:43:00.4869976Z cache_alignment : 64 2025-05-07T19:43:00.4870119Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4870201Z power management: 2025-05-07T19:43:00.4870206Z 2025-05-07T19:43:00.4870286Z processor : 35 2025-05-07T19:43:00.4870389Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4870469Z cpu family : 6 2025-05-07T19:43:00.4870630Z model : 85 2025-05-07T19:43:00.4870788Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4870878Z stepping : 7 2025-05-07T19:43:00.4870961Z microcode : 0x5003901 2025-05-07T19:43:00.4871041Z cpu MHz : 3000.006 2025-05-07T19:43:00.4871133Z cache size : 36608 KB 2025-05-07T19:43:00.4871214Z physical id : 1 2025-05-07T19:43:00.4871293Z siblings : 48 2025-05-07T19:43:00.4871372Z core id : 11 2025-05-07T19:43:00.4871460Z cpu cores : 24 2025-05-07T19:43:00.4871539Z apicid : 86 2025-05-07T19:43:00.4871623Z initial apicid : 86 2025-05-07T19:43:00.4871709Z fpu : yes 2025-05-07T19:43:00.4871791Z fpu_exception : yes 2025-05-07T19:43:00.4871872Z cpuid level : 13 2025-05-07T19:43:00.4871947Z wp : yes 2025-05-07T19:43:00.4874133Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4874526Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4874618Z bogomips : 6000.01 2025-05-07T19:43:00.4874776Z clflush size : 64 2025-05-07T19:43:00.4874861Z cache_alignment : 64 2025-05-07T19:43:00.4874988Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4875082Z power management: 2025-05-07T19:43:00.4875086Z 2025-05-07T19:43:00.4875167Z processor : 36 2025-05-07T19:43:00.4875254Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4875345Z cpu family : 6 2025-05-07T19:43:00.4875421Z model : 85 2025-05-07T19:43:00.4875585Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4875663Z stepping : 7 2025-05-07T19:43:00.4875753Z microcode : 0x5003901 2025-05-07T19:43:00.4875833Z cpu MHz : 1939.669 2025-05-07T19:43:00.4875912Z cache size : 36608 KB 2025-05-07T19:43:00.4876003Z physical id : 1 2025-05-07T19:43:00.4876079Z siblings : 48 2025-05-07T19:43:00.4876157Z core id : 12 2025-05-07T19:43:00.4876234Z cpu cores : 24 2025-05-07T19:43:00.4876323Z apicid : 88 2025-05-07T19:43:00.4876408Z initial apicid : 88 2025-05-07T19:43:00.4876486Z fpu : yes 2025-05-07T19:43:00.4876585Z fpu_exception : yes 2025-05-07T19:43:00.4876666Z cpuid level : 13 2025-05-07T19:43:00.4876744Z wp : yes 2025-05-07T19:43:00.4878917Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4879319Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4879402Z bogomips : 6000.01 2025-05-07T19:43:00.4879492Z clflush size : 64 2025-05-07T19:43:00.4879578Z cache_alignment : 64 2025-05-07T19:43:00.4879706Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4879792Z power management: 2025-05-07T19:43:00.4879796Z 2025-05-07T19:43:00.4879886Z processor : 37 2025-05-07T19:43:00.4879972Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4880049Z cpu family : 6 2025-05-07T19:43:00.4880246Z model : 85 2025-05-07T19:43:00.4880508Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4880628Z stepping : 7 2025-05-07T19:43:00.4880706Z microcode : 0x5003901 2025-05-07T19:43:00.4880788Z cpu MHz : 1203.798 2025-05-07T19:43:00.4880864Z cache size : 36608 KB 2025-05-07T19:43:00.4880938Z physical id : 1 2025-05-07T19:43:00.4881023Z siblings : 48 2025-05-07T19:43:00.4881093Z core id : 13 2025-05-07T19:43:00.4881163Z cpu cores : 24 2025-05-07T19:43:00.4881232Z apicid : 90 2025-05-07T19:43:00.4881318Z initial apicid : 90 2025-05-07T19:43:00.4881387Z fpu : yes 2025-05-07T19:43:00.4881463Z fpu_exception : yes 2025-05-07T19:43:00.4881540Z cpuid level : 13 2025-05-07T19:43:00.4881623Z wp : yes 2025-05-07T19:43:00.4883945Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4884347Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4884429Z bogomips : 6000.01 2025-05-07T19:43:00.4884509Z clflush size : 64 2025-05-07T19:43:00.4884591Z cache_alignment : 64 2025-05-07T19:43:00.4884778Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4884863Z power management: 2025-05-07T19:43:00.4884868Z 2025-05-07T19:43:00.4884947Z processor : 38 2025-05-07T19:43:00.4885042Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4885117Z cpu family : 6 2025-05-07T19:43:00.4885194Z model : 85 2025-05-07T19:43:00.4885360Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4885446Z stepping : 7 2025-05-07T19:43:00.4885528Z microcode : 0x5003901 2025-05-07T19:43:00.4885605Z cpu MHz : 3000.006 2025-05-07T19:43:00.4885697Z cache size : 36608 KB 2025-05-07T19:43:00.4885777Z physical id : 1 2025-05-07T19:43:00.4885854Z siblings : 48 2025-05-07T19:43:00.4885932Z core id : 14 2025-05-07T19:43:00.4886023Z cpu cores : 24 2025-05-07T19:43:00.4886099Z apicid : 92 2025-05-07T19:43:00.4886181Z initial apicid : 92 2025-05-07T19:43:00.4886268Z fpu : yes 2025-05-07T19:43:00.4886352Z fpu_exception : yes 2025-05-07T19:43:00.4886432Z cpuid level : 13 2025-05-07T19:43:00.4886511Z wp : yes 2025-05-07T19:43:00.4888713Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4889109Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4889213Z bogomips : 6000.01 2025-05-07T19:43:00.4889298Z clflush size : 64 2025-05-07T19:43:00.4889395Z cache_alignment : 64 2025-05-07T19:43:00.4889537Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4889643Z power management: 2025-05-07T19:43:00.4889647Z 2025-05-07T19:43:00.4889740Z processor : 39 2025-05-07T19:43:00.4889838Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4889935Z cpu family : 6 2025-05-07T19:43:00.4890026Z model : 85 2025-05-07T19:43:00.4890191Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4890287Z stepping : 7 2025-05-07T19:43:00.4890373Z microcode : 0x5003901 2025-05-07T19:43:00.4890501Z cpu MHz : 3000.006 2025-05-07T19:43:00.4890591Z cache size : 36608 KB 2025-05-07T19:43:00.4890691Z physical id : 1 2025-05-07T19:43:00.4890779Z siblings : 48 2025-05-07T19:43:00.4890868Z core id : 15 2025-05-07T19:43:00.4890958Z cpu cores : 24 2025-05-07T19:43:00.4891052Z apicid : 94 2025-05-07T19:43:00.4891141Z initial apicid : 94 2025-05-07T19:43:00.4891224Z fpu : yes 2025-05-07T19:43:00.4891327Z fpu_exception : yes 2025-05-07T19:43:00.4891413Z cpuid level : 13 2025-05-07T19:43:00.4891495Z wp : yes 2025-05-07T19:43:00.4893695Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4894086Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4894176Z bogomips : 6000.01 2025-05-07T19:43:00.4894287Z clflush size : 64 2025-05-07T19:43:00.4894377Z cache_alignment : 64 2025-05-07T19:43:00.4894512Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4894652Z power management: 2025-05-07T19:43:00.4894673Z 2025-05-07T19:43:00.4894762Z processor : 40 2025-05-07T19:43:00.4894959Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4895034Z cpu family : 6 2025-05-07T19:43:00.4895134Z model : 85 2025-05-07T19:43:00.4895282Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4895361Z stepping : 7 2025-05-07T19:43:00.4895443Z microcode : 0x5003901 2025-05-07T19:43:00.4895539Z cpu MHz : 1203.882 2025-05-07T19:43:00.4895621Z cache size : 36608 KB 2025-05-07T19:43:00.4895703Z physical id : 1 2025-05-07T19:43:00.4895811Z siblings : 48 2025-05-07T19:43:00.4895882Z core id : 16 2025-05-07T19:43:00.4895960Z cpu cores : 24 2025-05-07T19:43:00.4896031Z apicid : 96 2025-05-07T19:43:00.4896132Z initial apicid : 96 2025-05-07T19:43:00.4896208Z fpu : yes 2025-05-07T19:43:00.4896289Z fpu_exception : yes 2025-05-07T19:43:00.4896397Z cpuid level : 13 2025-05-07T19:43:00.4896476Z wp : yes 2025-05-07T19:43:00.4898489Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4898872Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4898960Z bogomips : 6000.01 2025-05-07T19:43:00.4899037Z clflush size : 64 2025-05-07T19:43:00.4899126Z cache_alignment : 64 2025-05-07T19:43:00.4899244Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4899322Z power management: 2025-05-07T19:43:00.4899326Z 2025-05-07T19:43:00.4899400Z processor : 41 2025-05-07T19:43:00.4899488Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4899560Z cpu family : 6 2025-05-07T19:43:00.4899631Z model : 85 2025-05-07T19:43:00.4899784Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4899860Z stepping : 7 2025-05-07T19:43:00.4899936Z microcode : 0x5003901 2025-05-07T19:43:00.4900008Z cpu MHz : 1202.063 2025-05-07T19:43:00.4900089Z cache size : 36608 KB 2025-05-07T19:43:00.4900213Z physical id : 1 2025-05-07T19:43:00.4900287Z siblings : 48 2025-05-07T19:43:00.4900365Z core id : 17 2025-05-07T19:43:00.4900438Z cpu cores : 24 2025-05-07T19:43:00.4900510Z apicid : 98 2025-05-07T19:43:00.4900588Z initial apicid : 98 2025-05-07T19:43:00.4900663Z fpu : yes 2025-05-07T19:43:00.4900739Z fpu_exception : yes 2025-05-07T19:43:00.4900814Z cpuid level : 13 2025-05-07T19:43:00.4900890Z wp : yes 2025-05-07T19:43:00.4902895Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4903257Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4903345Z bogomips : 6000.01 2025-05-07T19:43:00.4903418Z clflush size : 64 2025-05-07T19:43:00.4903497Z cache_alignment : 64 2025-05-07T19:43:00.4903623Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4903699Z power management: 2025-05-07T19:43:00.4903703Z 2025-05-07T19:43:00.4903778Z processor : 42 2025-05-07T19:43:00.4903905Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4903986Z cpu family : 6 2025-05-07T19:43:00.4904058Z model : 85 2025-05-07T19:43:00.4904208Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4904289Z stepping : 7 2025-05-07T19:43:00.4904363Z microcode : 0x5003901 2025-05-07T19:43:00.4904438Z cpu MHz : 3000.006 2025-05-07T19:43:00.4904513Z cache size : 36608 KB 2025-05-07T19:43:00.4904595Z physical id : 1 2025-05-07T19:43:00.4904672Z siblings : 48 2025-05-07T19:43:00.4904744Z core id : 18 2025-05-07T19:43:00.4904826Z cpu cores : 24 2025-05-07T19:43:00.4904897Z apicid : 100 2025-05-07T19:43:00.4904976Z initial apicid : 100 2025-05-07T19:43:00.4905048Z fpu : yes 2025-05-07T19:43:00.4905135Z fpu_exception : yes 2025-05-07T19:43:00.4905208Z cpuid level : 13 2025-05-07T19:43:00.4905278Z wp : yes 2025-05-07T19:43:00.4907301Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4907665Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4907740Z bogomips : 6000.01 2025-05-07T19:43:00.4907827Z clflush size : 64 2025-05-07T19:43:00.4907909Z cache_alignment : 64 2025-05-07T19:43:00.4908026Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4908115Z power management: 2025-05-07T19:43:00.4908119Z 2025-05-07T19:43:00.4908194Z processor : 43 2025-05-07T19:43:00.4908276Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4908353Z cpu family : 6 2025-05-07T19:43:00.4908435Z model : 85 2025-05-07T19:43:00.4908580Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4908652Z stepping : 7 2025-05-07T19:43:00.4908739Z microcode : 0x5003901 2025-05-07T19:43:00.4908813Z cpu MHz : 3000.006 2025-05-07T19:43:00.4908887Z cache size : 36608 KB 2025-05-07T19:43:00.4908963Z physical id : 1 2025-05-07T19:43:00.4909045Z siblings : 48 2025-05-07T19:43:00.4910965Z core id : 19 2025-05-07T19:43:00.4911038Z cpu cores : 24 2025-05-07T19:43:00.4911122Z apicid : 102 2025-05-07T19:43:00.4911200Z initial apicid : 102 2025-05-07T19:43:00.4911270Z fpu : yes 2025-05-07T19:43:00.4911345Z fpu_exception : yes 2025-05-07T19:43:00.4911427Z cpuid level : 13 2025-05-07T19:43:00.4911501Z wp : yes 2025-05-07T19:43:00.4913559Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4913932Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4914011Z bogomips : 6000.01 2025-05-07T19:43:00.4914086Z clflush size : 64 2025-05-07T19:43:00.4914174Z cache_alignment : 64 2025-05-07T19:43:00.4914297Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4914372Z power management: 2025-05-07T19:43:00.4914377Z 2025-05-07T19:43:00.4914460Z processor : 44 2025-05-07T19:43:00.4914544Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4914621Z cpu family : 6 2025-05-07T19:43:00.4914741Z model : 85 2025-05-07T19:43:00.4914901Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4914974Z stepping : 7 2025-05-07T19:43:00.4915049Z microcode : 0x5003901 2025-05-07T19:43:00.4915129Z cpu MHz : 3000.006 2025-05-07T19:43:00.4915208Z cache size : 36608 KB 2025-05-07T19:43:00.4915283Z physical id : 1 2025-05-07T19:43:00.4915353Z siblings : 48 2025-05-07T19:43:00.4915432Z core id : 20 2025-05-07T19:43:00.4915509Z cpu cores : 24 2025-05-07T19:43:00.4915583Z apicid : 104 2025-05-07T19:43:00.4915659Z initial apicid : 104 2025-05-07T19:43:00.4915742Z fpu : yes 2025-05-07T19:43:00.4915825Z fpu_exception : yes 2025-05-07T19:43:00.4915900Z cpuid level : 13 2025-05-07T19:43:00.4916037Z wp : yes 2025-05-07T19:43:00.4918064Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4918425Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4918513Z bogomips : 6000.01 2025-05-07T19:43:00.4918587Z clflush size : 64 2025-05-07T19:43:00.4918664Z cache_alignment : 64 2025-05-07T19:43:00.4918794Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4918874Z power management: 2025-05-07T19:43:00.4918878Z 2025-05-07T19:43:00.4918952Z processor : 45 2025-05-07T19:43:00.4919047Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4919119Z cpu family : 6 2025-05-07T19:43:00.4919188Z model : 85 2025-05-07T19:43:00.4919341Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4919424Z stepping : 7 2025-05-07T19:43:00.4919500Z microcode : 0x5003901 2025-05-07T19:43:00.4919573Z cpu MHz : 3000.006 2025-05-07T19:43:00.4919660Z cache size : 36608 KB 2025-05-07T19:43:00.4919733Z physical id : 1 2025-05-07T19:43:00.4919804Z siblings : 48 2025-05-07T19:43:00.4919876Z core id : 21 2025-05-07T19:43:00.4919961Z cpu cores : 24 2025-05-07T19:43:00.4920081Z apicid : 106 2025-05-07T19:43:00.4920157Z initial apicid : 106 2025-05-07T19:43:00.4920228Z fpu : yes 2025-05-07T19:43:00.4920316Z fpu_exception : yes 2025-05-07T19:43:00.4920388Z cpuid level : 13 2025-05-07T19:43:00.4920457Z wp : yes 2025-05-07T19:43:00.4922570Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4923124Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4923210Z bogomips : 6000.01 2025-05-07T19:43:00.4923307Z clflush size : 64 2025-05-07T19:43:00.4923397Z cache_alignment : 64 2025-05-07T19:43:00.4923528Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4923618Z power management: 2025-05-07T19:43:00.4923622Z 2025-05-07T19:43:00.4923704Z processor : 46 2025-05-07T19:43:00.4923794Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4923882Z cpu family : 6 2025-05-07T19:43:00.4923960Z model : 85 2025-05-07T19:43:00.4924174Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4924257Z stepping : 7 2025-05-07T19:43:00.4924357Z microcode : 0x5003901 2025-05-07T19:43:00.4924434Z cpu MHz : 3000.006 2025-05-07T19:43:00.4924514Z cache size : 36608 KB 2025-05-07T19:43:00.4924596Z physical id : 1 2025-05-07T19:43:00.4924689Z siblings : 48 2025-05-07T19:43:00.4924767Z core id : 22 2025-05-07T19:43:00.4924849Z cpu cores : 24 2025-05-07T19:43:00.4924940Z apicid : 108 2025-05-07T19:43:00.4925027Z initial apicid : 108 2025-05-07T19:43:00.4925101Z fpu : yes 2025-05-07T19:43:00.4925188Z fpu_exception : yes 2025-05-07T19:43:00.4925273Z cpuid level : 13 2025-05-07T19:43:00.4925347Z wp : yes 2025-05-07T19:43:00.4927527Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4927931Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4928017Z bogomips : 6000.01 2025-05-07T19:43:00.4928102Z clflush size : 64 2025-05-07T19:43:00.4928202Z cache_alignment : 64 2025-05-07T19:43:00.4928333Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4928421Z power management: 2025-05-07T19:43:00.4928425Z 2025-05-07T19:43:00.4928509Z processor : 47 2025-05-07T19:43:00.4928600Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4928674Z cpu family : 6 2025-05-07T19:43:00.4928747Z model : 85 2025-05-07T19:43:00.4928910Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4928986Z stepping : 7 2025-05-07T19:43:00.4929072Z microcode : 0x5003901 2025-05-07T19:43:00.4929162Z cpu MHz : 3000.006 2025-05-07T19:43:00.4929242Z cache size : 36608 KB 2025-05-07T19:43:00.4929326Z physical id : 1 2025-05-07T19:43:00.4929404Z siblings : 48 2025-05-07T19:43:00.4929491Z core id : 23 2025-05-07T19:43:00.4929568Z cpu cores : 24 2025-05-07T19:43:00.4929643Z apicid : 110 2025-05-07T19:43:00.4929740Z initial apicid : 110 2025-05-07T19:43:00.4929866Z fpu : yes 2025-05-07T19:43:00.4929948Z fpu_exception : yes 2025-05-07T19:43:00.4930025Z cpuid level : 13 2025-05-07T19:43:00.4930114Z wp : yes 2025-05-07T19:43:00.4932312Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4932706Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4932788Z bogomips : 6000.01 2025-05-07T19:43:00.4932871Z clflush size : 64 2025-05-07T19:43:00.4932956Z cache_alignment : 64 2025-05-07T19:43:00.4933097Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4933177Z power management: 2025-05-07T19:43:00.4933182Z 2025-05-07T19:43:00.4933263Z processor : 48 2025-05-07T19:43:00.4933363Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4933442Z cpu family : 6 2025-05-07T19:43:00.4933518Z model : 85 2025-05-07T19:43:00.4933674Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4933762Z stepping : 7 2025-05-07T19:43:00.4933843Z microcode : 0x5003901 2025-05-07T19:43:00.4933970Z cpu MHz : 3000.006 2025-05-07T19:43:00.4934069Z cache size : 36608 KB 2025-05-07T19:43:00.4934147Z physical id : 0 2025-05-07T19:43:00.4934226Z siblings : 48 2025-05-07T19:43:00.4934299Z core id : 0 2025-05-07T19:43:00.4934387Z cpu cores : 24 2025-05-07T19:43:00.4934469Z apicid : 1 2025-05-07T19:43:00.4934547Z initial apicid : 1 2025-05-07T19:43:00.4934634Z fpu : yes 2025-05-07T19:43:00.4934718Z fpu_exception : yes 2025-05-07T19:43:00.4934805Z cpuid level : 13 2025-05-07T19:43:00.4934987Z wp : yes 2025-05-07T19:43:00.4937009Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4937368Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4937456Z bogomips : 6000.01 2025-05-07T19:43:00.4937534Z clflush size : 64 2025-05-07T19:43:00.4937611Z cache_alignment : 64 2025-05-07T19:43:00.4937731Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4937814Z power management: 2025-05-07T19:43:00.4937818Z 2025-05-07T19:43:00.4937895Z processor : 49 2025-05-07T19:43:00.4937976Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4938059Z cpu family : 6 2025-05-07T19:43:00.4938133Z model : 85 2025-05-07T19:43:00.4938282Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4938353Z stepping : 7 2025-05-07T19:43:00.4938438Z microcode : 0x5003901 2025-05-07T19:43:00.4938512Z cpu MHz : 3000.006 2025-05-07T19:43:00.4938591Z cache size : 36608 KB 2025-05-07T19:43:00.4938675Z physical id : 0 2025-05-07T19:43:00.4938744Z siblings : 48 2025-05-07T19:43:00.4938815Z core id : 1 2025-05-07T19:43:00.4938890Z cpu cores : 24 2025-05-07T19:43:00.4938968Z apicid : 3 2025-05-07T19:43:00.4939043Z initial apicid : 3 2025-05-07T19:43:00.4939112Z fpu : yes 2025-05-07T19:43:00.4939198Z fpu_exception : yes 2025-05-07T19:43:00.4939273Z cpuid level : 13 2025-05-07T19:43:00.4939388Z wp : yes 2025-05-07T19:43:00.4941407Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4941762Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4941833Z bogomips : 6000.01 2025-05-07T19:43:00.4941910Z clflush size : 64 2025-05-07T19:43:00.4941986Z cache_alignment : 64 2025-05-07T19:43:00.4942106Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4942185Z power management: 2025-05-07T19:43:00.4942190Z 2025-05-07T19:43:00.4942275Z processor : 50 2025-05-07T19:43:00.4942353Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4942427Z cpu family : 6 2025-05-07T19:43:00.4942510Z model : 85 2025-05-07T19:43:00.4942655Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4942728Z stepping : 7 2025-05-07T19:43:00.4942803Z microcode : 0x5003901 2025-05-07T19:43:00.4942889Z cpu MHz : 3000.006 2025-05-07T19:43:00.4942968Z cache size : 36608 KB 2025-05-07T19:43:00.4943092Z physical id : 0 2025-05-07T19:43:00.4943172Z siblings : 48 2025-05-07T19:43:00.4943244Z core id : 2 2025-05-07T19:43:00.4943320Z cpu cores : 24 2025-05-07T19:43:00.4943390Z apicid : 5 2025-05-07T19:43:00.4943472Z initial apicid : 5 2025-05-07T19:43:00.4943542Z fpu : yes 2025-05-07T19:43:00.4943620Z fpu_exception : yes 2025-05-07T19:43:00.4943696Z cpuid level : 13 2025-05-07T19:43:00.4943773Z wp : yes 2025-05-07T19:43:00.4945791Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4946156Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4946237Z bogomips : 6000.01 2025-05-07T19:43:00.4946313Z clflush size : 64 2025-05-07T19:43:00.4946399Z cache_alignment : 64 2025-05-07T19:43:00.4946514Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4946597Z power management: 2025-05-07T19:43:00.4946601Z 2025-05-07T19:43:00.4946675Z processor : 51 2025-05-07T19:43:00.4946765Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4946839Z cpu family : 6 2025-05-07T19:43:00.4946907Z model : 85 2025-05-07T19:43:00.4947063Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4947133Z stepping : 7 2025-05-07T19:43:00.4947213Z microcode : 0x5003901 2025-05-07T19:43:00.4947290Z cpu MHz : 3286.464 2025-05-07T19:43:00.4947375Z cache size : 36608 KB 2025-05-07T19:43:00.4947448Z physical id : 0 2025-05-07T19:43:00.4947521Z siblings : 48 2025-05-07T19:43:00.4947601Z core id : 3 2025-05-07T19:43:00.4947673Z cpu cores : 24 2025-05-07T19:43:00.4947744Z apicid : 7 2025-05-07T19:43:00.4947819Z initial apicid : 7 2025-05-07T19:43:00.4947901Z fpu : yes 2025-05-07T19:43:00.4947982Z fpu_exception : yes 2025-05-07T19:43:00.4948055Z cpuid level : 13 2025-05-07T19:43:00.4948126Z wp : yes 2025-05-07T19:43:00.4950139Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4950567Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4950656Z bogomips : 6000.01 2025-05-07T19:43:00.4950733Z clflush size : 64 2025-05-07T19:43:00.4950812Z cache_alignment : 64 2025-05-07T19:43:00.4950931Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4951017Z power management: 2025-05-07T19:43:00.4951021Z 2025-05-07T19:43:00.4951100Z processor : 52 2025-05-07T19:43:00.4951184Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4951272Z cpu family : 6 2025-05-07T19:43:00.4951344Z model : 85 2025-05-07T19:43:00.4951491Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4951580Z stepping : 7 2025-05-07T19:43:00.4951660Z microcode : 0x5003901 2025-05-07T19:43:00.4951736Z cpu MHz : 3000.006 2025-05-07T19:43:00.4951811Z cache size : 36608 KB 2025-05-07T19:43:00.4951895Z physical id : 0 2025-05-07T19:43:00.4951977Z siblings : 48 2025-05-07T19:43:00.4952097Z core id : 4 2025-05-07T19:43:00.4952170Z cpu cores : 24 2025-05-07T19:43:00.4952254Z apicid : 9 2025-05-07T19:43:00.4952334Z initial apicid : 9 2025-05-07T19:43:00.4952409Z fpu : yes 2025-05-07T19:43:00.4952498Z fpu_exception : yes 2025-05-07T19:43:00.4952574Z cpuid level : 13 2025-05-07T19:43:00.4952645Z wp : yes 2025-05-07T19:43:00.4954651Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4955013Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4955090Z bogomips : 6000.01 2025-05-07T19:43:00.4955173Z clflush size : 64 2025-05-07T19:43:00.4955250Z cache_alignment : 64 2025-05-07T19:43:00.4955370Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4955475Z power management: 2025-05-07T19:43:00.4955492Z 2025-05-07T19:43:00.4955567Z processor : 53 2025-05-07T19:43:00.4955651Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4955725Z cpu family : 6 2025-05-07T19:43:00.4955808Z model : 85 2025-05-07T19:43:00.4955951Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4956024Z stepping : 7 2025-05-07T19:43:00.4956101Z microcode : 0x5003901 2025-05-07T19:43:00.4956186Z cpu MHz : 3449.845 2025-05-07T19:43:00.4956264Z cache size : 36608 KB 2025-05-07T19:43:00.4956341Z physical id : 0 2025-05-07T19:43:00.4956430Z siblings : 48 2025-05-07T19:43:00.4956502Z core id : 5 2025-05-07T19:43:00.4956576Z cpu cores : 24 2025-05-07T19:43:00.4956654Z apicid : 11 2025-05-07T19:43:00.4956748Z initial apicid : 11 2025-05-07T19:43:00.4956820Z fpu : yes 2025-05-07T19:43:00.4956898Z fpu_exception : yes 2025-05-07T19:43:00.4956990Z cpuid level : 13 2025-05-07T19:43:00.4957062Z wp : yes 2025-05-07T19:43:00.4959060Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4959493Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4959572Z bogomips : 6000.01 2025-05-07T19:43:00.4959647Z clflush size : 64 2025-05-07T19:43:00.4959736Z cache_alignment : 64 2025-05-07T19:43:00.4959854Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4959935Z power management: 2025-05-07T19:43:00.4959940Z 2025-05-07T19:43:00.4960014Z processor : 54 2025-05-07T19:43:00.4960109Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4960191Z cpu family : 6 2025-05-07T19:43:00.4960265Z model : 85 2025-05-07T19:43:00.4960424Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4960501Z stepping : 7 2025-05-07T19:43:00.4960579Z microcode : 0x5003901 2025-05-07T19:43:00.4960655Z cpu MHz : 3243.836 2025-05-07T19:43:00.4960746Z cache size : 36608 KB 2025-05-07T19:43:00.4960819Z physical id : 0 2025-05-07T19:43:00.4960894Z siblings : 48 2025-05-07T19:43:00.4960979Z core id : 6 2025-05-07T19:43:00.4961056Z cpu cores : 24 2025-05-07T19:43:00.4961126Z apicid : 13 2025-05-07T19:43:00.4961252Z initial apicid : 13 2025-05-07T19:43:00.4961339Z fpu : yes 2025-05-07T19:43:00.4961417Z fpu_exception : yes 2025-05-07T19:43:00.4961493Z cpuid level : 13 2025-05-07T19:43:00.4961578Z wp : yes 2025-05-07T19:43:00.4963896Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4964293Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4964393Z bogomips : 6000.01 2025-05-07T19:43:00.4964478Z clflush size : 64 2025-05-07T19:43:00.4964563Z cache_alignment : 64 2025-05-07T19:43:00.4964701Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4964787Z power management: 2025-05-07T19:43:00.4964791Z 2025-05-07T19:43:00.4964873Z processor : 55 2025-05-07T19:43:00.4964963Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4965056Z cpu family : 6 2025-05-07T19:43:00.4965139Z model : 85 2025-05-07T19:43:00.4965296Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4965385Z stepping : 7 2025-05-07T19:43:00.4965472Z microcode : 0x5003901 2025-05-07T19:43:00.4965553Z cpu MHz : 3137.359 2025-05-07T19:43:00.4965638Z cache size : 36608 KB 2025-05-07T19:43:00.4965728Z physical id : 0 2025-05-07T19:43:00.4965807Z siblings : 48 2025-05-07T19:43:00.4965886Z core id : 7 2025-05-07T19:43:00.4965973Z cpu cores : 24 2025-05-07T19:43:00.4966052Z apicid : 15 2025-05-07T19:43:00.4966137Z initial apicid : 15 2025-05-07T19:43:00.4966216Z fpu : yes 2025-05-07T19:43:00.4966307Z fpu_exception : yes 2025-05-07T19:43:00.4966391Z cpuid level : 13 2025-05-07T19:43:00.4966464Z wp : yes 2025-05-07T19:43:00.4969349Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4969845Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4969928Z bogomips : 6000.01 2025-05-07T19:43:00.4970029Z clflush size : 64 2025-05-07T19:43:00.4970111Z cache_alignment : 64 2025-05-07T19:43:00.4970239Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4970339Z power management: 2025-05-07T19:43:00.4970343Z 2025-05-07T19:43:00.4970425Z processor : 56 2025-05-07T19:43:00.4970512Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4970593Z cpu family : 6 2025-05-07T19:43:00.4970682Z model : 85 2025-05-07T19:43:00.4970837Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4970923Z stepping : 7 2025-05-07T19:43:00.4971019Z microcode : 0x5003901 2025-05-07T19:43:00.4971097Z cpu MHz : 3161.309 2025-05-07T19:43:00.4971182Z cache size : 36608 KB 2025-05-07T19:43:00.4971263Z physical id : 0 2025-05-07T19:43:00.4971353Z siblings : 48 2025-05-07T19:43:00.4971429Z core id : 8 2025-05-07T19:43:00.4971508Z cpu cores : 24 2025-05-07T19:43:00.4971596Z apicid : 17 2025-05-07T19:43:00.4971677Z initial apicid : 17 2025-05-07T19:43:00.4971754Z fpu : yes 2025-05-07T19:43:00.4971907Z fpu_exception : yes 2025-05-07T19:43:00.4972000Z cpuid level : 13 2025-05-07T19:43:00.4972074Z wp : yes 2025-05-07T19:43:00.4974260Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4974671Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4974752Z bogomips : 6000.01 2025-05-07T19:43:00.4974834Z clflush size : 64 2025-05-07T19:43:00.4974930Z cache_alignment : 64 2025-05-07T19:43:00.4975060Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4975143Z power management: 2025-05-07T19:43:00.4975147Z 2025-05-07T19:43:00.4975239Z processor : 57 2025-05-07T19:43:00.4975332Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4975413Z cpu family : 6 2025-05-07T19:43:00.4975490Z model : 85 2025-05-07T19:43:00.4975652Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4975735Z stepping : 7 2025-05-07T19:43:00.4975819Z microcode : 0x5003901 2025-05-07T19:43:00.4975906Z cpu MHz : 3000.006 2025-05-07T19:43:00.4975988Z cache size : 36608 KB 2025-05-07T19:43:00.4976071Z physical id : 0 2025-05-07T19:43:00.4976150Z siblings : 48 2025-05-07T19:43:00.4976241Z core id : 9 2025-05-07T19:43:00.4976320Z cpu cores : 24 2025-05-07T19:43:00.4976420Z apicid : 19 2025-05-07T19:43:00.4976502Z initial apicid : 19 2025-05-07T19:43:00.4976588Z fpu : yes 2025-05-07T19:43:00.4976671Z fpu_exception : yes 2025-05-07T19:43:00.4976756Z cpuid level : 13 2025-05-07T19:43:00.4976843Z wp : yes 2025-05-07T19:43:00.4979018Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4979457Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4979547Z bogomips : 6000.01 2025-05-07T19:43:00.4979627Z clflush size : 64 2025-05-07T19:43:00.4979712Z cache_alignment : 64 2025-05-07T19:43:00.4979856Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4979939Z power management: 2025-05-07T19:43:00.4979943Z 2025-05-07T19:43:00.4980023Z processor : 58 2025-05-07T19:43:00.4980123Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4980201Z cpu family : 6 2025-05-07T19:43:00.4980277Z model : 85 2025-05-07T19:43:00.4980616Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4980703Z stepping : 7 2025-05-07T19:43:00.4980788Z microcode : 0x5003901 2025-05-07T19:43:00.4980867Z cpu MHz : 3261.437 2025-05-07T19:43:00.4980955Z cache size : 36608 KB 2025-05-07T19:43:00.4981141Z physical id : 0 2025-05-07T19:43:00.4981213Z siblings : 48 2025-05-07T19:43:00.4981286Z core id : 10 2025-05-07T19:43:00.4981369Z cpu cores : 24 2025-05-07T19:43:00.4981442Z apicid : 21 2025-05-07T19:43:00.4981519Z initial apicid : 21 2025-05-07T19:43:00.4981591Z fpu : yes 2025-05-07T19:43:00.4981675Z fpu_exception : yes 2025-05-07T19:43:00.4981750Z cpuid level : 13 2025-05-07T19:43:00.4981822Z wp : yes 2025-05-07T19:43:00.4983908Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4984273Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4984351Z bogomips : 6000.01 2025-05-07T19:43:00.4984443Z clflush size : 64 2025-05-07T19:43:00.4984523Z cache_alignment : 64 2025-05-07T19:43:00.4984642Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4984739Z power management: 2025-05-07T19:43:00.4984743Z 2025-05-07T19:43:00.4984819Z processor : 59 2025-05-07T19:43:00.4984899Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4984988Z cpu family : 6 2025-05-07T19:43:00.4985061Z model : 85 2025-05-07T19:43:00.4985205Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4985279Z stepping : 7 2025-05-07T19:43:00.4985370Z microcode : 0x5003901 2025-05-07T19:43:00.4985445Z cpu MHz : 3226.711 2025-05-07T19:43:00.4985520Z cache size : 36608 KB 2025-05-07T19:43:00.4985596Z physical id : 0 2025-05-07T19:43:00.4985679Z siblings : 48 2025-05-07T19:43:00.4985752Z core id : 11 2025-05-07T19:43:00.4985824Z cpu cores : 24 2025-05-07T19:43:00.4985908Z apicid : 23 2025-05-07T19:43:00.4985989Z initial apicid : 23 2025-05-07T19:43:00.4986058Z fpu : yes 2025-05-07T19:43:00.4986133Z fpu_exception : yes 2025-05-07T19:43:00.4986217Z cpuid level : 13 2025-05-07T19:43:00.4986286Z wp : yes 2025-05-07T19:43:00.4988305Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4988727Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4988807Z bogomips : 6000.01 2025-05-07T19:43:00.4988884Z clflush size : 64 2025-05-07T19:43:00.4988972Z cache_alignment : 64 2025-05-07T19:43:00.4989094Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4989173Z power management: 2025-05-07T19:43:00.4989180Z 2025-05-07T19:43:00.4989263Z processor : 60 2025-05-07T19:43:00.4989346Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4989421Z cpu family : 6 2025-05-07T19:43:00.4989491Z model : 85 2025-05-07T19:43:00.4989647Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4989720Z stepping : 7 2025-05-07T19:43:00.4989795Z microcode : 0x5003901 2025-05-07T19:43:00.4989876Z cpu MHz : 3217.406 2025-05-07T19:43:00.4989957Z cache size : 36608 KB 2025-05-07T19:43:00.4990034Z physical id : 0 2025-05-07T19:43:00.4990106Z siblings : 48 2025-05-07T19:43:00.4990188Z core id : 12 2025-05-07T19:43:00.4990263Z cpu cores : 24 2025-05-07T19:43:00.4990334Z apicid : 25 2025-05-07T19:43:00.4990419Z initial apicid : 25 2025-05-07T19:43:00.4990492Z fpu : yes 2025-05-07T19:43:00.4990571Z fpu_exception : yes 2025-05-07T19:43:00.4990642Z cpuid level : 13 2025-05-07T19:43:00.4990721Z wp : yes 2025-05-07T19:43:00.4992813Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4993186Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4993260Z bogomips : 6000.01 2025-05-07T19:43:00.4993346Z clflush size : 64 2025-05-07T19:43:00.4993428Z cache_alignment : 64 2025-05-07T19:43:00.4993564Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4993646Z power management: 2025-05-07T19:43:00.4993650Z 2025-05-07T19:43:00.4993732Z processor : 61 2025-05-07T19:43:00.4993833Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4993912Z cpu family : 6 2025-05-07T19:43:00.4993989Z model : 85 2025-05-07T19:43:00.4994141Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4994234Z stepping : 7 2025-05-07T19:43:00.4994317Z microcode : 0x5003901 2025-05-07T19:43:00.4994396Z cpu MHz : 3222.784 2025-05-07T19:43:00.4994493Z cache size : 36608 KB 2025-05-07T19:43:00.4994576Z physical id : 0 2025-05-07T19:43:00.4994654Z siblings : 48 2025-05-07T19:43:00.4994731Z core id : 13 2025-05-07T19:43:00.4994829Z cpu cores : 24 2025-05-07T19:43:00.4994908Z apicid : 27 2025-05-07T19:43:00.4994989Z initial apicid : 27 2025-05-07T19:43:00.4995079Z fpu : yes 2025-05-07T19:43:00.4995162Z fpu_exception : yes 2025-05-07T19:43:00.4995243Z cpuid level : 13 2025-05-07T19:43:00.4995319Z wp : yes 2025-05-07T19:43:00.4997354Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.4997774Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.4997868Z bogomips : 6000.01 2025-05-07T19:43:00.4997949Z clflush size : 64 2025-05-07T19:43:00.4998033Z cache_alignment : 64 2025-05-07T19:43:00.4998158Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.4998256Z power management: 2025-05-07T19:43:00.4998260Z 2025-05-07T19:43:00.4998340Z processor : 62 2025-05-07T19:43:00.4998430Z vendor_id : GenuineIntel 2025-05-07T19:43:00.4998525Z cpu family : 6 2025-05-07T19:43:00.4998600Z model : 85 2025-05-07T19:43:00.4998749Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.4998829Z stepping : 7 2025-05-07T19:43:00.4998925Z microcode : 0x5003901 2025-05-07T19:43:00.4999002Z cpu MHz : 3143.801 2025-05-07T19:43:00.4999081Z cache size : 36608 KB 2025-05-07T19:43:00.4999173Z physical id : 0 2025-05-07T19:43:00.4999252Z siblings : 48 2025-05-07T19:43:00.4999327Z core id : 14 2025-05-07T19:43:00.4999405Z cpu cores : 24 2025-05-07T19:43:00.4999494Z apicid : 29 2025-05-07T19:43:00.4999576Z initial apicid : 29 2025-05-07T19:43:00.4999650Z fpu : yes 2025-05-07T19:43:00.4999744Z fpu_exception : yes 2025-05-07T19:43:00.4999821Z cpuid level : 13 2025-05-07T19:43:00.4999894Z wp : yes 2025-05-07T19:43:00.5001975Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5002422Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5002503Z bogomips : 6000.01 2025-05-07T19:43:00.5002595Z clflush size : 64 2025-05-07T19:43:00.5002842Z cache_alignment : 64 2025-05-07T19:43:00.5002974Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5003060Z power management: 2025-05-07T19:43:00.5003065Z 2025-05-07T19:43:00.5003166Z processor : 63 2025-05-07T19:43:00.5003257Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5003402Z cpu family : 6 2025-05-07T19:43:00.5003500Z model : 85 2025-05-07T19:43:00.5003661Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5003745Z stepping : 7 2025-05-07T19:43:00.5003831Z microcode : 0x5003901 2025-05-07T19:43:00.5003925Z cpu MHz : 3149.848 2025-05-07T19:43:00.5004011Z cache size : 36608 KB 2025-05-07T19:43:00.5004095Z physical id : 0 2025-05-07T19:43:00.5004189Z siblings : 48 2025-05-07T19:43:00.5004274Z core id : 15 2025-05-07T19:43:00.5004357Z cpu cores : 24 2025-05-07T19:43:00.5004437Z apicid : 31 2025-05-07T19:43:00.5004537Z initial apicid : 31 2025-05-07T19:43:00.5004615Z fpu : yes 2025-05-07T19:43:00.5004705Z fpu_exception : yes 2025-05-07T19:43:00.5004790Z cpuid level : 13 2025-05-07T19:43:00.5004883Z wp : yes 2025-05-07T19:43:00.5007064Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5007469Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5007612Z bogomips : 6000.01 2025-05-07T19:43:00.5007699Z clflush size : 64 2025-05-07T19:43:00.5007801Z cache_alignment : 64 2025-05-07T19:43:00.5007934Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5008020Z power management: 2025-05-07T19:43:00.5008024Z 2025-05-07T19:43:00.5008109Z processor : 64 2025-05-07T19:43:00.5008214Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5008296Z cpu family : 6 2025-05-07T19:43:00.5008376Z model : 85 2025-05-07T19:43:00.5008556Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5008640Z stepping : 7 2025-05-07T19:43:00.5008726Z microcode : 0x5003901 2025-05-07T19:43:00.5008809Z cpu MHz : 3208.594 2025-05-07T19:43:00.5008909Z cache size : 36608 KB 2025-05-07T19:43:00.5008995Z physical id : 0 2025-05-07T19:43:00.5009077Z siblings : 48 2025-05-07T19:43:00.5009171Z core id : 16 2025-05-07T19:43:00.5009253Z cpu cores : 24 2025-05-07T19:43:00.5009339Z apicid : 33 2025-05-07T19:43:00.5009427Z initial apicid : 33 2025-05-07T19:43:00.5009519Z fpu : yes 2025-05-07T19:43:00.5009608Z fpu_exception : yes 2025-05-07T19:43:00.5009693Z cpuid level : 13 2025-05-07T19:43:00.5009772Z wp : yes 2025-05-07T19:43:00.5012001Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5012397Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5012500Z bogomips : 6000.01 2025-05-07T19:43:00.5012586Z clflush size : 64 2025-05-07T19:43:00.5012675Z cache_alignment : 64 2025-05-07T19:43:00.5012808Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5012908Z power management: 2025-05-07T19:43:00.5012912Z 2025-05-07T19:43:00.5012997Z processor : 65 2025-05-07T19:43:00.5013089Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5013184Z cpu family : 6 2025-05-07T19:43:00.5013265Z model : 85 2025-05-07T19:43:00.5013429Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5013525Z stepping : 7 2025-05-07T19:43:00.5013610Z microcode : 0x5003901 2025-05-07T19:43:00.5013692Z cpu MHz : 3220.452 2025-05-07T19:43:00.5013778Z cache size : 36608 KB 2025-05-07T19:43:00.5013876Z physical id : 0 2025-05-07T19:43:00.5013957Z siblings : 48 2025-05-07T19:43:00.5014039Z core id : 17 2025-05-07T19:43:00.5014121Z cpu cores : 24 2025-05-07T19:43:00.5014215Z apicid : 35 2025-05-07T19:43:00.5014305Z initial apicid : 35 2025-05-07T19:43:00.5014385Z fpu : yes 2025-05-07T19:43:00.5014483Z fpu_exception : yes 2025-05-07T19:43:00.5014566Z cpuid level : 13 2025-05-07T19:43:00.5014646Z wp : yes 2025-05-07T19:43:00.5016897Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5017281Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5017625Z bogomips : 6000.01 2025-05-07T19:43:00.5017727Z clflush size : 64 2025-05-07T19:43:00.5017885Z cache_alignment : 64 2025-05-07T19:43:00.5018017Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5018105Z power management: 2025-05-07T19:43:00.5018125Z 2025-05-07T19:43:00.5018209Z processor : 66 2025-05-07T19:43:00.5018302Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5018386Z cpu family : 6 2025-05-07T19:43:00.5018485Z model : 85 2025-05-07T19:43:00.5018648Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5018736Z stepping : 7 2025-05-07T19:43:00.5018824Z microcode : 0x5003901 2025-05-07T19:43:00.5018923Z cpu MHz : 3000.006 2025-05-07T19:43:00.5019011Z cache size : 36608 KB 2025-05-07T19:43:00.5019095Z physical id : 0 2025-05-07T19:43:00.5019193Z siblings : 48 2025-05-07T19:43:00.5019273Z core id : 18 2025-05-07T19:43:00.5019357Z cpu cores : 24 2025-05-07T19:43:00.5019438Z apicid : 37 2025-05-07T19:43:00.5019538Z initial apicid : 37 2025-05-07T19:43:00.5019620Z fpu : yes 2025-05-07T19:43:00.5019707Z fpu_exception : yes 2025-05-07T19:43:00.5019804Z cpuid level : 13 2025-05-07T19:43:00.5019883Z wp : yes 2025-05-07T19:43:00.5022123Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5022529Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5022614Z bogomips : 6000.01 2025-05-07T19:43:00.5022702Z clflush size : 64 2025-05-07T19:43:00.5022802Z cache_alignment : 64 2025-05-07T19:43:00.5022934Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5023022Z power management: 2025-05-07T19:43:00.5023026Z 2025-05-07T19:43:00.5023110Z processor : 67 2025-05-07T19:43:00.5023215Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5023297Z cpu family : 6 2025-05-07T19:43:00.5023377Z model : 85 2025-05-07T19:43:00.5023553Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5023636Z stepping : 7 2025-05-07T19:43:00.5023726Z microcode : 0x5003901 2025-05-07T19:43:00.5023809Z cpu MHz : 3215.394 2025-05-07T19:43:00.5023910Z cache size : 36608 KB 2025-05-07T19:43:00.5023995Z physical id : 0 2025-05-07T19:43:00.5024077Z siblings : 48 2025-05-07T19:43:00.5024171Z core id : 19 2025-05-07T19:43:00.5024258Z cpu cores : 24 2025-05-07T19:43:00.5024339Z apicid : 39 2025-05-07T19:43:00.5024426Z initial apicid : 39 2025-05-07T19:43:00.5024527Z fpu : yes 2025-05-07T19:43:00.5024617Z fpu_exception : yes 2025-05-07T19:43:00.5024704Z cpuid level : 13 2025-05-07T19:43:00.5024800Z wp : yes 2025-05-07T19:43:00.5026990Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5027382Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5027481Z bogomips : 6000.01 2025-05-07T19:43:00.5027569Z clflush size : 64 2025-05-07T19:43:00.5027702Z cache_alignment : 64 2025-05-07T19:43:00.5027848Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5027935Z power management: 2025-05-07T19:43:00.5027940Z 2025-05-07T19:43:00.5028024Z processor : 68 2025-05-07T19:43:00.5028115Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5028212Z cpu family : 6 2025-05-07T19:43:00.5028292Z model : 85 2025-05-07T19:43:00.5028561Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5028655Z stepping : 7 2025-05-07T19:43:00.5028739Z microcode : 0x5003901 2025-05-07T19:43:00.5028824Z cpu MHz : 3000.006 2025-05-07T19:43:00.5028908Z cache size : 36608 KB 2025-05-07T19:43:00.5029004Z physical id : 0 2025-05-07T19:43:00.5029085Z siblings : 48 2025-05-07T19:43:00.5029165Z core id : 20 2025-05-07T19:43:00.5029260Z cpu cores : 24 2025-05-07T19:43:00.5029446Z apicid : 41 2025-05-07T19:43:00.5029528Z initial apicid : 41 2025-05-07T19:43:00.5029605Z fpu : yes 2025-05-07T19:43:00.5029703Z fpu_exception : yes 2025-05-07T19:43:00.5029785Z cpuid level : 13 2025-05-07T19:43:00.5029861Z wp : yes 2025-05-07T19:43:00.5032377Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5032752Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5032834Z bogomips : 6000.01 2025-05-07T19:43:00.5032932Z clflush size : 64 2025-05-07T19:43:00.5033018Z cache_alignment : 64 2025-05-07T19:43:00.5033147Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5033245Z power management: 2025-05-07T19:43:00.5033250Z 2025-05-07T19:43:00.5033331Z processor : 69 2025-05-07T19:43:00.5033418Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5033498Z cpu family : 6 2025-05-07T19:43:00.5033594Z model : 85 2025-05-07T19:43:00.5033747Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5033827Z stepping : 7 2025-05-07T19:43:00.5033928Z microcode : 0x5003901 2025-05-07T19:43:00.5034008Z cpu MHz : 3750.854 2025-05-07T19:43:00.5034093Z cache size : 36608 KB 2025-05-07T19:43:00.5034174Z physical id : 0 2025-05-07T19:43:00.5034268Z siblings : 48 2025-05-07T19:43:00.5034346Z core id : 21 2025-05-07T19:43:00.5034425Z cpu cores : 24 2025-05-07T19:43:00.5034517Z apicid : 43 2025-05-07T19:43:00.5034601Z initial apicid : 43 2025-05-07T19:43:00.5034679Z fpu : yes 2025-05-07T19:43:00.5034762Z fpu_exception : yes 2025-05-07T19:43:00.5034855Z cpuid level : 13 2025-05-07T19:43:00.5034933Z wp : yes 2025-05-07T19:43:00.5036955Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5037335Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5037415Z bogomips : 6000.01 2025-05-07T19:43:00.5037497Z clflush size : 64 2025-05-07T19:43:00.5037593Z cache_alignment : 64 2025-05-07T19:43:00.5037715Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5037844Z power management: 2025-05-07T19:43:00.5037848Z 2025-05-07T19:43:00.5037939Z processor : 70 2025-05-07T19:43:00.5038025Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5038103Z cpu family : 6 2025-05-07T19:43:00.5038177Z model : 85 2025-05-07T19:43:00.5038339Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5038417Z stepping : 7 2025-05-07T19:43:00.5038498Z microcode : 0x5003901 2025-05-07T19:43:00.5038588Z cpu MHz : 2070.460 2025-05-07T19:43:00.5038668Z cache size : 36608 KB 2025-05-07T19:43:00.5038751Z physical id : 0 2025-05-07T19:43:00.5038826Z siblings : 48 2025-05-07T19:43:00.5038912Z core id : 22 2025-05-07T19:43:00.5038988Z cpu cores : 24 2025-05-07T19:43:00.5039063Z apicid : 45 2025-05-07T19:43:00.5039144Z initial apicid : 45 2025-05-07T19:43:00.5039230Z fpu : yes 2025-05-07T19:43:00.5039311Z fpu_exception : yes 2025-05-07T19:43:00.5039388Z cpuid level : 13 2025-05-07T19:43:00.5039474Z wp : yes 2025-05-07T19:43:00.5041530Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5041898Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5041993Z bogomips : 6000.01 2025-05-07T19:43:00.5042071Z clflush size : 64 2025-05-07T19:43:00.5042152Z cache_alignment : 64 2025-05-07T19:43:00.5042355Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5042438Z power management: 2025-05-07T19:43:00.5042447Z 2025-05-07T19:43:00.5042524Z processor : 71 2025-05-07T19:43:00.5042626Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5042871Z cpu family : 6 2025-05-07T19:43:00.5042951Z model : 85 2025-05-07T19:43:00.5043114Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5043210Z stepping : 7 2025-05-07T19:43:00.5043297Z microcode : 0x5003901 2025-05-07T19:43:00.5043391Z cpu MHz : 3209.111 2025-05-07T19:43:00.5043495Z cache size : 36608 KB 2025-05-07T19:43:00.5043588Z physical id : 0 2025-05-07T19:43:00.5043677Z siblings : 48 2025-05-07T19:43:00.5043766Z core id : 23 2025-05-07T19:43:00.5043869Z cpu cores : 24 2025-05-07T19:43:00.5058608Z apicid : 47 2025-05-07T19:43:00.5058745Z initial apicid : 47 2025-05-07T19:43:00.5058819Z fpu : yes 2025-05-07T19:43:00.5058900Z fpu_exception : yes 2025-05-07T19:43:00.5059141Z cpuid level : 13 2025-05-07T19:43:00.5059224Z wp : yes 2025-05-07T19:43:00.5061388Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5061793Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5061874Z bogomips : 6000.01 2025-05-07T19:43:00.5061956Z clflush size : 64 2025-05-07T19:43:00.5062039Z cache_alignment : 64 2025-05-07T19:43:00.5062176Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5062257Z power management: 2025-05-07T19:43:00.5062263Z 2025-05-07T19:43:00.5062442Z processor : 72 2025-05-07T19:43:00.5062544Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5062624Z cpu family : 6 2025-05-07T19:43:00.5062698Z model : 85 2025-05-07T19:43:00.5062855Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5062948Z stepping : 7 2025-05-07T19:43:00.5063031Z microcode : 0x5003901 2025-05-07T19:43:00.5063110Z cpu MHz : 3000.006 2025-05-07T19:43:00.5063203Z cache size : 36608 KB 2025-05-07T19:43:00.5063283Z physical id : 1 2025-05-07T19:43:00.5063358Z siblings : 48 2025-05-07T19:43:00.5063434Z core id : 0 2025-05-07T19:43:00.5063523Z cpu cores : 24 2025-05-07T19:43:00.5063598Z apicid : 65 2025-05-07T19:43:00.5063677Z initial apicid : 65 2025-05-07T19:43:00.5063759Z fpu : yes 2025-05-07T19:43:00.5063839Z fpu_exception : yes 2025-05-07T19:43:00.5063916Z cpuid level : 13 2025-05-07T19:43:00.5063990Z wp : yes 2025-05-07T19:43:00.5066124Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5066556Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5066647Z bogomips : 6000.01 2025-05-07T19:43:00.5066725Z clflush size : 64 2025-05-07T19:43:00.5066803Z cache_alignment : 64 2025-05-07T19:43:00.5066928Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5067181Z power management: 2025-05-07T19:43:00.5067186Z 2025-05-07T19:43:00.5067265Z processor : 73 2025-05-07T19:43:00.5067353Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5067613Z cpu family : 6 2025-05-07T19:43:00.5067691Z model : 85 2025-05-07T19:43:00.5067849Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5067929Z stepping : 7 2025-05-07T19:43:00.5068115Z microcode : 0x5003901 2025-05-07T19:43:00.5068196Z cpu MHz : 3000.006 2025-05-07T19:43:00.5068279Z cache size : 36608 KB 2025-05-07T19:43:00.5068371Z physical id : 1 2025-05-07T19:43:00.5068452Z siblings : 48 2025-05-07T19:43:00.5068604Z core id : 1 2025-05-07T19:43:00.5068682Z cpu cores : 24 2025-05-07T19:43:00.5068772Z apicid : 67 2025-05-07T19:43:00.5068856Z initial apicid : 67 2025-05-07T19:43:00.5068931Z fpu : yes 2025-05-07T19:43:00.5069024Z fpu_exception : yes 2025-05-07T19:43:00.5069106Z cpuid level : 13 2025-05-07T19:43:00.5069183Z wp : yes 2025-05-07T19:43:00.5071361Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5071760Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5071843Z bogomips : 6000.01 2025-05-07T19:43:00.5071935Z clflush size : 64 2025-05-07T19:43:00.5072019Z cache_alignment : 64 2025-05-07T19:43:00.5072151Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5072236Z power management: 2025-05-07T19:43:00.5072241Z 2025-05-07T19:43:00.5072328Z processor : 74 2025-05-07T19:43:00.5072414Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5072493Z cpu family : 6 2025-05-07T19:43:00.5072680Z model : 85 2025-05-07T19:43:00.5072841Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5072919Z stepping : 7 2025-05-07T19:43:00.5073001Z microcode : 0x5003901 2025-05-07T19:43:00.5073095Z cpu MHz : 1203.270 2025-05-07T19:43:00.5073177Z cache size : 36608 KB 2025-05-07T19:43:00.5073254Z physical id : 1 2025-05-07T19:43:00.5073346Z siblings : 48 2025-05-07T19:43:00.5073422Z core id : 2 2025-05-07T19:43:00.5073501Z cpu cores : 24 2025-05-07T19:43:00.5073580Z apicid : 69 2025-05-07T19:43:00.5073677Z initial apicid : 69 2025-05-07T19:43:00.5073754Z fpu : yes 2025-05-07T19:43:00.5073839Z fpu_exception : yes 2025-05-07T19:43:00.5073929Z cpuid level : 13 2025-05-07T19:43:00.5074004Z wp : yes 2025-05-07T19:43:00.5076181Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5076582Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5076723Z bogomips : 6000.01 2025-05-07T19:43:00.5076805Z clflush size : 64 2025-05-07T19:43:00.5076900Z cache_alignment : 64 2025-05-07T19:43:00.5077027Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5077107Z power management: 2025-05-07T19:43:00.5077112Z 2025-05-07T19:43:00.5077190Z processor : 75 2025-05-07T19:43:00.5077281Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5077359Z cpu family : 6 2025-05-07T19:43:00.5077437Z model : 85 2025-05-07T19:43:00.5077604Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5077681Z stepping : 7 2025-05-07T19:43:00.5077765Z microcode : 0x5003901 2025-05-07T19:43:00.5077847Z cpu MHz : 3000.006 2025-05-07T19:43:00.5077931Z cache size : 36608 KB 2025-05-07T19:43:00.5078013Z physical id : 1 2025-05-07T19:43:00.5078088Z siblings : 48 2025-05-07T19:43:00.5078171Z core id : 3 2025-05-07T19:43:00.5078248Z cpu cores : 24 2025-05-07T19:43:00.5078322Z apicid : 71 2025-05-07T19:43:00.5078400Z initial apicid : 71 2025-05-07T19:43:00.5078480Z fpu : yes 2025-05-07T19:43:00.5078565Z fpu_exception : yes 2025-05-07T19:43:00.5078645Z cpuid level : 13 2025-05-07T19:43:00.5078720Z wp : yes 2025-05-07T19:43:00.5081071Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5081433Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5081522Z bogomips : 6000.01 2025-05-07T19:43:00.5081597Z clflush size : 64 2025-05-07T19:43:00.5081676Z cache_alignment : 64 2025-05-07T19:43:00.5081805Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5081883Z power management: 2025-05-07T19:43:00.5081887Z 2025-05-07T19:43:00.5081961Z processor : 76 2025-05-07T19:43:00.5082040Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5082124Z cpu family : 6 2025-05-07T19:43:00.5082192Z model : 85 2025-05-07T19:43:00.5082403Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5082540Z stepping : 7 2025-05-07T19:43:00.5082619Z microcode : 0x5003901 2025-05-07T19:43:00.5082855Z cpu MHz : 3000.006 2025-05-07T19:43:00.5082938Z cache size : 36608 KB 2025-05-07T19:43:00.5083029Z physical id : 1 2025-05-07T19:43:00.5083106Z siblings : 48 2025-05-07T19:43:00.5083180Z core id : 4 2025-05-07T19:43:00.5083269Z cpu cores : 24 2025-05-07T19:43:00.5083344Z apicid : 73 2025-05-07T19:43:00.5083427Z initial apicid : 73 2025-05-07T19:43:00.5083503Z fpu : yes 2025-05-07T19:43:00.5083600Z fpu_exception : yes 2025-05-07T19:43:00.5083685Z cpuid level : 13 2025-05-07T19:43:00.5083760Z wp : yes 2025-05-07T19:43:00.5085951Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5086341Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5086421Z bogomips : 6000.01 2025-05-07T19:43:00.5086512Z clflush size : 64 2025-05-07T19:43:00.5086596Z cache_alignment : 64 2025-05-07T19:43:00.5086770Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5086863Z power management: 2025-05-07T19:43:00.5086867Z 2025-05-07T19:43:00.5086946Z processor : 77 2025-05-07T19:43:00.5087034Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5087109Z cpu family : 6 2025-05-07T19:43:00.5087191Z model : 85 2025-05-07T19:43:00.5087352Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5087434Z stepping : 7 2025-05-07T19:43:00.5087525Z microcode : 0x5003901 2025-05-07T19:43:00.5087604Z cpu MHz : 3000.006 2025-05-07T19:43:00.5087685Z cache size : 36608 KB 2025-05-07T19:43:00.5087760Z physical id : 1 2025-05-07T19:43:00.5087847Z siblings : 48 2025-05-07T19:43:00.5087924Z core id : 5 2025-05-07T19:43:00.5088003Z cpu cores : 24 2025-05-07T19:43:00.5088081Z apicid : 75 2025-05-07T19:43:00.5088172Z initial apicid : 75 2025-05-07T19:43:00.5088249Z fpu : yes 2025-05-07T19:43:00.5088332Z fpu_exception : yes 2025-05-07T19:43:00.5088420Z cpuid level : 13 2025-05-07T19:43:00.5088501Z wp : yes 2025-05-07T19:43:00.5090670Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5091076Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5091159Z bogomips : 6000.01 2025-05-07T19:43:00.5091242Z clflush size : 64 2025-05-07T19:43:00.5091333Z cache_alignment : 64 2025-05-07T19:43:00.5091466Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5091551Z power management: 2025-05-07T19:43:00.5091556Z 2025-05-07T19:43:00.5091645Z processor : 78 2025-05-07T19:43:00.5091731Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5091811Z cpu family : 6 2025-05-07T19:43:00.5091889Z model : 85 2025-05-07T19:43:00.5092056Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5092135Z stepping : 7 2025-05-07T19:43:00.5092265Z microcode : 0x5003901 2025-05-07T19:43:00.5092345Z cpu MHz : 1210.455 2025-05-07T19:43:00.5092436Z cache size : 36608 KB 2025-05-07T19:43:00.5092518Z physical id : 1 2025-05-07T19:43:00.5092597Z siblings : 48 2025-05-07T19:43:00.5092686Z core id : 6 2025-05-07T19:43:00.5092763Z cpu cores : 24 2025-05-07T19:43:00.5092841Z apicid : 77 2025-05-07T19:43:00.5092924Z initial apicid : 77 2025-05-07T19:43:00.5093012Z fpu : yes 2025-05-07T19:43:00.5093097Z fpu_exception : yes 2025-05-07T19:43:00.5093179Z cpuid level : 13 2025-05-07T19:43:00.5093264Z wp : yes 2025-05-07T19:43:00.5095526Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5095892Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5095971Z bogomips : 6000.01 2025-05-07T19:43:00.5096046Z clflush size : 64 2025-05-07T19:43:00.5096127Z cache_alignment : 64 2025-05-07T19:43:00.5096255Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5096371Z power management: 2025-05-07T19:43:00.5096376Z 2025-05-07T19:43:00.5096454Z processor : 79 2025-05-07T19:43:00.5096536Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5096621Z cpu family : 6 2025-05-07T19:43:00.5096690Z model : 85 2025-05-07T19:43:00.5096836Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5096924Z stepping : 7 2025-05-07T19:43:00.5097001Z microcode : 0x5003901 2025-05-07T19:43:00.5097072Z cpu MHz : 1208.899 2025-05-07T19:43:00.5097151Z cache size : 36608 KB 2025-05-07T19:43:00.5097237Z physical id : 1 2025-05-07T19:43:00.5097308Z siblings : 48 2025-05-07T19:43:00.5097378Z core id : 7 2025-05-07T19:43:00.5097459Z cpu cores : 24 2025-05-07T19:43:00.5097531Z apicid : 79 2025-05-07T19:43:00.5097608Z initial apicid : 79 2025-05-07T19:43:00.5097681Z fpu : yes 2025-05-07T19:43:00.5097770Z fpu_exception : yes 2025-05-07T19:43:00.5097842Z cpuid level : 13 2025-05-07T19:43:00.5097910Z wp : yes 2025-05-07T19:43:00.5099938Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5100304Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5100378Z bogomips : 6000.01 2025-05-07T19:43:00.5100462Z clflush size : 64 2025-05-07T19:43:00.5100542Z cache_alignment : 64 2025-05-07T19:43:00.5100661Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5100743Z power management: 2025-05-07T19:43:00.5100747Z 2025-05-07T19:43:00.5100826Z processor : 80 2025-05-07T19:43:00.5100910Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5100983Z cpu family : 6 2025-05-07T19:43:00.5101062Z model : 85 2025-05-07T19:43:00.5101207Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5101282Z stepping : 7 2025-05-07T19:43:00.5101366Z microcode : 0x5003901 2025-05-07T19:43:00.5101441Z cpu MHz : 1203.454 2025-05-07T19:43:00.5101518Z cache size : 36608 KB 2025-05-07T19:43:00.5101635Z physical id : 1 2025-05-07T19:43:00.5101717Z siblings : 48 2025-05-07T19:43:00.5101789Z core id : 8 2025-05-07T19:43:00.5101859Z cpu cores : 24 2025-05-07T19:43:00.5101934Z apicid : 81 2025-05-07T19:43:00.5102010Z initial apicid : 81 2025-05-07T19:43:00.5102084Z fpu : yes 2025-05-07T19:43:00.5102159Z fpu_exception : yes 2025-05-07T19:43:00.5102244Z cpuid level : 13 2025-05-07T19:43:00.5102313Z wp : yes 2025-05-07T19:43:00.5104315Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5104690Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5104764Z bogomips : 6000.01 2025-05-07T19:43:00.5104839Z clflush size : 64 2025-05-07T19:43:00.5104926Z cache_alignment : 64 2025-05-07T19:43:00.5105045Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5105123Z power management: 2025-05-07T19:43:00.5105127Z 2025-05-07T19:43:00.5105208Z processor : 81 2025-05-07T19:43:00.5105327Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5105402Z cpu family : 6 2025-05-07T19:43:00.5105474Z model : 85 2025-05-07T19:43:00.5105630Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5105704Z stepping : 7 2025-05-07T19:43:00.5105782Z microcode : 0x5003901 2025-05-07T19:43:00.5105865Z cpu MHz : 3000.006 2025-05-07T19:43:00.5105940Z cache size : 36608 KB 2025-05-07T19:43:00.5106015Z physical id : 1 2025-05-07T19:43:00.5106090Z siblings : 48 2025-05-07T19:43:00.5106174Z core id : 9 2025-05-07T19:43:00.5106246Z cpu cores : 24 2025-05-07T19:43:00.5106321Z apicid : 83 2025-05-07T19:43:00.5106404Z initial apicid : 83 2025-05-07T19:43:00.5106475Z fpu : yes 2025-05-07T19:43:00.5106551Z fpu_exception : yes 2025-05-07T19:43:00.5106625Z cpuid level : 13 2025-05-07T19:43:00.5106707Z wp : yes 2025-05-07T19:43:00.5108703Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5109073Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5109146Z bogomips : 6000.01 2025-05-07T19:43:00.5109222Z clflush size : 64 2025-05-07T19:43:00.5109301Z cache_alignment : 64 2025-05-07T19:43:00.5109431Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5109508Z power management: 2025-05-07T19:43:00.5109512Z 2025-05-07T19:43:00.5109585Z processor : 82 2025-05-07T19:43:00.5109685Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5109762Z cpu family : 6 2025-05-07T19:43:00.5109830Z model : 85 2025-05-07T19:43:00.5109974Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5110059Z stepping : 7 2025-05-07T19:43:00.5110138Z microcode : 0x5003901 2025-05-07T19:43:00.5110209Z cpu MHz : 3000.006 2025-05-07T19:43:00.5110294Z cache size : 36608 KB 2025-05-07T19:43:00.5110370Z physical id : 1 2025-05-07T19:43:00.5110439Z siblings : 48 2025-05-07T19:43:00.5110567Z core id : 10 2025-05-07T19:43:00.5110648Z cpu cores : 24 2025-05-07T19:43:00.5110716Z apicid : 85 2025-05-07T19:43:00.5110792Z initial apicid : 85 2025-05-07T19:43:00.5110863Z fpu : yes 2025-05-07T19:43:00.5110948Z fpu_exception : yes 2025-05-07T19:43:00.5111023Z cpuid level : 13 2025-05-07T19:43:00.5111092Z wp : yes 2025-05-07T19:43:00.5113112Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5113467Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5113553Z bogomips : 6000.01 2025-05-07T19:43:00.5113622Z clflush size : 64 2025-05-07T19:43:00.5113696Z cache_alignment : 64 2025-05-07T19:43:00.5113814Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5113906Z power management: 2025-05-07T19:43:00.5113910Z 2025-05-07T19:43:00.5113983Z processor : 83 2025-05-07T19:43:00.5114064Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5114136Z cpu family : 6 2025-05-07T19:43:00.5114255Z model : 85 2025-05-07T19:43:00.5114399Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5114470Z stepping : 7 2025-05-07T19:43:00.5114558Z microcode : 0x5003901 2025-05-07T19:43:00.5114630Z cpu MHz : 1203.198 2025-05-07T19:43:00.5114703Z cache size : 36608 KB 2025-05-07T19:43:00.5114776Z physical id : 1 2025-05-07T19:43:00.5114857Z siblings : 48 2025-05-07T19:43:00.5114925Z core id : 11 2025-05-07T19:43:00.5114999Z cpu cores : 24 2025-05-07T19:43:00.5115075Z apicid : 87 2025-05-07T19:43:00.5115154Z initial apicid : 87 2025-05-07T19:43:00.5115223Z fpu : yes 2025-05-07T19:43:00.5115297Z fpu_exception : yes 2025-05-07T19:43:00.5115379Z cpuid level : 13 2025-05-07T19:43:00.5115449Z wp : yes 2025-05-07T19:43:00.5117457Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5117821Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5117899Z bogomips : 6000.01 2025-05-07T19:43:00.5117972Z clflush size : 64 2025-05-07T19:43:00.5118060Z cache_alignment : 64 2025-05-07T19:43:00.5118178Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5118252Z power management: 2025-05-07T19:43:00.5118256Z 2025-05-07T19:43:00.5118338Z processor : 84 2025-05-07T19:43:00.5118418Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5118546Z cpu family : 6 2025-05-07T19:43:00.5118615Z model : 85 2025-05-07T19:43:00.5118773Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5118844Z stepping : 7 2025-05-07T19:43:00.5118920Z microcode : 0x5003901 2025-05-07T19:43:00.5118996Z cpu MHz : 1202.187 2025-05-07T19:43:00.5119071Z cache size : 36608 KB 2025-05-07T19:43:00.5119146Z physical id : 1 2025-05-07T19:43:00.5119220Z siblings : 48 2025-05-07T19:43:00.5119300Z core id : 12 2025-05-07T19:43:00.5119370Z cpu cores : 24 2025-05-07T19:43:00.5119485Z apicid : 89 2025-05-07T19:43:00.5119570Z initial apicid : 89 2025-05-07T19:43:00.5119643Z fpu : yes 2025-05-07T19:43:00.5119717Z fpu_exception : yes 2025-05-07T19:43:00.5119788Z cpuid level : 13 2025-05-07T19:43:00.5119871Z wp : yes 2025-05-07T19:43:00.5121878Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5122305Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5122398Z bogomips : 6000.01 2025-05-07T19:43:00.5122469Z clflush size : 64 2025-05-07T19:43:00.5122543Z cache_alignment : 64 2025-05-07T19:43:00.5122835Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5122915Z power management: 2025-05-07T19:43:00.5122919Z 2025-05-07T19:43:00.5122995Z processor : 85 2025-05-07T19:43:00.5123091Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5123167Z cpu family : 6 2025-05-07T19:43:00.5123240Z model : 85 2025-05-07T19:43:00.5123392Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5123591Z stepping : 7 2025-05-07T19:43:00.5123671Z microcode : 0x5003901 2025-05-07T19:43:00.5123747Z cpu MHz : 3000.006 2025-05-07T19:43:00.5123839Z cache size : 36608 KB 2025-05-07T19:43:00.5123917Z physical id : 1 2025-05-07T19:43:00.5123990Z siblings : 48 2025-05-07T19:43:00.5124062Z core id : 13 2025-05-07T19:43:00.5124148Z cpu cores : 24 2025-05-07T19:43:00.5124222Z apicid : 91 2025-05-07T19:43:00.5124304Z initial apicid : 91 2025-05-07T19:43:00.5124385Z fpu : yes 2025-05-07T19:43:00.5124468Z fpu_exception : yes 2025-05-07T19:43:00.5124545Z cpuid level : 13 2025-05-07T19:43:00.5124615Z wp : yes 2025-05-07T19:43:00.5126814Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5127204Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5127290Z bogomips : 6000.01 2025-05-07T19:43:00.5127373Z clflush size : 64 2025-05-07T19:43:00.5127451Z cache_alignment : 64 2025-05-07T19:43:00.5127577Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5127667Z power management: 2025-05-07T19:43:00.5127671Z 2025-05-07T19:43:00.5127748Z processor : 86 2025-05-07T19:43:00.5127835Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5127920Z cpu family : 6 2025-05-07T19:43:00.5127996Z model : 85 2025-05-07T19:43:00.5128148Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5128224Z stepping : 7 2025-05-07T19:43:00.5128315Z microcode : 0x5003901 2025-05-07T19:43:00.5128390Z cpu MHz : 1202.638 2025-05-07T19:43:00.5128468Z cache size : 36608 KB 2025-05-07T19:43:00.5128552Z physical id : 1 2025-05-07T19:43:00.5128628Z siblings : 48 2025-05-07T19:43:00.5128700Z core id : 14 2025-05-07T19:43:00.5128776Z cpu cores : 24 2025-05-07T19:43:00.5128859Z apicid : 93 2025-05-07T19:43:00.5128939Z initial apicid : 93 2025-05-07T19:43:00.5129010Z fpu : yes 2025-05-07T19:43:00.5129173Z fpu_exception : yes 2025-05-07T19:43:00.5129260Z cpuid level : 13 2025-05-07T19:43:00.5129335Z wp : yes 2025-05-07T19:43:00.5131506Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5131905Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5131990Z bogomips : 6000.01 2025-05-07T19:43:00.5132069Z clflush size : 64 2025-05-07T19:43:00.5132167Z cache_alignment : 64 2025-05-07T19:43:00.5132293Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5132374Z power management: 2025-05-07T19:43:00.5132378Z 2025-05-07T19:43:00.5132470Z processor : 87 2025-05-07T19:43:00.5132557Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5132634Z cpu family : 6 2025-05-07T19:43:00.5132725Z model : 85 2025-05-07T19:43:00.5132883Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5132963Z stepping : 7 2025-05-07T19:43:00.5133044Z microcode : 0x5003901 2025-05-07T19:43:00.5133179Z cpu MHz : 1203.244 2025-05-07T19:43:00.5133259Z cache size : 36608 KB 2025-05-07T19:43:00.5133337Z physical id : 1 2025-05-07T19:43:00.5133414Z siblings : 48 2025-05-07T19:43:00.5133502Z core id : 15 2025-05-07T19:43:00.5133579Z cpu cores : 24 2025-05-07T19:43:00.5133654Z apicid : 95 2025-05-07T19:43:00.5133745Z initial apicid : 95 2025-05-07T19:43:00.5133817Z fpu : yes 2025-05-07T19:43:00.5133901Z fpu_exception : yes 2025-05-07T19:43:00.5133982Z cpuid level : 13 2025-05-07T19:43:00.5134066Z wp : yes 2025-05-07T19:43:00.5136260Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5136623Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5136701Z bogomips : 6000.01 2025-05-07T19:43:00.5136772Z clflush size : 64 2025-05-07T19:43:00.5136854Z cache_alignment : 64 2025-05-07T19:43:00.5136977Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5137051Z power management: 2025-05-07T19:43:00.5137055Z 2025-05-07T19:43:00.5137124Z processor : 88 2025-05-07T19:43:00.5137213Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5137281Z cpu family : 6 2025-05-07T19:43:00.5137349Z model : 85 2025-05-07T19:43:00.5137492Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5137569Z stepping : 7 2025-05-07T19:43:00.5137644Z microcode : 0x5003901 2025-05-07T19:43:00.5137715Z cpu MHz : 1203.627 2025-05-07T19:43:00.5137797Z cache size : 36608 KB 2025-05-07T19:43:00.5137871Z physical id : 1 2025-05-07T19:43:00.5137940Z siblings : 48 2025-05-07T19:43:00.5138009Z core id : 16 2025-05-07T19:43:00.5138088Z cpu cores : 24 2025-05-07T19:43:00.5138158Z apicid : 97 2025-05-07T19:43:00.5138237Z initial apicid : 97 2025-05-07T19:43:00.5138318Z fpu : yes 2025-05-07T19:43:00.5138395Z fpu_exception : yes 2025-05-07T19:43:00.5138465Z cpuid level : 13 2025-05-07T19:43:00.5138574Z wp : yes 2025-05-07T19:43:00.5140591Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5140947Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5141028Z bogomips : 6000.01 2025-05-07T19:43:00.5141105Z clflush size : 64 2025-05-07T19:43:00.5141183Z cache_alignment : 64 2025-05-07T19:43:00.5141297Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5141388Z power management: 2025-05-07T19:43:00.5141392Z 2025-05-07T19:43:00.5141464Z processor : 89 2025-05-07T19:43:00.5141541Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5141621Z cpu family : 6 2025-05-07T19:43:00.5141690Z model : 85 2025-05-07T19:43:00.5141834Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5141902Z stepping : 7 2025-05-07T19:43:00.5141981Z microcode : 0x5003901 2025-05-07T19:43:00.5142050Z cpu MHz : 3000.006 2025-05-07T19:43:00.5142121Z cache size : 36608 KB 2025-05-07T19:43:00.5142249Z physical id : 1 2025-05-07T19:43:00.5142321Z siblings : 48 2025-05-07T19:43:00.5142390Z core id : 17 2025-05-07T19:43:00.5142459Z cpu cores : 24 2025-05-07T19:43:00.5142536Z apicid : 99 2025-05-07T19:43:00.5142612Z initial apicid : 99 2025-05-07T19:43:00.5142681Z fpu : yes 2025-05-07T19:43:00.5142762Z fpu_exception : yes 2025-05-07T19:43:00.5142835Z cpuid level : 13 2025-05-07T19:43:00.5142903Z wp : yes 2025-05-07T19:43:00.5144927Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5145285Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5145359Z bogomips : 6000.01 2025-05-07T19:43:00.5145440Z clflush size : 64 2025-05-07T19:43:00.5145517Z cache_alignment : 64 2025-05-07T19:43:00.5145632Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5145713Z power management: 2025-05-07T19:43:00.5145717Z 2025-05-07T19:43:00.5145798Z processor : 90 2025-05-07T19:43:00.5145877Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5145948Z cpu family : 6 2025-05-07T19:43:00.5146023Z model : 85 2025-05-07T19:43:00.5146167Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5146237Z stepping : 7 2025-05-07T19:43:00.5146310Z microcode : 0x5003901 2025-05-07T19:43:00.5146393Z cpu MHz : 1202.929 2025-05-07T19:43:00.5146467Z cache size : 36608 KB 2025-05-07T19:43:00.5146539Z physical id : 1 2025-05-07T19:43:00.5146619Z siblings : 48 2025-05-07T19:43:00.5146689Z core id : 18 2025-05-07T19:43:00.5146758Z cpu cores : 24 2025-05-07T19:43:00.5146829Z apicid : 101 2025-05-07T19:43:00.5146914Z initial apicid : 101 2025-05-07T19:43:00.5146982Z fpu : yes 2025-05-07T19:43:00.5147056Z fpu_exception : yes 2025-05-07T19:43:00.5147133Z cpuid level : 13 2025-05-07T19:43:00.5147203Z wp : yes 2025-05-07T19:43:00.5149199Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5149608Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5149685Z bogomips : 6000.01 2025-05-07T19:43:00.5149755Z clflush size : 64 2025-05-07T19:43:00.5149833Z cache_alignment : 64 2025-05-07T19:43:00.5149950Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5150024Z power management: 2025-05-07T19:43:00.5150032Z 2025-05-07T19:43:00.5150102Z processor : 91 2025-05-07T19:43:00.5150187Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5150258Z cpu family : 6 2025-05-07T19:43:00.5150328Z model : 85 2025-05-07T19:43:00.5150476Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5150548Z stepping : 7 2025-05-07T19:43:00.5150621Z microcode : 0x5003901 2025-05-07T19:43:00.5150692Z cpu MHz : 1200.975 2025-05-07T19:43:00.5150776Z cache size : 36608 KB 2025-05-07T19:43:00.5150848Z physical id : 1 2025-05-07T19:43:00.5150917Z siblings : 48 2025-05-07T19:43:00.5151032Z core id : 19 2025-05-07T19:43:00.5151106Z cpu cores : 24 2025-05-07T19:43:00.5151175Z apicid : 103 2025-05-07T19:43:00.5151250Z initial apicid : 103 2025-05-07T19:43:00.5151330Z fpu : yes 2025-05-07T19:43:00.5151405Z fpu_exception : yes 2025-05-07T19:43:00.5151477Z cpuid level : 13 2025-05-07T19:43:00.5151546Z wp : yes 2025-05-07T19:43:00.5153547Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5153910Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5153994Z bogomips : 6000.01 2025-05-07T19:43:00.5154069Z clflush size : 64 2025-05-07T19:43:00.5154145Z cache_alignment : 64 2025-05-07T19:43:00.5154275Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5154354Z power management: 2025-05-07T19:43:00.5154358Z 2025-05-07T19:43:00.5154429Z processor : 92 2025-05-07T19:43:00.5154514Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5154593Z cpu family : 6 2025-05-07T19:43:00.5154662Z model : 85 2025-05-07T19:43:00.5154807Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5154890Z stepping : 7 2025-05-07T19:43:00.5154968Z microcode : 0x5003901 2025-05-07T19:43:00.5155039Z cpu MHz : 1203.436 2025-05-07T19:43:00.5155114Z cache size : 36608 KB 2025-05-07T19:43:00.5155198Z physical id : 1 2025-05-07T19:43:00.5155269Z siblings : 48 2025-05-07T19:43:00.5155336Z core id : 20 2025-05-07T19:43:00.5155414Z cpu cores : 24 2025-05-07T19:43:00.5155485Z apicid : 105 2025-05-07T19:43:00.5155560Z initial apicid : 105 2025-05-07T19:43:00.5155628Z fpu : yes 2025-05-07T19:43:00.5155713Z fpu_exception : yes 2025-05-07T19:43:00.5155782Z cpuid level : 13 2025-05-07T19:43:00.5155850Z wp : yes 2025-05-07T19:43:00.5157861Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5158260Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5158334Z bogomips : 6000.01 2025-05-07T19:43:00.5158416Z clflush size : 64 2025-05-07T19:43:00.5158492Z cache_alignment : 64 2025-05-07T19:43:00.5158608Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5158690Z power management: 2025-05-07T19:43:00.5158694Z 2025-05-07T19:43:00.5158764Z processor : 93 2025-05-07T19:43:00.5158842Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5158912Z cpu family : 6 2025-05-07T19:43:00.5158984Z model : 85 2025-05-07T19:43:00.5159125Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5159194Z stepping : 7 2025-05-07T19:43:00.5159271Z microcode : 0x5003901 2025-05-07T19:43:00.5159340Z cpu MHz : 1202.891 2025-05-07T19:43:00.5159412Z cache size : 36608 KB 2025-05-07T19:43:00.5159482Z physical id : 1 2025-05-07T19:43:00.5159557Z siblings : 48 2025-05-07T19:43:00.5159623Z core id : 21 2025-05-07T19:43:00.5159694Z cpu cores : 24 2025-05-07T19:43:00.5159762Z apicid : 107 2025-05-07T19:43:00.5159879Z initial apicid : 107 2025-05-07T19:43:00.5159954Z fpu : yes 2025-05-07T19:43:00.5160043Z fpu_exception : yes 2025-05-07T19:43:00.5160150Z cpuid level : 13 2025-05-07T19:43:00.5160230Z wp : yes 2025-05-07T19:43:00.5162332Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5162890Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5162979Z bogomips : 6000.01 2025-05-07T19:43:00.5163062Z clflush size : 64 2025-05-07T19:43:00.5163156Z cache_alignment : 64 2025-05-07T19:43:00.5163283Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5163366Z power management: 2025-05-07T19:43:00.5163370Z 2025-05-07T19:43:00.5163543Z processor : 94 2025-05-07T19:43:00.5163632Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5163713Z cpu family : 6 2025-05-07T19:43:00.5163794Z model : 85 2025-05-07T19:43:00.5163967Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5164046Z stepping : 7 2025-05-07T19:43:00.5164131Z microcode : 0x5003901 2025-05-07T19:43:00.5164225Z cpu MHz : 1202.133 2025-05-07T19:43:00.5164308Z cache size : 36608 KB 2025-05-07T19:43:00.5164389Z physical id : 1 2025-05-07T19:43:00.5164468Z siblings : 48 2025-05-07T19:43:00.5164560Z core id : 22 2025-05-07T19:43:00.5164640Z cpu cores : 24 2025-05-07T19:43:00.5164720Z apicid : 109 2025-05-07T19:43:00.5164807Z initial apicid : 109 2025-05-07T19:43:00.5164898Z fpu : yes 2025-05-07T19:43:00.5164985Z fpu_exception : yes 2025-05-07T19:43:00.5165069Z cpuid level : 13 2025-05-07T19:43:00.5165159Z wp : yes 2025-05-07T19:43:00.5167480Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5168409Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5168510Z bogomips : 6000.01 2025-05-07T19:43:00.5168597Z clflush size : 64 2025-05-07T19:43:00.5168687Z cache_alignment : 64 2025-05-07T19:43:00.5168833Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5168919Z power management: 2025-05-07T19:43:00.5168924Z 2025-05-07T19:43:00.5169006Z processor : 95 2025-05-07T19:43:00.5169110Z vendor_id : GenuineIntel 2025-05-07T19:43:00.5169192Z cpu family : 6 2025-05-07T19:43:00.5169274Z model : 85 2025-05-07T19:43:00.5169439Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.5169530Z stepping : 7 2025-05-07T19:43:00.5169614Z microcode : 0x5003901 2025-05-07T19:43:00.5169695Z cpu MHz : 1203.260 2025-05-07T19:43:00.5169778Z cache size : 36608 KB 2025-05-07T19:43:00.5169872Z physical id : 1 2025-05-07T19:43:00.5169950Z siblings : 48 2025-05-07T19:43:00.5170029Z core id : 23 2025-05-07T19:43:00.5170121Z cpu cores : 24 2025-05-07T19:43:00.5170202Z apicid : 111 2025-05-07T19:43:00.5170288Z initial apicid : 111 2025-05-07T19:43:00.5170366Z fpu : yes 2025-05-07T19:43:00.5170545Z fpu_exception : yes 2025-05-07T19:43:00.5170627Z cpuid level : 13 2025-05-07T19:43:00.5170707Z wp : yes 2025-05-07T19:43:00.5172866Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.5173260Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.5173344Z bogomips : 6000.01 2025-05-07T19:43:00.5173442Z clflush size : 64 2025-05-07T19:43:00.5173532Z cache_alignment : 64 2025-05-07T19:43:00.5173662Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.5173755Z power management: 2025-05-07T19:43:00.5173760Z 2025-05-07T19:43:00.5173764Z 2025-05-07T19:43:00.5173876Z ################################################################################ 2025-05-07T19:43:00.5173975Z [INFO] Print PCI info ... 2025-05-07T19:43:00.5174069Z + lspci -v 2025-05-07T19:43:00.5174076Z 2025-05-07T19:43:00.5174242Z lspci: Unable to load libkmod resources: error -2 2025-05-07T19:43:00.5174428Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-05-07T19:43:00.5174553Z Subsystem: Amazon.com, Inc. Device 1237 2025-05-07T19:43:00.5174672Z Flags: bus master, medium devsel, latency 0 2025-05-07T19:43:00.5174677Z 2025-05-07T19:43:00.5174879Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-05-07T19:43:00.5174980Z Physical Slot: 1 2025-05-07T19:43:00.5175098Z Flags: bus master, fast devsel, latency 0 2025-05-07T19:43:00.5175103Z 2025-05-07T19:43:00.5175365Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-05-07T19:43:00.5175461Z Physical Slot: 1 2025-05-07T19:43:00.5175584Z Flags: bus master, fast devsel, latency 0, IRQ 9 2025-05-07T19:43:00.5175588Z 2025-05-07T19:43:00.5175862Z 00:03.0 VGA compatible controller: Amazon.com, Inc. 
Device 1111 (prog-if 00 [VGA controller]) 2025-05-07T19:43:00.5176006Z Physical Slot: 3 2025-05-07T19:43:00.5176117Z Flags: bus master, fast devsel, latency 0 2025-05-07T19:43:00.5176252Z Memory at c0000000 (32-bit, prefetchable) [size=4M] 2025-05-07T19:43:00.5176379Z Expansion ROM at 000c0000 [disabled] [size=128K] 2025-05-07T19:43:00.5176396Z 2025-05-07T19:43:00.5176713Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller (prog-if 02 [NVM Express]) 2025-05-07T19:43:00.5176818Z Subsystem: Amazon.com, Inc. Device 0000 2025-05-07T19:43:00.5176900Z Physical Slot: 4 2025-05-07T19:43:00.5177039Z Flags: bus master, fast devsel, latency 0, IRQ 11 2025-05-07T19:43:00.5177198Z Memory at c0514000 (32-bit, non-prefetchable) [size=16K] 2025-05-07T19:43:00.5177305Z Capabilities: 2025-05-07T19:43:00.5177406Z Kernel driver in use: nvme 2025-05-07T19:43:00.5177411Z 2025-05-07T19:43:00.5177630Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-05-07T19:43:00.5177710Z Physical Slot: 5 2025-05-07T19:43:00.5177824Z Flags: bus master, fast devsel, latency 0 2025-05-07T19:43:00.5177986Z Memory at c0510000 (32-bit, non-prefetchable) [size=16K] 2025-05-07T19:43:00.5178117Z Memory at c0400000 (32-bit, prefetchable) [size=1M] 2025-05-07T19:43:00.5178268Z Memory at c0500000 (32-bit, non-prefetchable) [size=64K] 2025-05-07T19:43:00.5178378Z Capabilities: 2025-05-07T19:43:00.5178471Z Kernel driver in use: ena 2025-05-07T19:43:00.5178476Z 2025-05-07T19:43:00.5178480Z 2025-05-07T19:43:00.5178663Z ################################################################################ 2025-05-07T19:43:00.5178787Z [INFO] Print Linux distribution info ... 
2025-05-07T19:43:00.5178866Z + uname -a 2025-05-07T19:43:00.5178870Z 2025-05-07T19:43:00.5179268Z Linux 2aa0e203fee3 6.1.130-139.222.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Mar 11 01:10:58 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux 2025-05-07T19:43:00.5179273Z 2025-05-07T19:43:00.5179366Z + uname -m 2025-05-07T19:43:00.5179370Z 2025-05-07T19:43:00.5179556Z x86_64 2025-05-07T19:43:00.5179561Z 2025-05-07T19:43:00.5179637Z + cat /proc/version 2025-05-07T19:43:00.5179641Z 2025-05-07T19:43:00.5180206Z Linux version 6.1.130-139.222.amzn2023.x86_64 (mockbuild@ip-10-0-55-76) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), GNU ld version 2.39-6.amzn2023.0.11) #1 SMP PREEMPT_DYNAMIC Tue Mar 11 01:10:58 UTC 2025 2025-05-07T19:43:00.5180210Z 2025-05-07T19:43:00.5180293Z + cat /etc/os-release 2025-05-07T19:43:00.5180297Z 2025-05-07T19:43:00.5180373Z NAME="Amazon Linux" 2025-05-07T19:43:00.5180462Z VERSION="2023" 2025-05-07T19:43:00.5180538Z ID="amzn" 2025-05-07T19:43:00.5180616Z ID_LIKE="fedora" 2025-05-07T19:43:00.5180688Z VERSION_ID="2023" 2025-05-07T19:43:00.5180793Z PLATFORM_ID="platform:al2023" 2025-05-07T19:43:00.5180895Z PRETTY_NAME="Amazon Linux 2023.7.20250428" 2025-05-07T19:43:00.5180968Z ANSI_COLOR="0;33" 2025-05-07T19:43:00.5181096Z CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2023" 2025-05-07T19:43:00.5181267Z HOME_URL="https://aws.amazon.com/linux/amazon-linux-2023/" 2025-05-07T19:43:00.5181425Z DOCUMENTATION_URL="https://docs.aws.amazon.com/linux/" 2025-05-07T19:43:00.5181576Z SUPPORT_URL="https://aws.amazon.com/premiumsupport/" 2025-05-07T19:43:00.5181765Z BUG_REPORT_URL="https://github.com/amazonlinux/amazon-linux-2023" 2025-05-07T19:43:00.5181839Z VENDOR_NAME="AWS" 2025-05-07T19:43:00.5181939Z VENDOR_URL="https://aws.amazon.com/" 2025-05-07T19:43:00.5182026Z SUPPORT_END="2029-06-30" 2025-05-07T19:43:00.5182031Z 2025-05-07T19:43:00.5213536Z ##[group]Run . $PRELUDE; print_gpu_info 2025-05-07T19:43:00.5213697Z . 
$PRELUDE; print_gpu_info 2025-05-07T19:43:00.5213985Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:43:00.5214060Z env: 2025-05-07T19:43:00.5214171Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:43:00.5214269Z BUILD_ENV: build_binary 2025-05-07T19:43:00.5214352Z BUILD_TARGET: genai 2025-05-07T19:43:00.5214441Z BUILD_VARIANT: cuda 2025-05-07T19:43:00.5214536Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:43:00.5214699Z ##[endgroup] 2025-05-07T19:43:01.0300135Z ################################################################################ 2025-05-07T19:43:01.0300605Z [INFO] Printing general display info ... 2025-05-07T19:43:01.0324401Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:43:01.1250279Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:43:01.1257794Z /usr/bin/sudo 2025-05-07T19:43:01.1265999Z which: no apt-get in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:01.1275813Z /usr/bin/yum 2025-05-07T19:43:01.1276614Z [INSTALL] Updating system repositories ... 2025-05-07T19:43:01.1297040Z [EXEC] [ATTEMPT 0/3] + sudo yum update -y 2025-05-07T19:43:01.3432370Z Last metadata expiration check: 0:00:20 ago on Wed May 7 19:42:41 2025. 2025-05-07T19:43:01.4388441Z Dependencies resolved. 2025-05-07T19:43:01.4601710Z Nothing to do. 2025-05-07T19:43:01.4602627Z Complete! 2025-05-07T19:43:01.4947779Z [INSTALL] Installing system package(s): hostname lshw ... 2025-05-07T19:43:01.4975644Z [EXEC] [ATTEMPT 0/3] + sudo yum install -y hostname lshw 2025-05-07T19:43:01.7222900Z Last metadata expiration check: 0:00:20 ago on Wed May 7 19:42:41 2025. 2025-05-07T19:43:01.7743263Z Dependencies resolved. 
2025-05-07T19:43:01.7909599Z ================================================================================ 2025-05-07T19:43:01.7910101Z Package Arch Version Repository Size 2025-05-07T19:43:01.7910528Z ================================================================================ 2025-05-07T19:43:01.7910851Z Installing: 2025-05-07T19:43:01.7911181Z hostname x86_64 3.23-4.amzn2023.0.3 amazonlinux 28 k 2025-05-07T19:43:01.7911648Z lshw x86_64 B.02.19.2-7.amzn2023.0.3 amazonlinux 319 k 2025-05-07T19:43:01.7911946Z 2025-05-07T19:43:01.7912038Z Transaction Summary 2025-05-07T19:43:01.7912284Z ================================================================================ 2025-05-07T19:43:01.7912621Z Install 2 Packages 2025-05-07T19:43:01.7912757Z 2025-05-07T19:43:01.7912883Z Total download size: 347 k 2025-05-07T19:43:01.7913137Z Installed size: 883 k 2025-05-07T19:43:01.7913401Z Downloading Packages: 2025-05-07T19:43:02.0758233Z (1/2): hostname-3.23-4.amzn2023.0.3.x86_64.rpm 1.5 MB/s | 28 kB 00:00 2025-05-07T19:43:02.0841994Z (2/2): lshw-B.02.19.2-7.amzn2023.0.3.x86_64.rpm 12 MB/s | 319 kB 00:00 2025-05-07T19:43:02.0852837Z -------------------------------------------------------------------------------- 2025-05-07T19:43:02.0855746Z Total 1.2 MB/s | 347 kB 00:00 2025-05-07T19:43:02.1101669Z Running transaction check 2025-05-07T19:43:02.1156736Z Transaction check succeeded. 2025-05-07T19:43:02.1157171Z Running transaction test 2025-05-07T19:43:02.1322814Z Transaction test succeeded. 
2025-05-07T19:43:02.1323711Z Running transaction 2025-05-07T19:43:02.1599472Z Preparing : 1/1 2025-05-07T19:43:02.1670122Z Installing : lshw-B.02.19.2-7.amzn2023.0.3.x86_64 1/2 2025-05-07T19:43:02.1700895Z Installing : hostname-3.23-4.amzn2023.0.3.x86_64 2/2 2025-05-07T19:43:03.2239629Z Running scriptlet: hostname-3.23-4.amzn2023.0.3.x86_64 2/2 2025-05-07T19:43:03.2240626Z Verifying : hostname-3.23-4.amzn2023.0.3.x86_64 1/2 2025-05-07T19:43:03.2611734Z Verifying : lshw-B.02.19.2-7.amzn2023.0.3.x86_64 2/2 2025-05-07T19:43:03.2612763Z 2025-05-07T19:43:03.2613005Z Installed: 2025-05-07T19:43:03.2613985Z hostname-3.23-4.amzn2023.0.3.x86_64 lshw-B.02.19.2-7.amzn2023.0.3.x86_64 2025-05-07T19:43:03.2614986Z 2025-05-07T19:43:03.2615220Z Complete! 2025-05-07T19:43:03.3064963Z + hostname 2025-05-07T19:43:03.3065706Z 2025-05-07T19:43:03.3073018Z 2aa0e203fee3 2025-05-07T19:43:03.3073482Z 2025-05-07T19:43:03.3074230Z + sudo lshw -C display 2025-05-07T19:43:03.3074687Z 2025-05-07T19:43:03.5044503Z *-display UNCLAIMED 2025-05-07T19:43:03.5045106Z description: VGA compatible controller 2025-05-07T19:43:03.5045472Z product: Amazon.com, Inc. 2025-05-07T19:43:03.5045767Z vendor: Amazon.com, Inc. 2025-05-07T19:43:03.5046054Z physical id: 3 2025-05-07T19:43:03.5046298Z bus info: pci@0000:00:03.0 2025-05-07T19:43:03.5046581Z version: 00 2025-05-07T19:43:03.5046947Z width: 32 bits 2025-05-07T19:43:03.5047168Z clock: 33MHz 2025-05-07T19:43:03.5047436Z capabilities: vga_controller bus_master 2025-05-07T19:43:03.5047751Z configuration: latency=0 2025-05-07T19:43:03.5048098Z resources: memory:c0000000-c03fffff memory:c0000-dffff 2025-05-07T19:43:03.5066203Z 2025-05-07T19:43:03.5066809Z ################################################################################ 2025-05-07T19:43:03.5067536Z [INFO] Printing NVIDIA GPU info ... 
2025-05-07T19:43:03.5175357Z lspci: Unable to load libkmod resources: error -2 2025-05-07T19:43:03.5202450Z which: no nvidia-smi in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:03.5203155Z [CHECK] nvidia-smi not found 2025-05-07T19:43:03.5203500Z ################################################################################ 2025-05-07T19:43:03.5203932Z [INFO] Printing AMD GPU info ... 2025-05-07T19:43:03.5342351Z lspci: Unable to load libkmod resources: error -2 2025-05-07T19:43:03.5365392Z which: no rocminfo in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:03.5365937Z [CHECK] rocminfo not found 2025-05-07T19:43:03.5373120Z which: no rocm-smi in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:03.5374524Z [CHECK] rocm-smi not found 2025-05-07T19:43:03.5440248Z ##[group]Run . $PRELUDE; setup_miniconda $HOME/miniconda 2025-05-07T19:43:03.5440766Z . $PRELUDE; setup_miniconda $HOME/miniconda 2025-05-07T19:43:03.5441352Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:43:03.5441724Z env: 2025-05-07T19:43:03.5441969Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:43:03.5442419Z BUILD_ENV: build_binary 2025-05-07T19:43:03.5442847Z BUILD_TARGET: genai 2025-05-07T19:43:03.5443136Z BUILD_VARIANT: cuda 2025-05-07T19:43:03.5443430Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:43:03.5443762Z ##[endgroup] 2025-05-07T19:43:03.9204338Z ################################################################################ 2025-05-07T19:43:03.9204770Z # Setup Miniconda 2025-05-07T19:43:03.9205012Z # 2025-05-07T19:43:03.9220715Z # [2025-05-07T19:43:03.921Z] + setup_miniconda /github/home/miniconda 2025-05-07T19:43:03.9221235Z ################################################################################ 2025-05-07T19:43:03.9221593Z 2025-05-07T19:43:03.9245467Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:43:04.0116527Z [CHECK] Network 
does not appear to be blocked. 2025-05-07T19:43:04.0117696Z + mkdir -p /github/home/miniconda 2025-05-07T19:43:04.0118282Z 2025-05-07T19:43:04.0126954Z 2025-05-07T19:43:04.0128088Z [SETUP] Downloading the Miniconda installer ... 2025-05-07T19:43:04.0151311Z [EXEC] [ATTEMPT 0/3] + wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh 2025-05-07T19:43:05.4832163Z [SETUP] Installing Miniconda ... 2025-05-07T19:43:05.4833220Z + bash miniconda.sh -b -p /github/home/miniconda -u 2025-05-07T19:43:05.4834005Z 2025-05-07T19:43:05.4983060Z PREFIX=/github/home/miniconda 2025-05-07T19:43:05.8562999Z Unpacking payload ... 2025-05-07T19:43:06.3351243Z entry_point.py:256: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. 2025-05-07T19:43:07.0062010Z entry_point.py:256: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. 2025-05-07T19:43:08.8581879Z 2025-05-07T19:43:08.8582569Z Installing base environment... 2025-05-07T19:43:08.8582857Z 2025-05-07T19:43:09.8476240Z Preparing transaction: ...working... done 2025-05-07T19:43:12.6794540Z Executing transaction: ...working... done 2025-05-07T19:43:13.2274121Z entry_point.py:256: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. 2025-05-07T19:43:13.2964549Z installation finished. 2025-05-07T19:43:13.2968205Z 2025-05-07T19:43:13.2968899Z + rm -f miniconda.sh 2025-05-07T19:43:13.2969697Z 2025-05-07T19:43:13.3162628Z 2025-05-07T19:43:13.3163619Z [SETUP] Reloading the bash configuration ... 
2025-05-07T19:43:13.3164593Z + /github/home/miniconda/bin/conda init bash 2025-05-07T19:43:13.3164820Z 2025-05-07T19:43:13.6888961Z no change /github/home/miniconda/condabin/conda 2025-05-07T19:43:13.6891050Z no change /github/home/miniconda/bin/conda 2025-05-07T19:43:13.6892217Z no change /github/home/miniconda/bin/conda-env 2025-05-07T19:43:13.6893296Z no change /github/home/miniconda/bin/activate 2025-05-07T19:43:13.6894401Z no change /github/home/miniconda/bin/deactivate 2025-05-07T19:43:13.6895600Z no change /github/home/miniconda/etc/profile.d/conda.sh 2025-05-07T19:43:13.6896416Z no change /github/home/miniconda/etc/fish/conf.d/conda.fish 2025-05-07T19:43:13.6896886Z no change /github/home/miniconda/shell/condabin/Conda.psm1 2025-05-07T19:43:13.6897336Z no change /github/home/miniconda/shell/condabin/conda-hook.ps1 2025-05-07T19:43:13.6897900Z no change /github/home/miniconda/lib/python3.13/site-packages/xontrib/conda.xsh 2025-05-07T19:43:13.6898723Z no change /github/home/miniconda/etc/profile.d/conda.csh 2025-05-07T19:43:13.6899148Z modified /github/home/.bashrc 2025-05-07T19:43:13.6899340Z 2025-05-07T19:43:13.6899589Z ==> For changes to take effect, close and re-open your current shell. <== 2025-05-07T19:43:13.6899911Z 2025-05-07T19:43:13.7507542Z 2025-05-07T19:43:13.7508473Z + . /github/home/.bashrc 2025-05-07T19:43:13.7509410Z 2025-05-07T19:43:14.5492171Z 2025-05-07T19:43:14.5493061Z [SETUP] Installing libmamba-solver (required since Anaconda 2024.02-1) and libarchive ... 
2025-05-07T19:43:14.5522797Z [EXEC] [ATTEMPT 0/3] + conda install --solver=classic -c conda-forge --override-channels -y conda-libmamba-solver libmamba libmambapy libarchive 2025-05-07T19:43:26.2204799Z Collecting package metadata (current_repodata.json): - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / done 2025-05-07T19:43:27.6910301Z Solving environment: \ | / - \ | / - \ | / done 2025-05-07T19:43:27.7809269Z 2025-05-07T19:43:27.7810195Z ## Package Plan ## 2025-05-07T19:43:27.7811129Z 2025-05-07T19:43:27.7811746Z environment location: /github/home/miniconda 2025-05-07T19:43:27.7813100Z 2025-05-07T19:43:27.7813592Z added / updated specs: 2025-05-07T19:43:27.7814485Z - conda-libmamba-solver 2025-05-07T19:43:27.7815259Z - libarchive 2025-05-07T19:43:27.7815947Z - libmamba 2025-05-07T19:43:27.7816170Z - libmambapy 2025-05-07T19:43:27.7816309Z 2025-05-07T19:43:27.7816313Z 2025-05-07T19:43:27.7816464Z The following packages will be downloaded: 2025-05-07T19:43:27.7816696Z 2025-05-07T19:43:27.7816820Z package | build 2025-05-07T19:43:27.7817457Z ---------------------------|----------------- 2025-05-07T19:43:27.7818173Z ca-certificates-2025.4.26 | hbd8a1cb_0 149 KB conda-forge 2025-05-07T19:43:27.7818711Z certifi-2025.4.26 | pyhd8ed1ab_0 154 KB conda-forge 2025-05-07T19:43:27.7819520Z conda-25.3.1 | py313h78bf25f_1 1.1 MB conda-forge 2025-05-07T19:43:27.7820023Z conda-libmamba-solver-25.4.0| pyhd8ed1ab_0 41 KB conda-forge 2025-05-07T19:43:27.7820534Z ------------------------------------------------------------ 2025-05-07T19:43:27.7820906Z Total: 1.4 MB 2025-05-07T19:43:27.7821163Z 2025-05-07T19:43:27.7821291Z The following packages will be UPDATED: 2025-05-07T19:43:27.7821513Z 2025-05-07T19:43:27.7827594Z ca-certificates pkgs/main/linux-64::ca-certificates-2~ --> conda-forge/noarch::ca-certificates-2025.4.26-hbd8a1cb_0 
2025-05-07T19:43:27.7828503Z conda pkgs/main::conda-25.3.1-py313h06a4308~ --> conda-forge::conda-25.3.1-py313h78bf25f_1 2025-05-07T19:43:27.7828976Z 2025-05-07T19:43:27.7829215Z The following packages will be SUPERSEDED by a higher-priority channel: 2025-05-07T19:43:27.7829566Z 2025-05-07T19:43:27.7829912Z certifi pkgs/main/linux-64::certifi-2025.4.26~ --> conda-forge/noarch::certifi-2025.4.26-pyhd8ed1ab_0 2025-05-07T19:43:27.7830820Z conda-libmamba-so~ pkgs/main::conda-libmamba-solver-25.4~ --> conda-forge::conda-libmamba-solver-25.4.0-pyhd8ed1ab_0 2025-05-07T19:43:27.7831345Z 2025-05-07T19:43:27.7831370Z 2025-05-07T19:43:27.7831374Z 2025-05-07T19:43:27.7831530Z Downloading and Extracting Packages: ...working... 2025-05-07T19:43:27.7832036Z conda-25.3.1 | 1.1 MB | | 0% 2025-05-07T19:43:27.7832810Z certifi-2025.4.26 | 154 KB | | 0% 2025-05-07T19:43:27.7833336Z ca-certificates-2025 | 149 KB | | 0% 2025-05-07T19:43:27.8526384Z conda-libmamba-solve | 41 KB | | 0% 2025-05-07T19:43:27.8555667Z conda-25.3.1 | 1.1 MB | ########## | 100% 2025-05-07T19:43:27.8557274Z conda-libmamba-solve | 41 KB | ########## | 100% 2025-05-07T19:43:27.8764775Z certifi-2025.4.26 | 154 KB | ########## | 100% 2025-05-07T19:43:27.8792033Z certifi-2025.4.26 | 154 KB | ########## | 100% 2025-05-07T19:43:27.9166116Z conda-libmamba-solve | 41 KB | ########## | 100% 2025-05-07T19:43:27.9204664Z ca-certificates-2025 | 149 KB | # | 11% 2025-05-07T19:43:27.9300102Z 
ca-certificates-2025 | 149 KB | ########## | 100% 2025-05-07T19:43:27.9707746Z ca-certificates-2025 | 149 KB | ########## | 100% 2025-05-07T19:43:27.9709641Z conda-25.3.1 | 1.1 MB | ########## | 100% 2025-05-07T19:43:27.9711446Z conda-25.3.1 | 1.1 MB | ########## | 100% 2025-05-07T19:43:27.9716844Z done 2025-05-07T19:43:28.0724593Z Preparing transaction: \ done 2025-05-07T19:43:28.1732624Z Verifying transaction: / done 2025-05-07T19:43:29.4764219Z Executing transaction: \ | / - \ | / - \ | / - \ done 2025-05-07T19:43:31.0818052Z [SETUP] Updating Miniconda base packages ... 2025-05-07T19:43:31.0844459Z [EXEC] [ATTEMPT 0/3] + conda update -n base -c defaults --update-deps -y conda 2025-05-07T19:43:31.8153372Z Channels: 2025-05-07T19:43:31.8153827Z - defaults 2025-05-07T19:43:31.8154144Z Platform: linux-64 2025-05-07T19:43:32.8971938Z Collecting package metadata (repodata.json): - \ | / - \ done 2025-05-07T19:43:33.0270769Z Solving environment: / - Channels: 2025-05-07T19:43:33.0271929Z - defaults 2025-05-07T19:43:33.0272558Z Platform: linux-64 2025-05-07T19:43:33.3103192Z Collecting package metadata (repodata.json): | / - \ done 2025-05-07T19:43:33.5289331Z Solving environment: / - \ done 2025-05-07T19:43:33.6498856Z | done 2025-05-07T19:43:33.7133938Z 2025-05-07T19:43:33.7134569Z ## Package Plan ## 2025-05-07T19:43:33.7134985Z 2025-05-07T19:43:33.7135179Z environment location: /github/home/miniconda 2025-05-07T19:43:33.7135433Z 2025-05-07T19:43:33.7135541Z added / updated specs: 2025-05-07T19:43:33.7135832Z - conda 2025-05-07T19:43:33.7135972Z 2025-05-07T19:43:33.7135976Z 2025-05-07T19:43:33.7136107Z The following packages will be downloaded: 
2025-05-07T19:43:33.7136369Z 2025-05-07T19:43:33.7136497Z package | build 2025-05-07T19:43:33.7136852Z ---------------------------|----------------- 2025-05-07T19:43:33.7137254Z pip-25.1 | pyhc872135_2 1.3 MB 2025-05-07T19:43:33.7137701Z tzdata-2025b | h04d1e81_0 116 KB 2025-05-07T19:43:33.7138107Z ------------------------------------------------------------ 2025-05-07T19:43:33.7138504Z Total: 1.4 MB 2025-05-07T19:43:33.7138737Z 2025-05-07T19:43:33.7138866Z The following packages will be UPDATED: 2025-05-07T19:43:33.7139117Z 2025-05-07T19:43:33.7139713Z pip pkgs/main/linux-64::pip-25.0-py313h06~ --> pkgs/main/noarch::pip-25.1-pyhc872135_2 2025-05-07T19:43:33.7140308Z tzdata 2025a-h04d1e81_0 --> 2025b-h04d1e81_0 2025-05-07T19:43:33.7140582Z 2025-05-07T19:43:33.7140587Z 2025-05-07T19:43:33.7140591Z 2025-05-07T19:43:33.7140741Z Downloading and Extracting Packages: ...working... 2025-05-07T19:43:33.7141137Z pip-25.1 | 1.3 MB | | 0% 2025-05-07T19:43:33.7705905Z tzdata-2025b | 116 KB | | 0% 2025-05-07T19:43:33.7789998Z pip-25.1 | 1.3 MB | ########## | 100% 2025-05-07T19:43:33.9664091Z tzdata-2025b | 116 KB | ########## | 100% 2025-05-07T19:43:33.9664867Z pip-25.1 | 1.3 MB | ########## | 100% 2025-05-07T19:43:33.9708654Z pip-25.1 | 1.3 MB | ########## | 100% 2025-05-07T19:43:33.9709555Z tzdata-2025b | 116 KB | ########## | 100% 2025-05-07T19:43:33.9710246Z tzdata-2025b | 116 KB | ########## | 100% 2025-05-07T19:43:33.9711052Z done 2025-05-07T19:43:34.0721280Z Preparing transaction: - done 2025-05-07T19:43:34.1730913Z Verifying transaction: | done 2025-05-07T19:43:36.1768246Z Executing transaction: - \ | / - \ | / - \ | / - \ | / - \ | / done 2025-05-07T19:43:36.7223476Z [SETUP] Cleaning up Conda packages ... 
2025-05-07T19:43:36.7223879Z + conda clean --packages --tarball -y 2025-05-07T19:43:36.7224137Z 2025-05-07T19:43:37.1608312Z Will remove 99 (117.8 MB) tarball(s). 2025-05-07T19:43:37.1609257Z Will remove 11 (16.0 MB) package(s). 2025-05-07T19:43:37.2171642Z 2025-05-07T19:43:37.2177073Z + conda clean --all -y 2025-05-07T19:43:37.2177562Z 2025-05-07T19:43:37.6644529Z There are no unused tarball(s) to remove. 2025-05-07T19:43:37.6645525Z Will remove 1 index cache(s). 2025-05-07T19:43:37.6646265Z There are no unused package(s) to remove. 2025-05-07T19:43:37.6646906Z There are no tempfile(s) to remove. 2025-05-07T19:43:37.6647361Z There are no logfile(s) to remove. 2025-05-07T19:43:37.7195281Z 2025-05-07T19:43:37.7195804Z + conda info 2025-05-07T19:43:37.7196231Z 2025-05-07T19:43:38.2867795Z 2025-05-07T19:43:38.2868160Z active environment : base 2025-05-07T19:43:38.2868841Z active env location : /github/home/miniconda 2025-05-07T19:43:38.2869268Z shell level : 1 2025-05-07T19:43:38.2869577Z user config file : /github/home/.condarc 2025-05-07T19:43:38.2870012Z populated config files : /github/home/miniconda/.condarc 2025-05-07T19:43:38.2870524Z conda version : 25.3.1 2025-05-07T19:43:38.2870853Z conda-build version : not installed 2025-05-07T19:43:38.2871190Z python version : 3.13.2.final.0 2025-05-07T19:43:38.2871537Z solver : libmamba (default) 2025-05-07T19:43:38.2872015Z virtual packages : __archspec=1=cascadelake 2025-05-07T19:43:38.2872339Z __conda=25.3.1=0 2025-05-07T19:43:38.2872664Z __glibc=2.34=0 2025-05-07T19:43:38.2872961Z __linux=6.1.130=0 2025-05-07T19:43:38.2873267Z __unix=0=0 2025-05-07T19:43:38.2873601Z base environment : /github/home/miniconda (writable) 2025-05-07T19:43:38.2874023Z conda av data dir : /github/home/miniconda/etc/conda 2025-05-07T19:43:38.2874379Z conda av metadata url : None 2025-05-07T19:43:38.2874734Z channel URLs : https://repo.anaconda.com/pkgs/main/linux-64 2025-05-07T19:43:38.2875167Z https://repo.anaconda.com/pkgs/main/noarch 
2025-05-07T19:43:38.2875538Z https://repo.anaconda.com/pkgs/r/linux-64 2025-05-07T19:43:38.2876174Z https://repo.anaconda.com/pkgs/r/noarch 2025-05-07T19:43:38.2876534Z package cache : /github/home/miniconda/pkgs 2025-05-07T19:43:38.2876879Z /github/home/.conda/pkgs 2025-05-07T19:43:38.2877210Z envs directories : /github/home/miniconda/envs 2025-05-07T19:43:38.2877551Z /github/home/.conda/envs 2025-05-07T19:43:38.2877860Z platform : linux-64 2025-05-07T19:43:38.2878698Z user-agent : conda/25.3.1 requests/2.32.3 CPython/3.13.2 Linux/6.1.130-139.222.amzn2023.x86_64 amzn/2023.7.20250428 glibc/2.34 solver/libmamba conda-libmamba-solver/25.4.0 libmambapy/2.0.5 aau/0.7.0 c/. s/. e/. 2025-05-07T19:43:38.2879559Z UID:GID : 0:0 2025-05-07T19:43:38.2879803Z netrc file : None 2025-05-07T19:43:38.2880068Z offline mode : False 2025-05-07T19:43:38.2880236Z 2025-05-07T19:43:38.3468297Z 2025-05-07T19:43:38.3469406Z [SETUP] Exporting Miniconda variables ... 2025-05-07T19:43:38.3472124Z [SETUP] Saving Miniconda variables to /__w/_temp/_runner_file_commands/add_path_d2132a5e-d632-470f-94c1-e75df7bb2e55 ... 2025-05-07T19:43:38.3474159Z [SETUP] Successfully set up Miniconda at /github/home/miniconda 2025-05-07T19:43:38.3632628Z ##[group]Run . $PRELUDE; create_conda_environment $BUILD_ENV 3.11 2025-05-07T19:43:38.3633129Z . 
$PRELUDE; create_conda_environment $BUILD_ENV 3.11 2025-05-07T19:43:38.3633860Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:43:38.3634175Z env: 2025-05-07T19:43:38.3634385Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:43:38.3634685Z BUILD_ENV: build_binary 2025-05-07T19:43:38.3634914Z BUILD_TARGET: genai 2025-05-07T19:43:38.3635144Z BUILD_VARIANT: cuda 2025-05-07T19:43:38.3635363Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:43:38.3635607Z ##[endgroup] 2025-05-07T19:43:38.7948821Z ################################################################################ 2025-05-07T19:43:38.7949449Z # Create Conda Environment 2025-05-07T19:43:38.7949728Z # 2025-05-07T19:43:38.7969248Z # [2025-05-07T19:43:38.796Z] + create_conda_environment build_binary 3.11 2025-05-07T19:43:38.7970051Z ################################################################################ 2025-05-07T19:43:38.7970557Z 2025-05-07T19:43:38.7991838Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:43:38.8927855Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:43:38.8929716Z [SETUP] Listing existing Conda environments ... 2025-05-07T19:43:38.8930717Z + conda info --envs 2025-05-07T19:43:38.8931133Z 2025-05-07T19:43:39.4602077Z 2025-05-07T19:43:39.4603049Z # conda environments: 2025-05-07T19:43:39.4604390Z # 2025-05-07T19:43:39.4605160Z base /github/home/miniconda 2025-05-07T19:43:39.4605850Z 2025-05-07T19:43:39.5192231Z 2025-05-07T19:43:39.5193634Z [SETUP] Deleting the prefix directory if it exists ... 2025-05-07T19:43:41.1332169Z + rm -rf /github/home/miniconda/envs/build_binary 2025-05-07T19:43:41.1333497Z 2025-05-07T19:43:41.1348905Z 2025-05-07T19:43:41.1363574Z [SETUP] Creating new Conda environment (Python 3.11) ... 
2025-05-07T19:43:41.1390412Z [EXEC] [ATTEMPT 0/3] + conda create -y -n build_binary python=3.11 2025-05-07T19:43:41.7230647Z Channels: 2025-05-07T19:43:41.7230968Z - defaults 2025-05-07T19:43:41.7231242Z Platform: linux-64 2025-05-07T19:43:43.0892051Z Collecting package metadata (repodata.json): - \ | / - \ | / done 2025-05-07T19:43:43.1898985Z Solving environment: \ done 2025-05-07T19:43:43.2190646Z 2025-05-07T19:43:43.2191002Z ## Package Plan ## 2025-05-07T19:43:43.2191218Z 2025-05-07T19:43:43.2191859Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:43:43.2192222Z 2025-05-07T19:43:43.2192341Z added / updated specs: 2025-05-07T19:43:43.2192668Z - python=3.11 2025-05-07T19:43:43.2192822Z 2025-05-07T19:43:43.2192829Z 2025-05-07T19:43:43.2193005Z The following packages will be downloaded: 2025-05-07T19:43:43.2193250Z 2025-05-07T19:43:43.2193384Z package | build 2025-05-07T19:43:43.2193779Z ---------------------------|----------------- 2025-05-07T19:43:43.2194212Z _libgcc_mutex-0.1 | main 3 KB 2025-05-07T19:43:43.2194706Z _openmp_mutex-5.1 | 1_gnu 21 KB 2025-05-07T19:43:43.2195165Z ca-certificates-2025.2.25 | h06a4308_0 129 KB 2025-05-07T19:43:43.2195643Z python-3.11.11 | he870216_0 32.9 MB 2025-05-07T19:43:43.2196109Z setuptools-78.1.1 | py311h06a4308_0 2.3 MB 2025-05-07T19:43:43.2196542Z wheel-0.45.1 | py311h06a4308_0 151 KB 2025-05-07T19:43:43.2196968Z ------------------------------------------------------------ 2025-05-07T19:43:43.2197337Z Total: 35.4 MB 2025-05-07T19:43:43.2197600Z 2025-05-07T19:43:43.2197744Z The following NEW packages will be INSTALLED: 2025-05-07T19:43:43.2197987Z 2025-05-07T19:43:43.2198229Z _libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main 2025-05-07T19:43:43.2198746Z _openmp_mutex pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu 2025-05-07T19:43:43.2199567Z bzip2 pkgs/main/linux-64::bzip2-1.0.8-h5eee18b_6 2025-05-07T19:43:43.2200111Z ca-certificates pkgs/main/linux-64::ca-certificates-2025.2.25-h06a4308_0 
2025-05-07T19:43:43.2200736Z ld_impl_linux-64 pkgs/main/linux-64::ld_impl_linux-64-2.40-h12ee557_0 2025-05-07T19:43:43.2201241Z libffi pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1 2025-05-07T19:43:43.2201741Z libgcc-ng pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1 2025-05-07T19:43:43.2202321Z libgomp pkgs/main/linux-64::libgomp-11.2.0-h1234567_1 2025-05-07T19:43:43.2202840Z libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1 2025-05-07T19:43:43.2203395Z libuuid pkgs/main/linux-64::libuuid-1.41.5-h5eee18b_0 2025-05-07T19:43:43.2203855Z ncurses pkgs/main/linux-64::ncurses-6.4-h6a678d5_0 2025-05-07T19:43:43.2204354Z openssl pkgs/main/linux-64::openssl-3.0.16-h5eee18b_0 2025-05-07T19:43:43.2204995Z pip pkgs/main/noarch::pip-25.1-pyhc872135_2 2025-05-07T19:43:43.2205431Z python pkgs/main/linux-64::python-3.11.11-he870216_0 2025-05-07T19:43:43.2205930Z readline pkgs/main/linux-64::readline-8.2-h5eee18b_0 2025-05-07T19:43:43.2206447Z setuptools pkgs/main/linux-64::setuptools-78.1.1-py311h06a4308_0 2025-05-07T19:43:43.2206988Z sqlite pkgs/main/linux-64::sqlite-3.45.3-h5eee18b_0 2025-05-07T19:43:43.2207437Z tk pkgs/main/linux-64::tk-8.6.14-h39e8969_0 2025-05-07T19:43:43.2207850Z tzdata pkgs/main/noarch::tzdata-2025b-h04d1e81_0 2025-05-07T19:43:43.2208339Z wheel pkgs/main/linux-64::wheel-0.45.1-py311h06a4308_0 2025-05-07T19:43:43.2208763Z xz pkgs/main/linux-64::xz-5.6.4-h5eee18b_1 2025-05-07T19:43:43.2209190Z zlib pkgs/main/linux-64::zlib-1.2.13-h5eee18b_1 2025-05-07T19:43:43.2209458Z 2025-05-07T19:43:43.2209462Z 2025-05-07T19:43:43.2209466Z 2025-05-07T19:43:43.2209660Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:43:43.2210066Z python-3.11.11 | 32.9 MB | | 0% 2025-05-07T19:43:43.2210671Z setuptools-78.1.1 | 2.3 MB | | 0% 2025-05-07T19:43:43.2211197Z wheel-0.45.1 | 151 KB | | 0% 2025-05-07T19:43:43.2244448Z ca-certificates-2025 | 129 KB | | 0% 2025-05-07T19:43:43.2245446Z _openmp_mutex-5.1 | 21 KB | | 0% 2025-05-07T19:43:43.2685131Z _libgcc_mutex-0.1 | 3 KB | | 0% 2025-05-07T19:43:43.2799875Z wheel-0.45.1 | 151 KB | ########## | 100% 2025-05-07T19:43:43.2844395Z ca-certificates-2025 | 129 KB | ########## | 100% 2025-05-07T19:43:43.2868304Z _openmp_mutex-5.1 | 21 KB | ########## | 100% 2025-05-07T19:43:43.3042925Z _libgcc_mutex-0.1 | 3 KB | ########## | 100% 2025-05-07T19:43:43.3160611Z setuptools-78.1.1 | 2.3 MB | ########## | 100% 2025-05-07T19:43:43.3454188Z python-3.11.11 | 32.9 MB | 9 | 10% 2025-05-07T19:43:43.5707267Z python-3.11.11 | 32.9 MB | ####7 | 48% 2025-05-07T19:43:43.6201146Z python-3.11.11 | 32.9 MB | ########## | 100% 2025-05-07T19:43:44.1737502Z done 2025-05-07T19:43:44.3851065Z Preparing transaction: / - done 2025-05-07T19:43:45.7414136Z Verifying transaction: | / - \ | / - \ | / - \ | done
2025-05-07T19:43:47.8511296Z Executing transaction: - \ | / - \ | / - \ | / - \ | / - \ | / - done 2025-05-07T19:43:47.8550990Z # 2025-05-07T19:43:47.8551736Z # To activate this environment, use 2025-05-07T19:43:47.8552593Z # 2025-05-07T19:43:47.8553151Z # $ conda activate build_binary 2025-05-07T19:43:47.8553936Z # 2025-05-07T19:43:47.8554534Z # To deactivate an active environment, use 2025-05-07T19:43:47.8555391Z # 2025-05-07T19:43:47.8555907Z # $ conda deactivate 2025-05-07T19:43:47.8556377Z 2025-05-07T19:43:47.9405371Z [SETUP] Upgrading PIP to latest ... 2025-05-07T19:43:47.9438146Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install --upgrade pip 2025-05-07T19:43:50.8558730Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 
2025-05-07T19:43:50.8560216Z 2025-05-07T19:43:50.8560994Z Requirement already satisfied: pip in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (25.1) 2025-05-07T19:43:50.8561660Z Collecting pip 2025-05-07T19:43:50.8561977Z Downloading pip-25.1.1-py3-none-any.whl.metadata (3.6 kB) 2025-05-07T19:43:50.8562528Z Downloading pip-25.1.1-py3-none-any.whl (1.8 MB) 2025-05-07T19:43:50.8563709Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 62.1 MB/s eta 0:00:00 2025-05-07T19:43:50.8564105Z Installing collected packages: pip 2025-05-07T19:43:50.8564424Z Attempting uninstall: pip 2025-05-07T19:43:50.8564753Z Found existing installation: pip 25.1 2025-05-07T19:43:50.8565118Z Uninstalling pip-25.1: 2025-05-07T19:43:50.8565414Z Successfully uninstalled pip-25.1 2025-05-07T19:43:50.8565779Z Successfully installed pip-25.1.1 2025-05-07T19:43:50.8565986Z 2025-05-07T19:43:50.9159951Z [SETUP] Upgrading pyOpenSSL ... 2025-05-07T19:43:50.9187319Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y pyOpenSSL>22.1.0 2025-05-07T19:43:51.5815624Z Channels: 2025-05-07T19:43:51.5816428Z - conda-forge 2025-05-07T19:43:51.5816802Z Platform: linux-64 2025-05-07T19:44:01.3201737Z Collecting package metadata (repodata.json): - \ | / - \ | / - \ | / - \ | / - done 2025-05-07T19:44:03.2427016Z Solving environment: | / - \ | done 2025-05-07T19:44:03.2885705Z 2025-05-07T19:44:03.2886168Z ## Package Plan ## 2025-05-07T19:44:03.2886775Z 2025-05-07T19:44:03.2887370Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:03.2888292Z 2025-05-07T19:44:03.2888615Z added / updated specs: 2025-05-07T19:44:03.2889382Z - pyopenssl[version='>22.1.0'] 2025-05-07T19:44:03.2889952Z 2025-05-07T19:44:03.2889966Z 2025-05-07T19:44:03.2890354Z The following packages will be downloaded: 2025-05-07T19:44:03.2891012Z 2025-05-07T19:44:03.2891349Z package | build 2025-05-07T19:44:03.2892329Z 
---------------------------|----------------- 2025-05-07T19:44:03.2893497Z cffi-1.17.1 | py311hf29c0ef_0 295 KB conda-forge 2025-05-07T19:44:03.2894886Z cryptography-44.0.3 | py311hafd3f86_0 1.5 MB conda-forge 2025-05-07T19:44:03.2896263Z libgcc-15.1.0 | h767d61c_2 810 KB conda-forge 2025-05-07T19:44:03.2897524Z libgcc-ng-15.1.0 | h69a702a_2 34 KB conda-forge 2025-05-07T19:44:03.2898918Z libgomp-15.1.0 | h767d61c_2 442 KB conda-forge 2025-05-07T19:44:03.2899369Z openssl-3.5.0 | h7b32b05_1 3.0 MB conda-forge 2025-05-07T19:44:03.2899968Z pycparser-2.22 | pyh29332c3_1 108 KB conda-forge 2025-05-07T19:44:03.2900569Z pyopenssl-25.0.0 | pyhd8ed1ab_0 120 KB conda-forge 2025-05-07T19:44:03.2901385Z python_abi-3.11 | 2_cp311 5 KB conda-forge 2025-05-07T19:44:03.2901907Z typing-extensions-4.13.2 | h0e9735f_0 88 KB conda-forge 2025-05-07T19:44:03.2902445Z typing_extensions-4.13.2 | pyh29332c3_0 51 KB conda-forge 2025-05-07T19:44:03.2902938Z ------------------------------------------------------------ 2025-05-07T19:44:03.2903314Z Total: 6.4 MB 2025-05-07T19:44:03.2903575Z 2025-05-07T19:44:03.2903718Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:03.2903957Z 2025-05-07T19:44:03.2904220Z cffi conda-forge/linux-64::cffi-1.17.1-py311hf29c0ef_0 2025-05-07T19:44:03.2904758Z cryptography conda-forge/linux-64::cryptography-44.0.3-py311hafd3f86_0 2025-05-07T19:44:03.2905320Z libgcc conda-forge/linux-64::libgcc-15.1.0-h767d61c_2 2025-05-07T19:44:03.2905803Z pycparser conda-forge/noarch::pycparser-2.22-pyh29332c3_1 2025-05-07T19:44:03.2906337Z pyopenssl conda-forge/noarch::pyopenssl-25.0.0-pyhd8ed1ab_0 2025-05-07T19:44:03.2907277Z python_abi conda-forge/linux-64::python_abi-3.11-2_cp311 2025-05-07T19:44:03.2908027Z typing-extensions conda-forge/noarch::typing-extensions-4.13.2-h0e9735f_0 2025-05-07T19:44:03.2908643Z typing_extensions conda-forge/noarch::typing_extensions-4.13.2-pyh29332c3_0 2025-05-07T19:44:03.2908986Z 2025-05-07T19:44:03.2909106Z The following packages 
will be UPDATED: 2025-05-07T19:44:03.2909343Z 2025-05-07T19:44:03.2909738Z ca-certificates pkgs/main/linux-64::ca-certificates-2~ --> conda-forge/noarch::ca-certificates-2025.4.26-hbd8a1cb_0 2025-05-07T19:44:03.2910543Z libgcc-ng pkgs/main::libgcc-ng-11.2.0-h1234567_1 --> conda-forge::libgcc-ng-15.1.0-h69a702a_2 2025-05-07T19:44:03.2911193Z libgomp pkgs/main::libgomp-11.2.0-h1234567_1 --> conda-forge::libgomp-15.1.0-h767d61c_2 2025-05-07T19:44:03.2911859Z openssl pkgs/main::openssl-3.0.16-h5eee18b_0 --> conda-forge::openssl-3.5.0-h7b32b05_1 2025-05-07T19:44:03.2912235Z 2025-05-07T19:44:03.2912239Z 2025-05-07T19:44:03.2912416Z 2025-05-07T19:44:03.2912604Z Downloading and Extracting Packages: ...working... 2025-05-07T19:44:03.2912999Z openssl-3.5.0 | 3.0 MB | | 0% 2025-05-07T19:44:03.2913261Z 2025-05-07T19:44:03.2913577Z cryptography-44.0.3 | 1.5 MB | | 0%  2025-05-07T19:44:03.2913832Z 2025-05-07T19:44:03.2913836Z 2025-05-07T19:44:03.2914066Z libgcc-15.1.0 | 810 KB | | 0%  2025-05-07T19:44:03.2914306Z 2025-05-07T19:44:03.2914309Z 2025-05-07T19:44:03.2914313Z 2025-05-07T19:44:03.2925489Z libgomp-15.1.0 | 442 KB | | 0%  2025-05-07T19:44:03.2925822Z 2025-05-07T19:44:03.2925835Z 2025-05-07T19:44:03.2925839Z 2025-05-07T19:44:03.2925843Z 2025-05-07T19:44:03.2930092Z cffi-1.17.1 | 295 KB | | 0%  2025-05-07T19:44:03.2930363Z 2025-05-07T19:44:03.2930367Z 2025-05-07T19:44:03.2930371Z 2025-05-07T19:44:03.2930405Z 2025-05-07T19:44:03.2930416Z 2025-05-07T19:44:03.2931010Z pyopenssl-25.0.0 | 120 KB | | 0%  2025-05-07T19:44:03.2931306Z 2025-05-07T19:44:03.2931310Z 2025-05-07T19:44:03.2931313Z 2025-05-07T19:44:03.2931317Z 2025-05-07T19:44:03.2931328Z 2025-05-07T19:44:03.2931361Z 2025-05-07T19:44:03.2932108Z pycparser-2.22 | 108 KB | | 0%  2025-05-07T19:44:03.2932408Z 2025-05-07T19:44:03.2932423Z 2025-05-07T19:44:03.2932427Z 2025-05-07T19:44:03.2932430Z 2025-05-07T19:44:03.2932434Z 2025-05-07T19:44:03.2932437Z 2025-05-07T19:44:03.2932441Z 2025-05-07T19:44:03.2933238Z 
typing-extensions-4. | 88 KB | | 0% 2025-05-07T19:44:03.2934238Z typing_extensions-4. | 51 KB | | 0% 2025-05-07T19:44:03.2935221Z libgcc-ng-15.1.0 | 34 KB | | 0% 2025-05-07T19:44:03.3446030Z python_abi-3.11 | 5 KB | | 0% 2025-05-07T19:44:03.3595806Z libgcc-15.1.0 | 810 KB | ########## | 100% 2025-05-07T19:44:03.3605723Z cffi-1.17.1 | 295 KB | ########## | 100% 2025-05-07T19:44:03.3791329Z libgomp-15.1.0 | 442 KB | ########## | 100% 2025-05-07T19:44:03.3895566Z pyopenssl-25.0.0 | 120 KB | ########## | 100% 2025-05-07T19:44:03.3997472Z cryptography-44.0.3 | 1.5 MB | 1 | 1% 2025-05-07T19:44:03.4021585Z typing-extensions-4. | 88 KB | #8 | 18% 2025-05-07T19:44:03.4040515Z typing-extensions-4. | 88 KB | ########## | 100% 2025-05-07T19:44:03.4066033Z pycparser-2.22 | 108 KB | #4 | 15% 2025-05-07T19:44:03.4113987Z pycparser-2.22 | 108 KB | ########## | 100% 2025-05-07T19:44:03.4115318Z openssl-3.5.0 | 3.0 MB | ########## | 100% 2025-05-07T19:44:03.4188904Z typing_extensions-4. | 51 KB | ###1 | 31% 2025-05-07T19:44:03.4234538Z typing_extensions-4. | 51 KB | ########## | 100% 2025-05-07T19:44:03.4445129Z cryptography-44.0.3 | 1.5 MB | ########## | 100% 2025-05-07T19:44:03.4454883Z libgcc-ng-15.1.0 | 34 KB | ####7 | 47% 2025-05-07T19:44:03.4497799Z libgcc-ng-15.1.0 | 34 KB | ########## | 100% 2025-05-07T19:44:03.5007843Z python_abi-3.11 | 5 KB | ########## | 100%
2025-05-07T19:44:03.6162652Z 2025-05-07T19:44:03.6162656Z 2025-05-07T19:44:03.6162659Z 2025-05-07T19:44:03.6162662Z 2025-05-07T19:44:03.6162666Z 2025-05-07T19:44:03.6162909Z  done 2025-05-07T19:44:03.7164920Z Preparing transaction: - done 2025-05-07T19:44:03.8170949Z Verifying transaction: | done 2025-05-07T19:44:05.2206320Z Executing transaction: - \ | / - \ | / - \ | / - \ done 2025-05-07T19:44:05.3186253Z [SETUP] Testing pyOpenSSL import ... 2025-05-07T19:44:07.0077048Z [CHECK] Python (sub-)package 'OpenSSL' found ... 2025-05-07T19:44:07.0084524Z [SETUP] Installing libxcrypt ... 2025-05-07T19:44:07.0109794Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y libxcrypt 2025-05-07T19:44:07.6885992Z Channels: 2025-05-07T19:44:07.6886896Z - conda-forge 2025-05-07T19:44:07.6887547Z Platform: linux-64 2025-05-07T19:44:10.7927770Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:44:11.2203419Z Solving environment: \ done 2025-05-07T19:44:11.2688364Z 2025-05-07T19:44:11.2688844Z ## Package Plan ## 2025-05-07T19:44:11.2689058Z 2025-05-07T19:44:11.2689375Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:11.2689729Z 2025-05-07T19:44:11.2689877Z added / updated specs: 2025-05-07T19:44:11.2690150Z - libxcrypt 2025-05-07T19:44:11.2690348Z 2025-05-07T19:44:11.2690766Z 2025-05-07T19:44:11.2690911Z The following packages will be downloaded: 2025-05-07T19:44:11.2691149Z 2025-05-07T19:44:11.2691276Z package | build 2025-05-07T19:44:11.2691661Z ---------------------------|----------------- 2025-05-07T19:44:11.2692090Z libxcrypt-4.4.36 | hd590300_1 98 KB conda-forge 2025-05-07T19:44:11.2692518Z ------------------------------------------------------------ 2025-05-07T19:44:11.2693011Z Total: 98 KB 2025-05-07T19:44:11.2693223Z 2025-05-07T19:44:11.2693353Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:11.2693604Z 2025-05-07T19:44:11.2693848Z libxcrypt 
conda-forge/linux-64::libxcrypt-4.4.36-hd590300_1 2025-05-07T19:44:11.2694135Z 2025-05-07T19:44:11.2694138Z 2025-05-07T19:44:11.2694142Z 2025-05-07T19:44:11.2694313Z Downloading and Extracting Packages: ...working... 2025-05-07T19:44:11.4188800Z libxcrypt-4.4.36 | 98 KB | | 0% 2025-05-07T19:44:11.4208729Z libxcrypt-4.4.36 | 98 KB | #6 | 16% 2025-05-07T19:44:11.4308287Z libxcrypt-4.4.36 | 98 KB | ########## | 100% 2025-05-07T19:44:11.4309538Z libxcrypt-4.4.36 | 98 KB | ########## | 100% 2025-05-07T19:44:11.4310653Z 2025-05-07T19:44:11.4310960Z done 2025-05-07T19:44:11.5318429Z Preparing transaction: / done 2025-05-07T19:44:11.6325264Z Verifying transaction: \ done 2025-05-07T19:44:11.7336991Z Executing transaction: / done 2025-05-07T19:44:15.0230951Z [SETUP] Copying over ... 2025-05-07T19:44:15.0231830Z + cp /github/home/miniconda/envs/build_binary/include/crypt.h /github/home/miniconda/envs/build_binary/include/python3.11/crypt.h 2025-05-07T19:44:15.0232492Z 2025-05-07T19:44:15.0271920Z 2025-05-07T19:44:16.6125493Z [SETUP] Installed Python version: Python 3.11.11 2025-05-07T19:44:16.6126168Z [SETUP] Successfully created Conda environment: build_binary 2025-05-07T19:44:16.6207497Z ##[group]Run . $PRELUDE; install_cxx_compiler $BUILD_ENV clang 2025-05-07T19:44:16.6208051Z . 
$PRELUDE; install_cxx_compiler $BUILD_ENV clang 2025-05-07T19:44:16.6208732Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:44:16.6209217Z env: 2025-05-07T19:44:16.6209450Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:44:16.6209793Z BUILD_ENV: build_binary 2025-05-07T19:44:16.6210048Z BUILD_TARGET: genai 2025-05-07T19:44:16.6210315Z BUILD_VARIANT: cuda 2025-05-07T19:44:16.6210563Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:44:16.6210856Z ##[endgroup] 2025-05-07T19:44:17.0493098Z ################################################################################ 2025-05-07T19:44:17.0493480Z # Install C/C++ Compilers 2025-05-07T19:44:17.0510002Z # 2025-05-07T19:44:17.0510357Z # [2025-05-07T19:44:17.050Z] + install_cxx_compiler build_binary clang 2025-05-07T19:44:17.0510893Z ################################################################################ 2025-05-07T19:44:17.0511151Z 2025-05-07T19:44:17.0524583Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:44:17.1359634Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:44:17.1364401Z [INSTALL] Installing GLIBC (architecture = 64) ... 
2025-05-07T19:44:17.1388245Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y sysroot_linux-64=2.17 2025-05-07T19:44:17.8076041Z Channels: 2025-05-07T19:44:17.8076381Z - conda-forge 2025-05-07T19:44:17.8076648Z Platform: linux-64 2025-05-07T19:44:20.8643180Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:44:21.2913611Z Solving environment: \ done 2025-05-07T19:44:21.3405170Z 2025-05-07T19:44:21.3405488Z ## Package Plan ## 2025-05-07T19:44:21.3405688Z 2025-05-07T19:44:21.3405948Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:21.3406280Z 2025-05-07T19:44:21.3406399Z added / updated specs: 2025-05-07T19:44:21.3406772Z - sysroot_linux-64=2.17 2025-05-07T19:44:21.3406956Z 2025-05-07T19:44:21.3406961Z 2025-05-07T19:44:21.3407126Z The following packages will be downloaded: 2025-05-07T19:44:21.3407356Z 2025-05-07T19:44:21.3407490Z package | build 2025-05-07T19:44:21.3407877Z ---------------------------|----------------- 2025-05-07T19:44:21.3408340Z kernel-headers_linux-64-3.10.0| he073ed8_18 921 KB conda-forge 2025-05-07T19:44:21.3408918Z sysroot_linux-64-2.17 | h0157908_18 14.5 MB conda-forge 2025-05-07T19:44:21.3409372Z ------------------------------------------------------------ 2025-05-07T19:44:21.3409803Z Total: 15.4 MB 2025-05-07T19:44:21.3410032Z 2025-05-07T19:44:21.3410212Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:21.3410459Z 2025-05-07T19:44:21.3410795Z kernel-headers_li~ conda-forge/noarch::kernel-headers_linux-64-3.10.0-he073ed8_18 2025-05-07T19:44:21.3411785Z sysroot_linux-64 conda-forge/noarch::sysroot_linux-64-2.17-h0157908_18 2025-05-07T19:44:21.3412130Z 2025-05-07T19:44:21.3412134Z 2025-05-07T19:44:21.3412138Z 2025-05-07T19:44:21.3412331Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:44:21.3412745Z sysroot_linux-64-2.1 | 14.5 MB | | 0% 2025-05-07T19:44:21.5592438Z kernel-headers_linux | 921 KB | | 0% 2025-05-07T19:44:21.6374782Z kernel-headers_linux | 921 KB | 1 | 2% 2025-05-07T19:44:21.6675210Z kernel-headers_linux | 921 KB | ########## | 100% 2025-05-07T19:44:21.7675355Z sysroot_linux-64-2.1 | 14.5 MB | ####7 | 48% 2025-05-07T19:44:21.8032792Z sysroot_linux-64-2.1 | 14.5 MB | #########7 | 97% 2025-05-07T19:44:22.2596633Z sysroot_linux-64-2.1 | 14.5 MB | ########## | 100% 2025-05-07T19:44:22.2600085Z done 2025-05-07T19:44:22.3606244Z Preparing transaction: / done 2025-05-07T19:44:22.5615849Z Verifying transaction: \ | done 2025-05-07T19:44:22.6624115Z Executing transaction: - done 2025-05-07T19:44:22.7477848Z [CHECK] LD_LIBRARY_PATH = 2025-05-07T19:44:22.7478327Z [CHECK] CONDA_PREFIX is not set. 2025-05-07T19:44:24.3897332Z [CHECK] libstdc++.so.6 found in CONDA_PREFIX PATH (symbolic link): /github/home/miniconda/envs/build_binary/lib/libstdc++.so.6 2025-05-07T19:44:24.3917314Z [INSTALL] Installing GCC (11.4.0, 64) through Conda ...
2025-05-07T19:44:24.3945841Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y gxx_linux-64=11.4.0
2025-05-07T19:44:25.0913263Z Channels:
2025-05-07T19:44:25.0913678Z - conda-forge
2025-05-07T19:44:25.0913981Z Platform: linux-64
2025-05-07T19:44:28.2256333Z Collecting package metadata (repodata.json): - \ | / done
2025-05-07T19:44:29.3717252Z Solving environment: \ | / done
2025-05-07T19:44:29.4242663Z
2025-05-07T19:44:29.4243141Z ## Package Plan ##
2025-05-07T19:44:29.4243330Z
2025-05-07T19:44:29.4243681Z environment location: /github/home/miniconda/envs/build_binary
2025-05-07T19:44:29.4244046Z
2025-05-07T19:44:29.4244168Z added / updated specs:
2025-05-07T19:44:29.4244480Z - gxx_linux-64=11.4.0
2025-05-07T19:44:29.4244656Z
2025-05-07T19:44:29.4244660Z
2025-05-07T19:44:29.4244821Z The following packages will be downloaded:
2025-05-07T19:44:29.4245061Z
2025-05-07T19:44:29.4245247Z package | build
2025-05-07T19:44:29.4245636Z ---------------------------|-----------------
2025-05-07T19:44:29.4246077Z binutils_impl_linux-64-2.40| ha1999f0_7 6.0 MB conda-forge
2025-05-07T19:44:29.4246621Z binutils_linux-64-2.40 | hb3c18ed_4 28 KB conda-forge
2025-05-07T19:44:29.4247138Z gcc_impl_linux-64-11.4.0 | h00c12a0_13 53.0 MB conda-forge
2025-05-07T19:44:29.4247662Z gcc_linux-64-11.4.0 | ha077dfb_4 31 KB conda-forge
2025-05-07T19:44:29.4248164Z gxx_impl_linux-64-11.4.0 | h634f3ee_13 11.2 MB conda-forge
2025-05-07T19:44:29.4248642Z gxx_linux-64-11.4.0 | h35bfe5d_4 29 KB conda-forge
2025-05-07T19:44:29.4249132Z ld_impl_linux-64-2.40 | hf3520f5_7 691 KB conda-forge
2025-05-07T19:44:29.4249649Z libgcc-devel_linux-64-11.4.0| h8f596e0_113 2.3 MB conda-forge
2025-05-07T19:44:29.4250808Z libsanitizer-11.4.0 | h5763a12_13 3.5 MB conda-forge
2025-05-07T19:44:29.4251323Z libstdcxx-15.1.0 | h8f9b012_2 3.7 MB conda-forge
2025-05-07T19:44:29.4251840Z libstdcxx-devel_linux-64-11.4.0| h8f596e0_113 11.1 MB conda-forge
2025-05-07T19:44:29.4252392Z libstdcxx-ng-15.1.0 | h4852527_2 34 KB conda-forge
2025-05-07T19:44:29.4252834Z ------------------------------------------------------------
2025-05-07T19:44:29.4253239Z Total: 91.6 MB
2025-05-07T19:44:29.4253468Z
2025-05-07T19:44:29.4253611Z The following NEW packages will be INSTALLED:
2025-05-07T19:44:29.4253878Z
2025-05-07T19:44:29.4254197Z binutils_impl_lin~ conda-forge/linux-64::binutils_impl_linux-64-2.40-ha1999f0_7
2025-05-07T19:44:29.4254840Z binutils_linux-64 conda-forge/linux-64::binutils_linux-64-2.40-hb3c18ed_4
2025-05-07T19:44:29.4255592Z gcc_impl_linux-64 conda-forge/linux-64::gcc_impl_linux-64-11.4.0-h00c12a0_13
2025-05-07T19:44:29.4256335Z gcc_linux-64 conda-forge/linux-64::gcc_linux-64-11.4.0-ha077dfb_4
2025-05-07T19:44:29.4256903Z gxx_impl_linux-64 conda-forge/linux-64::gxx_impl_linux-64-11.4.0-h634f3ee_13
2025-05-07T19:44:29.4257543Z gxx_linux-64 conda-forge/linux-64::gxx_linux-64-11.4.0-h35bfe5d_4
2025-05-07T19:44:29.4258105Z libgcc-devel_linu~ conda-forge/noarch::libgcc-devel_linux-64-11.4.0-h8f596e0_113
2025-05-07T19:44:29.4259043Z libsanitizer conda-forge/linux-64::libsanitizer-11.4.0-h5763a12_13
2025-05-07T19:44:29.4259634Z libstdcxx conda-forge/linux-64::libstdcxx-15.1.0-h8f9b012_2
2025-05-07T19:44:29.4260251Z libstdcxx-devel_l~ conda-forge/noarch::libstdcxx-devel_linux-64-11.4.0-h8f596e0_113
2025-05-07T19:44:29.4260650Z
2025-05-07T19:44:29.4260808Z The following packages will be UPDATED:
2025-05-07T19:44:29.4261034Z
2025-05-07T19:44:29.4261382Z ld_impl_linux-64 pkgs/main::ld_impl_linux-64-2.40-h12e~ --> conda-forge::ld_impl_linux-64-2.40-hf3520f5_7
2025-05-07T19:44:29.4262202Z libstdcxx-ng pkgs/main::libstdcxx-ng-11.2.0-h12345~ --> conda-forge::libstdcxx-ng-15.1.0-h4852527_2
2025-05-07T19:44:29.4262653Z
2025-05-07T19:44:29.4262657Z
2025-05-07T19:44:29.4262661Z
2025-05-07T19:44:29.4262823Z Downloading and Extracting Packages: ...working...
2025-05-07T19:44:30.1401747Z libstdcxx-15.1.0 | 3.7 MB | ########## | 100%
2025-05-07T19:44:30.2558619Z libgcc-devel_linux-6 | 2.3 MB | ########## | 100%
2025-05-07T19:44:30.3424419Z libsanitizer-11.4.0 | 3.5 MB | ########## | 100%
2025-05-07T19:44:30.3643040Z libstdcxx-ng-15.1.0 | 34 KB | ########## | 100%
2025-05-07T19:44:30.4176724Z gcc_linux-64-11.4.0 | 31 KB | ########## | 100%
2025-05-07T19:44:30.4448937Z ld_impl_linux-64-2.4 | 691 KB | ########## | 100%
2025-05-07T19:44:30.4725706Z gxx_linux-64-11.4.0 | 29 KB | ########## | 100%
2025-05-07T19:44:30.4873215Z binutils_linux-64-2. | 28 KB | ########## | 100%
2025-05-07T19:44:30.5114472Z binutils_impl_linux- | 6.0 MB | ########## | 100%
2025-05-07T19:44:30.7685286Z gxx_impl_linux-64-11 | 11.2 MB | ########## | 100%
2025-05-07T19:44:30.8107502Z libstdcxx-devel_linu | 11.1 MB | ########## | 100%
2025-05-07T19:44:31.3524806Z gcc_impl_linux-64-11 | 53.0 MB | ########## | 100%
2025-05-07T19:44:31.3530787Z done
2025-05-07T19:44:31.4533508Z Preparing transaction: \ done
2025-05-07T19:44:32.8568511Z Verifying transaction: / - \ | / - \ | / - \ | / - done
2025-05-07T19:44:32.9585306Z Executing transaction: | done
2025-05-07T19:44:33.0542046Z [INSTALL] Setting the C/C++ compiler symlinks ...
2025-05-07T19:44:36.7059573Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc /github/home/miniconda/envs/build_binary/bin/cc
2025-05-07T19:44:36.7060940Z
2025-05-07T19:44:36.7069936Z
2025-05-07T19:44:36.7088210Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc /github/home/miniconda/envs/build_binary/bin/gcc
2025-05-07T19:44:36.7089942Z
2025-05-07T19:44:36.7102173Z
2025-05-07T19:44:36.7127431Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ /github/home/miniconda/envs/build_binary/bin/c++
2025-05-07T19:44:36.7128079Z
2025-05-07T19:44:36.7145200Z
2025-05-07T19:44:36.7165757Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ /github/home/miniconda/envs/build_binary/bin/g++
2025-05-07T19:44:36.7166395Z
2025-05-07T19:44:36.7177440Z
2025-05-07T19:44:36.7187816Z [INSTALL] Installing Clang (16.0.6, 64) and relevant libraries through Conda ...
2025-05-07T19:44:36.7209783Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y clangxx=16.0.6 libcxx llvm-openmp=16.0.6 compiler-rt=16.0.6
2025-05-07T19:44:37.4187630Z Channels:
2025-05-07T19:44:37.4187998Z - conda-forge
2025-05-07T19:44:37.4188344Z Platform: linux-64
2025-05-07T19:44:40.5020775Z Collecting package metadata (repodata.json): - \ | / done
2025-05-07T19:44:41.8454588Z Solving environment: \ | / done
2025-05-07T19:44:41.8965913Z
2025-05-07T19:44:41.8966211Z ## Package Plan ##
2025-05-07T19:44:41.8966395Z
2025-05-07T19:44:41.8966679Z environment location: /github/home/miniconda/envs/build_binary
2025-05-07T19:44:41.8967007Z
2025-05-07T19:44:41.8967331Z added / updated specs:
2025-05-07T19:44:41.8967629Z - clangxx=16.0.6
2025-05-07T19:44:41.8967900Z - compiler-rt=16.0.6
2025-05-07T19:44:41.8968156Z - libcxx
2025-05-07T19:44:41.8968395Z - llvm-openmp=16.0.6
2025-05-07T19:44:41.8968561Z
2025-05-07T19:44:41.8968567Z
2025-05-07T19:44:41.8968700Z The following packages will be downloaded:
2025-05-07T19:44:41.8968945Z
2025-05-07T19:44:41.8969070Z package | build
2025-05-07T19:44:41.8969404Z ---------------------------|-----------------
2025-05-07T19:44:41.8970139Z clang-16.0.6 |default_h9e3a008_14 110 KB conda-forge
2025-05-07T19:44:41.8970636Z clang-16-16.0.6 |default_hb5137d0_14 780 KB conda-forge
2025-05-07T19:44:41.8971107Z clangxx-16.0.6 |default_ha78316a_14 110 KB conda-forge
2025-05-07T19:44:41.8971594Z compiler-rt-16.0.6 | h00ab1b0_2 107 KB conda-forge
2025-05-07T19:44:41.8972081Z compiler-rt_linux-64-16.0.6| h00ab1b0_2 36.0 MB conda-forge
2025-05-07T19:44:41.8972547Z icu-73.2 | h59595ed_0 11.5 MB conda-forge
2025-05-07T19:44:41.8973009Z libclang-cpp16-16.0.6 |default_hb5137d0_14 17.3 MB conda-forge
2025-05-07T19:44:41.8973496Z libcxx-19.1.7 | h2713693_1 1000 KB conda-forge
2025-05-07T19:44:41.8973953Z libcxxabi-19.1.7 | hd85fd95_1 158 KB conda-forge
2025-05-07T19:44:41.8974512Z libiconv-1.18 | h4ce23a2_1 696 KB conda-forge
2025-05-07T19:44:41.8975315Z libllvm16-16.0.6 | hb3ce162_3 33.7 MB conda-forge
2025-05-07T19:44:41.8975887Z libxml2-2.12.7 | hc051c1a_1 688 KB conda-forge
2025-05-07T19:44:41.8976327Z libzlib-1.2.13 | h4ab18f5_6 60 KB conda-forge
2025-05-07T19:44:41.8976763Z llvm-openmp-16.0.6 | h4dfa4b3_0 39.9 MB conda-forge
2025-05-07T19:44:41.8977212Z zlib-1.2.13 | h4ab18f5_6 91 KB conda-forge
2025-05-07T19:44:41.8977623Z zstd-1.5.6 | ha6fb4c9_0 542 KB conda-forge
2025-05-07T19:44:41.8978013Z ------------------------------------------------------------
2025-05-07T19:44:41.8978382Z Total: 142.6 MB
2025-05-07T19:44:41.8978602Z
2025-05-07T19:44:41.8978734Z The following NEW packages will be INSTALLED:
2025-05-07T19:44:41.8978982Z
2025-05-07T19:44:41.8979218Z clang conda-forge/linux-64::clang-16.0.6-default_h9e3a008_14
2025-05-07T19:44:41.8979734Z clang-16 conda-forge/linux-64::clang-16-16.0.6-default_hb5137d0_14
2025-05-07T19:44:41.8980240Z clangxx conda-forge/linux-64::clangxx-16.0.6-default_ha78316a_14
2025-05-07T19:44:41.8980766Z compiler-rt conda-forge/linux-64::compiler-rt-16.0.6-h00ab1b0_2
2025-05-07T19:44:41.8981431Z compiler-rt_linux~ conda-forge/noarch::compiler-rt_linux-64-16.0.6-h00ab1b0_2
2025-05-07T19:44:41.8981914Z icu conda-forge/linux-64::icu-73.2-h59595ed_0
2025-05-07T19:44:41.8982405Z libclang-cpp16 conda-forge/linux-64::libclang-cpp16-16.0.6-default_hb5137d0_14
2025-05-07T19:44:41.8983100Z libcxx conda-forge/linux-64::libcxx-19.1.7-h2713693_1
2025-05-07T19:44:41.8983573Z libcxxabi conda-forge/linux-64::libcxxabi-19.1.7-hd85fd95_1
2025-05-07T19:44:41.8984035Z libiconv conda-forge/linux-64::libiconv-1.18-h4ce23a2_1
2025-05-07T19:44:41.8984523Z libllvm16 conda-forge/linux-64::libllvm16-16.0.6-hb3ce162_3
2025-05-07T19:44:41.8984998Z libxml2 conda-forge/linux-64::libxml2-2.12.7-hc051c1a_1
2025-05-07T19:44:41.8985445Z libzlib conda-forge/linux-64::libzlib-1.2.13-h4ab18f5_6
2025-05-07T19:44:41.8985935Z llvm-openmp conda-forge/linux-64::llvm-openmp-16.0.6-h4dfa4b3_0
2025-05-07T19:44:41.8988936Z zstd conda-forge/linux-64::zstd-1.5.6-ha6fb4c9_0
2025-05-07T19:44:41.8989221Z
2025-05-07T19:44:41.8989463Z The following packages will be UPDATED:
2025-05-07T19:44:41.8989669Z
2025-05-07T19:44:41.8989924Z zlib pkgs/main::zlib-1.2.13-h5eee18b_1 --> conda-forge::zlib-1.2.13-h4ab18f5_6
2025-05-07T19:44:41.8990611Z
2025-05-07T19:44:41.8990614Z
2025-05-07T19:44:41.8990618Z
2025-05-07T19:44:41.8990771Z Downloading and Extracting Packages: ...working...
2025-05-07T19:44:41.8991210Z llvm-openmp-16.0.6 | 39.9 MB | | 0%
2025-05-07T19:44:41.8991582Z
2025-05-07T19:44:41.8991936Z compiler-rt_linux-64 | 36.0 MB | | 0%
2025-05-07T19:44:41.8992198Z
2025-05-07T19:44:41.8992202Z
2025-05-07T19:44:41.8992426Z libllvm16-16.0.6 | 33.7 MB | | 0%
2025-05-07T19:44:41.8992705Z
2025-05-07T19:44:41.8992709Z
2025-05-07T19:44:41.8992713Z
2025-05-07T19:44:41.8992958Z libclang-cpp16-16.0. 
| 17.3 MB | | 0%  2025-05-07T19:44:41.8993236Z 2025-05-07T19:44:41.8993239Z 2025-05-07T19:44:41.8993243Z 2025-05-07T19:44:41.8993247Z 2025-05-07T19:44:41.9008578Z icu-73.2 | 11.5 MB | | 0%  2025-05-07T19:44:41.9008872Z 2025-05-07T19:44:41.9008877Z 2025-05-07T19:44:41.9008880Z 2025-05-07T19:44:41.9008884Z 2025-05-07T19:44:41.9008887Z 2025-05-07T19:44:41.9009129Z libcxx-19.1.7 | 1000 KB | | 0%  2025-05-07T19:44:41.9009424Z 2025-05-07T19:44:41.9009427Z 2025-05-07T19:44:41.9009431Z 2025-05-07T19:44:41.9009434Z 2025-05-07T19:44:41.9009454Z 2025-05-07T19:44:41.9009652Z 2025-05-07T19:44:41.9012738Z clang-16-16.0.6 | 780 KB | | 0%  2025-05-07T19:44:41.9013047Z 2025-05-07T19:44:41.9013051Z 2025-05-07T19:44:41.9013055Z 2025-05-07T19:44:41.9013058Z 2025-05-07T19:44:41.9013062Z 2025-05-07T19:44:41.9013066Z 2025-05-07T19:44:41.9013070Z 2025-05-07T19:44:41.9014411Z libiconv-1.18 | 696 KB | | 0%  2025-05-07T19:44:41.9014720Z 2025-05-07T19:44:41.9014738Z 2025-05-07T19:44:41.9014742Z 2025-05-07T19:44:41.9014745Z 2025-05-07T19:44:41.9014749Z 2025-05-07T19:44:41.9014753Z 2025-05-07T19:44:41.9014756Z 2025-05-07T19:44:41.9014760Z 2025-05-07T19:44:41.9015676Z libxml2-2.12.7 | 688 KB | | 0%  2025-05-07T19:44:41.9015978Z 2025-05-07T19:44:41.9015995Z 2025-05-07T19:44:41.9015999Z 2025-05-07T19:44:41.9016002Z 2025-05-07T19:44:41.9016006Z 2025-05-07T19:44:41.9016010Z 2025-05-07T19:44:41.9016013Z 2025-05-07T19:44:41.9016024Z 2025-05-07T19:44:41.9016031Z 2025-05-07T19:44:41.9016924Z zstd-1.5.6 | 542 KB | | 0%  2025-05-07T19:44:41.9017217Z 2025-05-07T19:44:41.9017220Z 2025-05-07T19:44:41.9017224Z 2025-05-07T19:44:41.9017227Z 2025-05-07T19:44:41.9017231Z 2025-05-07T19:44:41.9017234Z 2025-05-07T19:44:41.9017238Z 2025-05-07T19:44:41.9017241Z 2025-05-07T19:44:41.9017244Z 2025-05-07T19:44:41.9017258Z 2025-05-07T19:44:41.9018174Z libcxxabi-19.1.7 | 158 KB | | 0%  2025-05-07T19:44:41.9018496Z 2025-05-07T19:44:41.9018513Z 2025-05-07T19:44:41.9018517Z 2025-05-07T19:44:41.9018521Z 
2025-05-07T19:44:41.9018524Z 2025-05-07T19:44:41.9018528Z 2025-05-07T19:44:41.9018531Z 2025-05-07T19:44:41.9018535Z 2025-05-07T19:44:41.9018538Z 2025-05-07T19:44:41.9018541Z 2025-05-07T19:44:41.9018545Z 2025-05-07T19:44:41.9019424Z clang-16.0.6 | 110 KB | | 0%  2025-05-07T19:44:41.9019716Z 2025-05-07T19:44:41.9019723Z 2025-05-07T19:44:41.9019727Z 2025-05-07T19:44:41.9019730Z 2025-05-07T19:44:41.9019734Z 2025-05-07T19:44:41.9019737Z 2025-05-07T19:44:41.9019741Z 2025-05-07T19:44:41.9019744Z 2025-05-07T19:44:41.9019747Z 2025-05-07T19:44:41.9019751Z 2025-05-07T19:44:41.9019754Z 2025-05-07T19:44:41.9019770Z 2025-05-07T19:44:41.9020602Z clangxx-16.0.6 | 110 KB | | 0%  2025-05-07T19:44:41.9020902Z 2025-05-07T19:44:41.9020906Z 2025-05-07T19:44:41.9020910Z 2025-05-07T19:44:41.9020930Z 2025-05-07T19:44:41.9020933Z 2025-05-07T19:44:41.9020937Z 2025-05-07T19:44:41.9020940Z 2025-05-07T19:44:41.9020963Z 2025-05-07T19:44:41.9020966Z 2025-05-07T19:44:41.9020969Z 2025-05-07T19:44:41.9020973Z 2025-05-07T19:44:41.9020976Z 2025-05-07T19:44:41.9020980Z 2025-05-07T19:44:41.9021812Z compiler-rt-16.0.6 | 107 KB | | 0%  2025-05-07T19:44:41.9022126Z 2025-05-07T19:44:41.9022251Z 2025-05-07T19:44:41.9022258Z 2025-05-07T19:44:41.9022262Z 2025-05-07T19:44:41.9022265Z 2025-05-07T19:44:41.9022269Z 2025-05-07T19:44:41.9022272Z 2025-05-07T19:44:41.9022275Z 2025-05-07T19:44:41.9022279Z 2025-05-07T19:44:41.9022282Z 2025-05-07T19:44:41.9022285Z 2025-05-07T19:44:41.9022288Z 2025-05-07T19:44:41.9022292Z 2025-05-07T19:44:41.9022295Z 2025-05-07T19:44:41.9022989Z zlib-1.2.13 | 91 KB | | 0%  2025-05-07T19:44:41.9023296Z 2025-05-07T19:44:41.9023319Z 2025-05-07T19:44:41.9023323Z 2025-05-07T19:44:41.9023326Z 2025-05-07T19:44:41.9023330Z 2025-05-07T19:44:41.9023334Z 2025-05-07T19:44:41.9023337Z 2025-05-07T19:44:41.9023341Z 2025-05-07T19:44:41.9023344Z 2025-05-07T19:44:41.9023348Z 2025-05-07T19:44:41.9023351Z 2025-05-07T19:44:41.9023355Z 2025-05-07T19:44:41.9023358Z 2025-05-07T19:44:41.9023362Z 
2025-05-07T19:44:41.9023365Z 2025-05-07T19:44:42.1239282Z libzlib-1.2.13 | 60 KB | | 0%  2025-05-07T19:44:42.1239653Z 2025-05-07T19:44:42.1240640Z 2025-05-07T19:44:42.1241325Z libllvm16-16.0.6 | 33.7 MB | | 0%  2025-05-07T19:44:42.1241624Z 2025-05-07T19:44:42.1241628Z 2025-05-07T19:44:42.1241633Z 2025-05-07T19:44:42.1952473Z libclang-cpp16-16.0. | 17.3 MB | | 0%  2025-05-07T19:44:42.1952791Z 2025-05-07T19:44:42.1953018Z compiler-rt_linux-64 | 36.0 MB | | 0%  2025-05-07T19:44:42.2240098Z llvm-openmp-16.0.6 | 39.9 MB | | 0% 2025-05-07T19:44:42.2240483Z 2025-05-07T19:44:42.2240842Z 2025-05-07T19:44:42.2243341Z libllvm16-16.0.6 | 33.7 MB | #8 | 19%  2025-05-07T19:44:42.2243705Z 2025-05-07T19:44:42.2243714Z 2025-05-07T19:44:42.2243717Z 2025-05-07T19:44:42.2380906Z libclang-cpp16-16.0. | 17.3 MB | ######4 | 64%  2025-05-07T19:44:42.2381319Z 2025-05-07T19:44:42.2381503Z 2025-05-07T19:44:42.2381509Z 2025-05-07T19:44:42.2381561Z 2025-05-07T19:44:42.2953466Z icu-73.2 | 11.5 MB | | 0%  2025-05-07T19:44:42.2953750Z 2025-05-07T19:44:42.2955775Z compiler-rt_linux-64 | 36.0 MB | #6 | 16%  2025-05-07T19:44:42.3240224Z llvm-openmp-16.0.6 | 39.9 MB | #4 | 15% 2025-05-07T19:44:42.3240590Z 2025-05-07T19:44:42.3240682Z 2025-05-07T19:44:42.3384964Z libllvm16-16.0.6 | 33.7 MB | ###2 | 33%  2025-05-07T19:44:42.3385402Z 2025-05-07T19:44:42.3385633Z 2025-05-07T19:44:42.3385645Z 2025-05-07T19:44:42.3385652Z 2025-05-07T19:44:42.3955312Z icu-73.2 | 11.5 MB | #####1 | 52%  2025-05-07T19:44:42.3955636Z 2025-05-07T19:44:42.3956297Z compiler-rt_linux-64 | 36.0 MB | ###1 | 31%  2025-05-07T19:44:42.4345881Z llvm-openmp-16.0.6 | 39.9 MB | ##9 | 30% 2025-05-07T19:44:42.4346325Z 2025-05-07T19:44:42.4346445Z 2025-05-07T19:44:42.4346452Z 2025-05-07T19:44:42.4605474Z libclang-cpp16-16.0. 
| 17.3 MB | ########## | 100%  2025-05-07T19:44:42.4606463Z 2025-05-07T19:44:42.4606477Z 2025-05-07T19:44:42.4606489Z 2025-05-07T19:44:42.4606499Z 2025-05-07T19:44:42.4607123Z icu-73.2 | 11.5 MB | ########## | 100%  2025-05-07T19:44:42.4607855Z 2025-05-07T19:44:42.4607866Z 2025-05-07T19:44:42.4607876Z 2025-05-07T19:44:42.4607886Z 2025-05-07T19:44:42.4765713Z icu-73.2 | 11.5 MB | ########## | 100%  2025-05-07T19:44:42.4765988Z 2025-05-07T19:44:42.4766001Z 2025-05-07T19:44:42.4810715Z libllvm16-16.0.6 | 33.7 MB | #####5 | 56%  2025-05-07T19:44:42.4811562Z 2025-05-07T19:44:42.4811575Z 2025-05-07T19:44:42.4811586Z 2025-05-07T19:44:42.4811598Z 2025-05-07T19:44:42.4811608Z 2025-05-07T19:44:42.4957252Z libcxx-19.1.7 | 1000 KB | 1 | 2%  2025-05-07T19:44:42.5035857Z llvm-openmp-16.0.6 | 39.9 MB | ####6 | 46% 2025-05-07T19:44:42.5036702Z 2025-05-07T19:44:42.5036728Z 2025-05-07T19:44:42.5037222Z 2025-05-07T19:44:42.5037258Z 2025-05-07T19:44:42.5037269Z 2025-05-07T19:44:42.5278690Z libcxx-19.1.7 | 1000 KB | ########## | 100%  2025-05-07T19:44:42.5279016Z 2025-05-07T19:44:42.5279020Z 2025-05-07T19:44:42.5279024Z 2025-05-07T19:44:42.5279027Z 2025-05-07T19:44:42.5279031Z 2025-05-07T19:44:42.5279049Z 2025-05-07T19:44:42.5402777Z clang-16-16.0.6 | 780 KB | 2 | 2%  2025-05-07T19:44:42.5403115Z 2025-05-07T19:44:42.5516645Z compiler-rt_linux-64 | 36.0 MB | ####3 | 44%  2025-05-07T19:44:42.5516952Z 2025-05-07T19:44:42.5517066Z 2025-05-07T19:44:42.5517070Z 2025-05-07T19:44:42.5517166Z 2025-05-07T19:44:42.5517174Z 2025-05-07T19:44:42.5517180Z 2025-05-07T19:44:42.5517185Z 2025-05-07T19:44:42.5740071Z libiconv-1.18 | 696 KB | 2 | 2%  2025-05-07T19:44:42.5740456Z 2025-05-07T19:44:42.5740461Z 2025-05-07T19:44:42.5740466Z 2025-05-07T19:44:42.5740471Z 2025-05-07T19:44:42.5740477Z 2025-05-07T19:44:42.5740520Z 2025-05-07T19:44:42.5785198Z clang-16-16.0.6 | 780 KB | ########## | 100%  2025-05-07T19:44:42.5785554Z 2025-05-07T19:44:42.5785559Z 2025-05-07T19:44:42.5785563Z 
2025-05-07T19:44:42.5785567Z 2025-05-07T19:44:42.5785570Z 2025-05-07T19:44:42.5785574Z 2025-05-07T19:44:42.5785577Z 2025-05-07T19:44:42.5917635Z libiconv-1.18 | 696 KB | ########## | 100%  2025-05-07T19:44:42.5918552Z 2025-05-07T19:44:42.5918565Z 2025-05-07T19:44:42.5957341Z libllvm16-16.0.6 | 33.7 MB | ######9 | 69%  2025-05-07T19:44:42.6285259Z llvm-openmp-16.0.6 | 39.9 MB | ######2 | 63% 2025-05-07T19:44:42.6285544Z 2025-05-07T19:44:42.6285551Z 2025-05-07T19:44:42.6285559Z 2025-05-07T19:44:42.6285566Z 2025-05-07T19:44:42.6285572Z 2025-05-07T19:44:42.6285590Z 2025-05-07T19:44:42.6285596Z 2025-05-07T19:44:42.6285600Z 2025-05-07T19:44:42.6369250Z libxml2-2.12.7 | 688 KB | 2 | 2%  2025-05-07T19:44:42.6370206Z 2025-05-07T19:44:42.6370256Z 2025-05-07T19:44:42.6370268Z 2025-05-07T19:44:42.6370279Z 2025-05-07T19:44:42.6370290Z 2025-05-07T19:44:42.6370319Z 2025-05-07T19:44:42.6370330Z 2025-05-07T19:44:42.6370340Z 2025-05-07T19:44:42.6370350Z 2025-05-07T19:44:42.6404152Z zstd-1.5.6 | 542 KB | 2 | 3%  2025-05-07T19:44:42.6404451Z 2025-05-07T19:44:42.6573713Z compiler-rt_linux-64 | 36.0 MB | #####7 | 58%  2025-05-07T19:44:42.6574026Z 2025-05-07T19:44:42.6574032Z 2025-05-07T19:44:42.6574038Z 2025-05-07T19:44:42.6574042Z 2025-05-07T19:44:42.6574048Z 2025-05-07T19:44:42.6574056Z 2025-05-07T19:44:42.6574062Z 2025-05-07T19:44:42.6574067Z 2025-05-07T19:44:42.6575827Z libxml2-2.12.7 | 688 KB | ########## | 100%  2025-05-07T19:44:42.6576152Z 2025-05-07T19:44:42.6576157Z 2025-05-07T19:44:42.6576160Z 2025-05-07T19:44:42.6576164Z 2025-05-07T19:44:42.6576168Z 2025-05-07T19:44:42.6576172Z 2025-05-07T19:44:42.6576207Z 2025-05-07T19:44:42.6576229Z 2025-05-07T19:44:42.6576246Z 2025-05-07T19:44:42.6672140Z zstd-1.5.6 | 542 KB | ########## | 100%  2025-05-07T19:44:42.6672427Z 2025-05-07T19:44:42.6672432Z 2025-05-07T19:44:42.6672436Z 2025-05-07T19:44:42.6672440Z 2025-05-07T19:44:42.6672443Z 2025-05-07T19:44:42.6674691Z libcxx-19.1.7 | 1000 KB | ########## | 100%  2025-05-07T19:44:42.6674978Z 
2025-05-07T19:44:42.6674983Z 2025-05-07T19:44:42.6674986Z 2025-05-07T19:44:42.6674991Z 2025-05-07T19:44:42.6674998Z 2025-05-07T19:44:42.6919047Z libcxx-19.1.7 | 1000 KB | ########## | 100%  2025-05-07T19:44:42.6919680Z 2025-05-07T19:44:42.6919725Z 2025-05-07T19:44:42.7044445Z libllvm16-16.0.6 | 33.7 MB | ########2 | 83%  2025-05-07T19:44:42.7044757Z 2025-05-07T19:44:42.7044761Z 2025-05-07T19:44:42.7044765Z 2025-05-07T19:44:42.7044768Z 2025-05-07T19:44:42.7044772Z 2025-05-07T19:44:42.7044775Z 2025-05-07T19:44:42.7045044Z 2025-05-07T19:44:42.7045072Z 2025-05-07T19:44:42.7045076Z 2025-05-07T19:44:42.7045104Z 2025-05-07T19:44:42.7080557Z libcxxabi-19.1.7 | 158 KB | # | 10%  2025-05-07T19:44:42.7117285Z llvm-openmp-16.0.6 | 39.9 MB | #######7 | 77% 2025-05-07T19:44:42.7117717Z 2025-05-07T19:44:42.7117829Z 2025-05-07T19:44:42.7117834Z 2025-05-07T19:44:42.7117862Z 2025-05-07T19:44:42.7117867Z 2025-05-07T19:44:42.7117905Z 2025-05-07T19:44:42.7117916Z 2025-05-07T19:44:42.7117922Z 2025-05-07T19:44:42.7117966Z 2025-05-07T19:44:42.7117972Z 2025-05-07T19:44:42.7127781Z 2025-05-07T19:44:42.7128715Z clang-16.0.6 | 110 KB | #4 | 15%  2025-05-07T19:44:42.7129587Z 2025-05-07T19:44:42.7129601Z 2025-05-07T19:44:42.7129614Z 2025-05-07T19:44:42.7129625Z 2025-05-07T19:44:42.7129636Z 2025-05-07T19:44:42.7129647Z 2025-05-07T19:44:42.7129658Z 2025-05-07T19:44:42.7129670Z 2025-05-07T19:44:42.7129722Z 2025-05-07T19:44:42.7130145Z 2025-05-07T19:44:42.7185468Z libcxxabi-19.1.7 | 158 KB | ########## | 100%  2025-05-07T19:44:42.7186421Z 2025-05-07T19:44:42.7186435Z 2025-05-07T19:44:42.7186446Z 2025-05-07T19:44:42.7186458Z 2025-05-07T19:44:42.7186468Z 2025-05-07T19:44:42.7186479Z 2025-05-07T19:44:42.7186489Z 2025-05-07T19:44:42.7186500Z 2025-05-07T19:44:42.7186510Z 2025-05-07T19:44:42.7186520Z 2025-05-07T19:44:42.7186550Z 2025-05-07T19:44:42.7407481Z clang-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:42.7408378Z 2025-05-07T19:44:42.7606784Z compiler-rt_linux-64 | 36.0 MB | #######1 | 71%  
2025-05-07T19:44:42.7607087Z 2025-05-07T19:44:42.7607091Z 2025-05-07T19:44:42.7607095Z 2025-05-07T19:44:42.7607099Z 2025-05-07T19:44:42.7607103Z 2025-05-07T19:44:42.7607107Z 2025-05-07T19:44:42.7607111Z 2025-05-07T19:44:42.7607115Z 2025-05-07T19:44:42.7607118Z 2025-05-07T19:44:42.7607122Z 2025-05-07T19:44:42.7607145Z 2025-05-07T19:44:42.7607377Z 2025-05-07T19:44:42.7663993Z clangxx-16.0.6 | 110 KB | #4 | 15%  2025-05-07T19:44:42.7664663Z 2025-05-07T19:44:42.7664710Z 2025-05-07T19:44:42.7664717Z 2025-05-07T19:44:42.7664756Z 2025-05-07T19:44:42.7664759Z 2025-05-07T19:44:42.7664779Z 2025-05-07T19:44:42.7664812Z 2025-05-07T19:44:42.7664819Z 2025-05-07T19:44:42.7664889Z 2025-05-07T19:44:42.7664894Z 2025-05-07T19:44:42.7664922Z 2025-05-07T19:44:42.7664929Z 2025-05-07T19:44:42.7665080Z 2025-05-07T19:44:42.7669442Z compiler-rt-16.0.6 | 107 KB | #4 | 15%  2025-05-07T19:44:42.7669808Z 2025-05-07T19:44:42.7669814Z 2025-05-07T19:44:42.7669820Z 2025-05-07T19:44:42.7669825Z 2025-05-07T19:44:42.7669831Z 2025-05-07T19:44:42.7669836Z 2025-05-07T19:44:42.7669839Z 2025-05-07T19:44:42.7669843Z 2025-05-07T19:44:42.7669847Z 2025-05-07T19:44:42.7669852Z 2025-05-07T19:44:42.7669856Z 2025-05-07T19:44:42.7669913Z 2025-05-07T19:44:42.7730501Z clangxx-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:42.7730829Z 2025-05-07T19:44:42.7730835Z 2025-05-07T19:44:42.7730852Z 2025-05-07T19:44:42.7730857Z 2025-05-07T19:44:42.7730862Z 2025-05-07T19:44:42.7730865Z 2025-05-07T19:44:42.7730896Z 2025-05-07T19:44:42.7730899Z 2025-05-07T19:44:42.7730903Z 2025-05-07T19:44:42.7730907Z 2025-05-07T19:44:42.7730911Z 2025-05-07T19:44:42.7730914Z 2025-05-07T19:44:42.7730917Z 2025-05-07T19:44:42.8001954Z compiler-rt-16.0.6 | 107 KB | ########## | 100%  2025-05-07T19:44:42.8002314Z 2025-05-07T19:44:42.8003787Z 2025-05-07T19:44:42.8154356Z libllvm16-16.0.6 | 33.7 MB | #########5 | 96%  2025-05-07T19:44:42.8154926Z 2025-05-07T19:44:42.8154937Z 2025-05-07T19:44:42.8154943Z 2025-05-07T19:44:42.8154948Z 
2025-05-07T19:44:42.8154954Z 2025-05-07T19:44:42.8154958Z 2025-05-07T19:44:42.8154963Z 2025-05-07T19:44:42.8155229Z 2025-05-07T19:44:42.8155258Z 2025-05-07T19:44:42.8155264Z 2025-05-07T19:44:42.8155268Z 2025-05-07T19:44:42.8155272Z 2025-05-07T19:44:42.8155284Z 2025-05-07T19:44:42.8155288Z 2025-05-07T19:44:42.8155293Z 2025-05-07T19:44:42.8171965Z libzlib-1.2.13 | 60 KB | ##6 | 27%  2025-05-07T19:44:42.8172300Z 2025-05-07T19:44:42.8172304Z 2025-05-07T19:44:42.8172307Z 2025-05-07T19:44:42.8172311Z 2025-05-07T19:44:42.8172314Z 2025-05-07T19:44:42.8172317Z 2025-05-07T19:44:42.8172321Z 2025-05-07T19:44:42.8172324Z 2025-05-07T19:44:42.8172335Z 2025-05-07T19:44:42.8172339Z 2025-05-07T19:44:42.8172342Z 2025-05-07T19:44:42.8172345Z 2025-05-07T19:44:42.8172349Z 2025-05-07T19:44:42.8172928Z 2025-05-07T19:44:42.8192953Z zlib-1.2.13 | 91 KB | #7 | 18%  2025-05-07T19:44:42.8193291Z 2025-05-07T19:44:42.8193299Z 2025-05-07T19:44:42.8193304Z 2025-05-07T19:44:42.8193311Z 2025-05-07T19:44:42.8193346Z 2025-05-07T19:44:42.8193577Z 2025-05-07T19:44:42.8193581Z 2025-05-07T19:44:42.8193585Z 2025-05-07T19:44:42.8193604Z 2025-05-07T19:44:42.8193608Z 2025-05-07T19:44:42.8193611Z 2025-05-07T19:44:42.8193615Z 2025-05-07T19:44:42.8193618Z 2025-05-07T19:44:42.8193622Z 2025-05-07T19:44:42.8193625Z 2025-05-07T19:44:42.8220977Z libzlib-1.2.13 | 60 KB | ########## | 100%  2025-05-07T19:44:42.8221981Z 2025-05-07T19:44:42.8221996Z 2025-05-07T19:44:42.8222008Z 2025-05-07T19:44:42.8222019Z 2025-05-07T19:44:42.8222030Z 2025-05-07T19:44:42.8222040Z 2025-05-07T19:44:42.8222051Z 2025-05-07T19:44:42.8222061Z 2025-05-07T19:44:42.8222071Z 2025-05-07T19:44:42.8222082Z 2025-05-07T19:44:42.8222092Z 2025-05-07T19:44:42.8222103Z 2025-05-07T19:44:42.8222113Z 2025-05-07T19:44:42.8222123Z 2025-05-07T19:44:42.8341437Z zlib-1.2.13 | 91 KB | ########## | 100%  2025-05-07T19:44:42.8406721Z llvm-openmp-16.0.6 | 39.9 MB | #########1 | 92% 2025-05-07T19:44:42.8407011Z 2025-05-07T19:44:43.0610827Z compiler-rt_linux-64 | 36.0 
MB | ########5 | 86%  2025-05-07T19:44:43.0611126Z 2025-05-07T19:44:43.0611131Z 2025-05-07T19:44:43.0611136Z 2025-05-07T19:44:43.0611140Z 2025-05-07T19:44:43.0611161Z 2025-05-07T19:44:43.0611165Z 2025-05-07T19:44:43.0612644Z clang-16-16.0.6 | 780 KB | ########## | 100%  2025-05-07T19:44:43.0612925Z 2025-05-07T19:44:43.0612930Z 2025-05-07T19:44:43.0612933Z 2025-05-07T19:44:43.0612946Z 2025-05-07T19:44:43.0612949Z 2025-05-07T19:44:43.0612953Z 2025-05-07T19:44:43.1264120Z clang-16-16.0.6 | 780 KB | ########## | 100%  2025-05-07T19:44:43.1264444Z 2025-05-07T19:44:43.1264579Z 2025-05-07T19:44:43.1264583Z 2025-05-07T19:44:43.1264600Z 2025-05-07T19:44:43.1264604Z 2025-05-07T19:44:43.1264608Z 2025-05-07T19:44:43.1264628Z 2025-05-07T19:44:43.1265606Z libiconv-1.18 | 696 KB | ########## | 100%  2025-05-07T19:44:43.1266042Z 2025-05-07T19:44:43.1266046Z 2025-05-07T19:44:43.1266050Z 2025-05-07T19:44:43.1266053Z 2025-05-07T19:44:43.1266057Z 2025-05-07T19:44:43.1266060Z 2025-05-07T19:44:43.1266064Z 2025-05-07T19:44:43.1618297Z libiconv-1.18 | 696 KB | ########## | 100%  2025-05-07T19:44:43.1618624Z 2025-05-07T19:44:43.1618629Z 2025-05-07T19:44:43.1618633Z 2025-05-07T19:44:43.1618636Z 2025-05-07T19:44:43.1792912Z icu-73.2 | 11.5 MB | ########## | 100%  2025-05-07T19:44:43.1793199Z 2025-05-07T19:44:43.1793321Z 2025-05-07T19:44:43.1992219Z libllvm16-16.0.6 | 33.7 MB | ########## | 100%  2025-05-07T19:44:43.1992531Z 2025-05-07T19:44:43.1992538Z 2025-05-07T19:44:43.1992544Z 2025-05-07T19:44:43.1992548Z 2025-05-07T19:44:43.1992551Z 2025-05-07T19:44:43.1992556Z 2025-05-07T19:44:43.1992560Z 2025-05-07T19:44:43.1992563Z 2025-05-07T19:44:43.1993158Z libxml2-2.12.7 | 688 KB | ########## | 100%  2025-05-07T19:44:43.1993712Z 2025-05-07T19:44:43.1993726Z 2025-05-07T19:44:43.1993729Z 2025-05-07T19:44:43.1993733Z 2025-05-07T19:44:43.1993736Z 2025-05-07T19:44:43.1993740Z 2025-05-07T19:44:43.1993743Z 2025-05-07T19:44:43.1993746Z 2025-05-07T19:44:43.2081697Z libxml2-2.12.7 | 688 KB | ########## | 
100%  2025-05-07T19:44:43.2082012Z 2025-05-07T19:44:43.2082127Z 2025-05-07T19:44:43.2082135Z 2025-05-07T19:44:43.2082153Z 2025-05-07T19:44:43.2082157Z 2025-05-07T19:44:43.2082162Z 2025-05-07T19:44:43.2082200Z 2025-05-07T19:44:43.2082206Z 2025-05-07T19:44:43.2082209Z 2025-05-07T19:44:43.2084013Z zstd-1.5.6 | 542 KB | ########## | 100%  2025-05-07T19:44:43.2084326Z 2025-05-07T19:44:43.2084330Z 2025-05-07T19:44:43.2084334Z 2025-05-07T19:44:43.2084337Z 2025-05-07T19:44:43.2084341Z 2025-05-07T19:44:43.2084344Z 2025-05-07T19:44:43.2084348Z 2025-05-07T19:44:43.2084351Z 2025-05-07T19:44:43.2084355Z 2025-05-07T19:44:43.2325010Z zstd-1.5.6 | 542 KB | ########## | 100%  2025-05-07T19:44:43.2325337Z 2025-05-07T19:44:43.2325342Z 2025-05-07T19:44:43.2325346Z 2025-05-07T19:44:43.2325350Z 2025-05-07T19:44:43.2325353Z 2025-05-07T19:44:43.2325357Z 2025-05-07T19:44:43.2325360Z 2025-05-07T19:44:43.2325364Z 2025-05-07T19:44:43.2325381Z 2025-05-07T19:44:43.2325385Z 2025-05-07T19:44:43.2327293Z libcxxabi-19.1.7 | 158 KB | ########## | 100%  2025-05-07T19:44:43.2327594Z 2025-05-07T19:44:43.2327598Z 2025-05-07T19:44:43.2327613Z 2025-05-07T19:44:43.2327617Z 2025-05-07T19:44:43.2327620Z 2025-05-07T19:44:43.2327624Z 2025-05-07T19:44:43.2327640Z 2025-05-07T19:44:43.2327644Z 2025-05-07T19:44:43.2327647Z 2025-05-07T19:44:43.2327651Z 2025-05-07T19:44:43.2577351Z libcxxabi-19.1.7 | 158 KB | ########## | 100%  2025-05-07T19:44:43.2577681Z 2025-05-07T19:44:43.2577686Z 2025-05-07T19:44:43.2577690Z 2025-05-07T19:44:43.2923389Z libclang-cpp16-16.0. 
| 17.3 MB | ########## | 100%  2025-05-07T19:44:43.2923782Z 2025-05-07T19:44:43.2924023Z compiler-rt_linux-64 | 36.0 MB | ########## | 100%  2025-05-07T19:44:43.2924292Z 2025-05-07T19:44:43.3069417Z compiler-rt_linux-64 | 36.0 MB | ########## | 100%  2025-05-07T19:44:43.3069720Z 2025-05-07T19:44:43.3069724Z 2025-05-07T19:44:43.3069742Z 2025-05-07T19:44:43.3069745Z 2025-05-07T19:44:43.3069749Z 2025-05-07T19:44:43.3069752Z 2025-05-07T19:44:43.3069756Z 2025-05-07T19:44:43.3069759Z 2025-05-07T19:44:43.3069763Z 2025-05-07T19:44:43.3069766Z 2025-05-07T19:44:43.3069770Z 2025-05-07T19:44:43.3069773Z 2025-05-07T19:44:43.3069777Z 2025-05-07T19:44:43.3074160Z compiler-rt-16.0.6 | 107 KB | ########## | 100%  2025-05-07T19:44:43.3074494Z 2025-05-07T19:44:43.3074497Z 2025-05-07T19:44:43.3074501Z 2025-05-07T19:44:43.3074514Z 2025-05-07T19:44:43.3074518Z 2025-05-07T19:44:43.3074522Z 2025-05-07T19:44:43.3074538Z 2025-05-07T19:44:43.3074547Z 2025-05-07T19:44:43.3074551Z 2025-05-07T19:44:43.3074554Z 2025-05-07T19:44:43.3074558Z 2025-05-07T19:44:43.3074561Z 2025-05-07T19:44:43.3074564Z 2025-05-07T19:44:43.3238913Z compiler-rt-16.0.6 | 107 KB | ########## | 100%  2025-05-07T19:44:43.3239295Z 2025-05-07T19:44:43.3239300Z 2025-05-07T19:44:43.3239304Z 2025-05-07T19:44:43.3239307Z 2025-05-07T19:44:43.3239311Z 2025-05-07T19:44:43.3239314Z 2025-05-07T19:44:43.3239333Z 2025-05-07T19:44:43.3239336Z 2025-05-07T19:44:43.3239340Z 2025-05-07T19:44:43.3239343Z 2025-05-07T19:44:43.3239347Z 2025-05-07T19:44:43.3239750Z clang-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:43.3240030Z 2025-05-07T19:44:43.3240034Z 2025-05-07T19:44:43.3240037Z 2025-05-07T19:44:43.3240054Z 2025-05-07T19:44:43.3240057Z 2025-05-07T19:44:43.3240061Z 2025-05-07T19:44:43.3240064Z 2025-05-07T19:44:43.3240067Z 2025-05-07T19:44:43.3240249Z 2025-05-07T19:44:43.3240259Z 2025-05-07T19:44:43.3240263Z 2025-05-07T19:44:43.3249211Z clang-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:43.3291096Z llvm-openmp-16.0.6 | 39.9 
MB | ########## | 100% 2025-05-07T19:44:43.3291412Z 2025-05-07T19:44:43.3291416Z 2025-05-07T19:44:43.3291420Z 2025-05-07T19:44:43.3291424Z 2025-05-07T19:44:43.3291427Z 2025-05-07T19:44:43.3291431Z 2025-05-07T19:44:43.3291434Z 2025-05-07T19:44:43.3291438Z 2025-05-07T19:44:43.3291442Z 2025-05-07T19:44:43.3291445Z 2025-05-07T19:44:43.3291448Z 2025-05-07T19:44:43.3291453Z 2025-05-07T19:44:43.3291457Z 2025-05-07T19:44:43.3291460Z 2025-05-07T19:44:43.3291474Z 2025-05-07T19:44:43.3294946Z libzlib-1.2.13 | 60 KB | ########## | 100%  2025-05-07T19:44:43.3295263Z 2025-05-07T19:44:43.3295267Z 2025-05-07T19:44:43.3295270Z 2025-05-07T19:44:43.3295274Z 2025-05-07T19:44:43.3295277Z 2025-05-07T19:44:43.3295292Z 2025-05-07T19:44:43.3295449Z 2025-05-07T19:44:43.3295453Z 2025-05-07T19:44:43.3295458Z 2025-05-07T19:44:43.3295476Z 2025-05-07T19:44:43.3295480Z 2025-05-07T19:44:43.3295484Z 2025-05-07T19:44:43.3295487Z 2025-05-07T19:44:43.3295491Z 2025-05-07T19:44:43.3295505Z 2025-05-07T19:44:43.3466826Z libzlib-1.2.13 | 60 KB | ########## | 100%  2025-05-07T19:44:43.3467321Z 2025-05-07T19:44:43.3467341Z 2025-05-07T19:44:43.3467345Z 2025-05-07T19:44:43.3467348Z 2025-05-07T19:44:43.3467352Z 2025-05-07T19:44:43.3467355Z 2025-05-07T19:44:43.3467359Z 2025-05-07T19:44:43.3467362Z 2025-05-07T19:44:43.3467366Z 2025-05-07T19:44:43.3467369Z 2025-05-07T19:44:43.3467373Z 2025-05-07T19:44:43.3467376Z 2025-05-07T19:44:43.3468708Z clangxx-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:43.3469022Z 2025-05-07T19:44:43.3469026Z 2025-05-07T19:44:43.3469040Z 2025-05-07T19:44:43.3469043Z 2025-05-07T19:44:43.3469059Z 2025-05-07T19:44:43.3469069Z 2025-05-07T19:44:43.3469072Z 2025-05-07T19:44:43.3469076Z 2025-05-07T19:44:43.3469079Z 2025-05-07T19:44:43.3469082Z 2025-05-07T19:44:43.3469086Z 2025-05-07T19:44:43.3469089Z 2025-05-07T19:44:43.3507654Z clangxx-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:43.3508011Z 2025-05-07T19:44:43.3508015Z 2025-05-07T19:44:43.3508020Z 2025-05-07T19:44:43.3508023Z 
2025-05-07T19:44:43.3508027Z 2025-05-07T19:44:43.3508031Z 2025-05-07T19:44:43.3508034Z 2025-05-07T19:44:43.3508038Z 2025-05-07T19:44:43.3508041Z 2025-05-07T19:44:43.3508045Z 2025-05-07T19:44:43.3508048Z 2025-05-07T19:44:43.3508052Z 2025-05-07T19:44:43.3508055Z 2025-05-07T19:44:43.3508058Z 2025-05-07T19:44:43.3509207Z zlib-1.2.13 | 91 KB | ########## | 100%  2025-05-07T19:44:43.3509495Z 2025-05-07T19:44:43.3509499Z 2025-05-07T19:44:43.3509502Z 2025-05-07T19:44:43.3509522Z 2025-05-07T19:44:43.3509538Z 2025-05-07T19:44:43.3509547Z 2025-05-07T19:44:43.3509550Z 2025-05-07T19:44:43.3509554Z 2025-05-07T19:44:43.3509557Z 2025-05-07T19:44:43.3509573Z 2025-05-07T19:44:43.3509576Z 2025-05-07T19:44:43.3509580Z 2025-05-07T19:44:43.3509583Z 2025-05-07T19:44:43.3509586Z 2025-05-07T19:44:43.8791192Z zlib-1.2.13 | 91 KB | ########## | 100%  2025-05-07T19:44:43.8791541Z 2025-05-07T19:44:43.8919782Z compiler-rt_linux-64 | 36.0 MB | ########## | 100%  2025-05-07T19:44:43.8920345Z 2025-05-07T19:44:43.8920351Z 2025-05-07T19:44:43.9430990Z libllvm16-16.0.6 | 33.7 MB | ########## | 100%  2025-05-07T19:44:43.9433218Z llvm-openmp-16.0.6 | 39.9 MB | ########## | 100% 2025-05-07T19:44:43.9433960Z 2025-05-07T19:44:43.9434170Z 2025-05-07T19:44:43.9434363Z  2025-05-07T19:44:43.9434588Z 2025-05-07T19:44:43.9434593Z 2025-05-07T19:44:43.9435030Z  2025-05-07T19:44:43.9435264Z 2025-05-07T19:44:43.9435268Z 2025-05-07T19:44:43.9435272Z 2025-05-07T19:44:43.9435451Z  2025-05-07T19:44:43.9435675Z 2025-05-07T19:44:43.9435678Z 2025-05-07T19:44:43.9435682Z 2025-05-07T19:44:43.9435686Z 2025-05-07T19:44:43.9435878Z  2025-05-07T19:44:43.9436101Z 2025-05-07T19:44:43.9436104Z 2025-05-07T19:44:43.9436159Z 2025-05-07T19:44:43.9436176Z 2025-05-07T19:44:43.9436180Z 2025-05-07T19:44:43.9436358Z  2025-05-07T19:44:43.9436579Z 2025-05-07T19:44:43.9436582Z 2025-05-07T19:44:43.9436586Z 2025-05-07T19:44:43.9436589Z 2025-05-07T19:44:43.9436593Z 2025-05-07T19:44:43.9436597Z 2025-05-07T19:44:43.9436800Z  
2025-05-07T19:44:43.9441432Z done 2025-05-07T19:44:44.0446121Z Preparing transaction: \ done 2025-05-07T19:44:44.1459104Z Verifying transaction: / done 2025-05-07T19:44:44.2475169Z Executing transaction: \ done 2025-05-07T19:44:44.3361213Z [INSTALL] Setting the C/C++ compiler symlinks ... 
2025-05-07T19:44:48.0596618Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:44:48.0597167Z 2025-05-07T19:44:48.0612809Z 2025-05-07T19:44:48.0635132Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:44:48.0635650Z 2025-05-07T19:44:48.0646375Z 2025-05-07T19:44:48.0665378Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang++ /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:44:48.0665892Z 2025-05-07T19:44:48.0684542Z 2025-05-07T19:44:48.0728647Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang++ /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:44:48.0729232Z 2025-05-07T19:44:48.0729237Z 2025-05-07T19:44:48.0729382Z + conda env config vars set -n build_binary CC= 2025-05-07T19:44:48.0729673Z 2025-05-07T19:44:48.4934995Z 2025-05-07T19:44:48.4935360Z + conda env config vars set -n build_binary CXX= 2025-05-07T19:44:48.4935919Z 2025-05-07T19:44:48.9004251Z 2025-05-07T19:44:48.9005149Z + conda run -n build_binary printenv CC 2025-05-07T19:44:48.9005406Z 2025-05-07T19:44:50.6869915Z /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc 2025-05-07T19:44:50.6870313Z 2025-05-07T19:44:50.7646127Z 2025-05-07T19:44:50.7646730Z + conda run -n build_binary printenv CXX 2025-05-07T19:44:50.7647008Z 2025-05-07T19:44:52.5524766Z /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ 2025-05-07T19:44:52.5525174Z 2025-05-07T19:44:52.6305186Z 2025-05-07T19:44:54.4853979Z [ENV] Appending to LD_LIBRARY_PATH: /github/home/miniconda/envs/build_binary/lib ... 2025-05-07T19:44:56.2464055Z ERROR conda.cli.main_run:execute(125): `conda run printenv LD_LIBRARY_PATH` failed. 
(See above for error) 2025-05-07T19:44:56.3047382Z + conda env config vars set -n build_binary LD_LIBRARY_PATH=/github/home/miniconda/envs/build_binary/lib 2025-05-07T19:44:56.3048745Z 2025-05-07T19:44:56.7075657Z 2025-05-07T19:44:58.4961611Z /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:44:58.4962393Z 2025-05-07T19:44:58.5557702Z [CHECK] Binary cc found in PATH 2025-05-07T19:45:00.3307716Z /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:45:00.3308510Z 2025-05-07T19:45:00.3893217Z [CHECK] Binary gcc found in PATH 2025-05-07T19:45:02.1625050Z /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:45:02.1625878Z 2025-05-07T19:45:02.2190814Z [CHECK] Binary c++ found in PATH 2025-05-07T19:45:04.0247007Z /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:45:04.0247850Z 2025-05-07T19:45:04.0836636Z [CHECK] Binary g++ found in PATH 2025-05-07T19:45:04.0837873Z [INFO] Printing out all preprocessor defines in the C compiler ... 2025-05-07T19:45:04.0839122Z + conda run -n build_binary cc -dM -E - 2025-05-07T19:45:04.0839761Z 2025-05-07T19:45:05.9401546Z #define _LP64 1 2025-05-07T19:45:05.9401907Z #define __ATOMIC_ACQUIRE 2 2025-05-07T19:45:05.9402190Z #define __ATOMIC_ACQ_REL 4 2025-05-07T19:45:05.9402511Z #define __ATOMIC_CONSUME 1 2025-05-07T19:45:05.9402884Z #define __ATOMIC_RELAXED 0 2025-05-07T19:45:05.9403260Z #define __ATOMIC_RELEASE 3 2025-05-07T19:45:05.9403511Z #define __ATOMIC_SEQ_CST 5 2025-05-07T19:45:05.9403795Z #define __BIGGEST_ALIGNMENT__ 16 2025-05-07T19:45:05.9404085Z #define __BITINT_MAXWIDTH__ 8388608 2025-05-07T19:45:05.9404380Z #define __BOOL_WIDTH__ 8 2025-05-07T19:45:05.9404657Z #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:45:05.9405011Z #define __CHAR16_TYPE__ unsigned short 2025-05-07T19:45:05.9405327Z #define __CHAR32_TYPE__ unsigned int 2025-05-07T19:45:05.9405607Z #define __CHAR_BIT__ 8 2025-05-07T19:45:05.9405873Z #define __CLANG_ATOMIC_BOOL_LOCK_FREE 2 
2025-05-07T19:45:05.9406192Z #define __CLANG_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:45:05.9406534Z #define __CLANG_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:45:05.9407195Z #define __CLANG_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:45:05.9407530Z #define __CLANG_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:45:05.9407837Z #define __CLANG_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:45:05.9408167Z #define __CLANG_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:45:05.9408506Z #define __CLANG_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:45:05.9408827Z #define __CLANG_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:45:05.9409157Z #define __CLANG_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:45:05.9409468Z #define __CONSTANT_CFSTRINGS__ 1 2025-05-07T19:45:05.9409769Z #define __DBL_DECIMAL_DIG__ 17 2025-05-07T19:45:05.9410069Z #define __DBL_DENORM_MIN__ 4.9406564584124654e-324 2025-05-07T19:45:05.9410413Z #define __DBL_DIG__ 15 2025-05-07T19:45:05.9410676Z #define __DBL_EPSILON__ 2.2204460492503131e-16 2025-05-07T19:45:05.9411006Z #define __DBL_HAS_DENORM__ 1 2025-05-07T19:45:05.9411287Z #define __DBL_HAS_INFINITY__ 1 2025-05-07T19:45:05.9411557Z #define __DBL_HAS_QUIET_NAN__ 1 2025-05-07T19:45:05.9411857Z #define __DBL_MANT_DIG__ 53 2025-05-07T19:45:05.9412116Z #define __DBL_MAX_10_EXP__ 308 2025-05-07T19:45:05.9412394Z #define __DBL_MAX_EXP__ 1024 2025-05-07T19:45:05.9412663Z #define __DBL_MAX__ 1.7976931348623157e+308 2025-05-07T19:45:05.9413009Z #define __DBL_MIN_10_EXP__ (-307) 2025-05-07T19:45:05.9413285Z #define __DBL_MIN_EXP__ (-1021) 2025-05-07T19:45:05.9413583Z #define __DBL_MIN__ 2.2250738585072014e-308 2025-05-07T19:45:05.9413901Z #define __DECIMAL_DIG__ __LDBL_DECIMAL_DIG__ 2025-05-07T19:45:05.9414217Z #define __ELF__ 1 2025-05-07T19:45:05.9414444Z #define __FINITE_MATH_ONLY__ 0 2025-05-07T19:45:05.9414714Z #define __FLOAT128__ 1 2025-05-07T19:45:05.9414953Z #define __FLT16_DECIMAL_DIG__ 5 2025-05-07T19:45:05.9415272Z #define __FLT16_DENORM_MIN__ 5.9604644775390625e-8F16 2025-05-07T19:45:05.9415611Z 
#define __FLT16_DIG__ 3 2025-05-07T19:45:05.9415861Z #define __FLT16_EPSILON__ 9.765625e-4F16 2025-05-07T19:45:05.9416173Z #define __FLT16_HAS_DENORM__ 1 2025-05-07T19:45:05.9416441Z #define __FLT16_HAS_INFINITY__ 1 2025-05-07T19:45:05.9416739Z #define __FLT16_HAS_QUIET_NAN__ 1 2025-05-07T19:45:05.9417014Z #define __FLT16_MANT_DIG__ 11 2025-05-07T19:45:05.9417397Z #define __FLT16_MAX_10_EXP__ 4 2025-05-07T19:45:05.9417653Z #define __FLT16_MAX_EXP__ 16 2025-05-07T19:45:05.9417921Z #define __FLT16_MAX__ 6.5504e+4F16 2025-05-07T19:45:05.9418191Z #define __FLT16_MIN_10_EXP__ (-4) 2025-05-07T19:45:05.9418578Z #define __FLT16_MIN_EXP__ (-13) 2025-05-07T19:45:05.9418839Z #define __FLT16_MIN__ 6.103515625e-5F16 2025-05-07T19:45:05.9419108Z #define __FLT_DECIMAL_DIG__ 9 2025-05-07T19:45:05.9419380Z #define __FLT_DENORM_MIN__ 1.40129846e-45F 2025-05-07T19:45:05.9419649Z #define __FLT_DIG__ 6 2025-05-07T19:45:05.9419887Z #define __FLT_EPSILON__ 1.19209290e-7F 2025-05-07T19:45:05.9420153Z #define __FLT_HAS_DENORM__ 1 2025-05-07T19:45:05.9420413Z #define __FLT_HAS_INFINITY__ 1 2025-05-07T19:45:05.9420658Z #define __FLT_HAS_QUIET_NAN__ 1 2025-05-07T19:45:05.9420914Z #define __FLT_MANT_DIG__ 24 2025-05-07T19:45:05.9421274Z #define __FLT_MAX_10_EXP__ 38 2025-05-07T19:45:05.9421535Z #define __FLT_MAX_EXP__ 128 2025-05-07T19:45:05.9421789Z #define __FLT_MAX__ 3.40282347e+38F 2025-05-07T19:45:05.9422051Z #define __FLT_MIN_10_EXP__ (-37) 2025-05-07T19:45:05.9422319Z #define __FLT_MIN_EXP__ (-125) 2025-05-07T19:45:05.9422567Z #define __FLT_MIN__ 1.17549435e-38F 2025-05-07T19:45:05.9422831Z #define __FLT_RADIX__ 2 2025-05-07T19:45:05.9423044Z #define __FXSR__ 1 2025-05-07T19:45:05.9423272Z #define __GCC_ASM_FLAG_OUTPUTS__ 1 2025-05-07T19:45:05.9423539Z #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:45:05.9423832Z #define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:45:05.9424122Z #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:45:05.9424422Z #define 
__GCC_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:45:05.9424705Z #define __GCC_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:45:05.9424978Z #define __GCC_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:45:05.9425269Z #define __GCC_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:45:05.9425623Z #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:45:05.9425927Z #define __GCC_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:45:05.9426213Z #define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 2025-05-07T19:45:05.9426523Z #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:45:05.9426803Z #define __GCC_HAVE_DWARF2_CFI_ASM 1 2025-05-07T19:45:05.9427101Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 2025-05-07T19:45:05.9427411Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 2025-05-07T19:45:05.9427741Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 2025-05-07T19:45:05.9428062Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 2025-05-07T19:45:05.9428348Z #define __GNUC_MINOR__ 2 2025-05-07T19:45:05.9428595Z #define __GNUC_PATCHLEVEL__ 1 2025-05-07T19:45:05.9428844Z #define __GNUC_STDC_INLINE__ 1 2025-05-07T19:45:05.9429094Z #define __GNUC__ 4 2025-05-07T19:45:05.9429305Z #define __GXX_ABI_VERSION 1002 2025-05-07T19:45:05.9429561Z #define __INT16_C_SUFFIX__ 2025-05-07T19:45:05.9429796Z #define __INT16_FMTd__ "hd" 2025-05-07T19:45:05.9430052Z #define __INT16_FMTi__ "hi" 2025-05-07T19:45:05.9430281Z #define __INT16_MAX__ 32767 2025-05-07T19:45:05.9430531Z #define __INT16_TYPE__ short 2025-05-07T19:45:05.9430783Z #define __INT32_C_SUFFIX__ 2025-05-07T19:45:05.9431016Z #define __INT32_FMTd__ "d" 2025-05-07T19:45:05.9431264Z #define __INT32_FMTi__ "i" 2025-05-07T19:45:05.9431498Z #define __INT32_MAX__ 2147483647 2025-05-07T19:45:05.9431757Z #define __INT32_TYPE__ int 2025-05-07T19:45:05.9431989Z #define __INT64_C_SUFFIX__ L 2025-05-07T19:45:05.9432244Z #define __INT64_FMTd__ "ld" 2025-05-07T19:45:05.9432479Z #define __INT64_FMTi__ "li" 2025-05-07T19:45:05.9432740Z #define __INT64_MAX__ 9223372036854775807L 2025-05-07T19:45:05.9433018Z 
#define __INT64_TYPE__ long int 2025-05-07T19:45:05.9433280Z #define __INT8_C_SUFFIX__ 2025-05-07T19:45:05.9433530Z #define __INT8_FMTd__ "hhd" 2025-05-07T19:45:05.9433764Z #define __INT8_FMTi__ "hhi" 2025-05-07T19:45:05.9434009Z #define __INT8_MAX__ 127 2025-05-07T19:45:05.9434243Z #define __INT8_TYPE__ signed char 2025-05-07T19:45:05.9434526Z #define __INTMAX_C_SUFFIX__ L 2025-05-07T19:45:05.9434774Z #define __INTMAX_FMTd__ "ld" 2025-05-07T19:45:05.9435034Z #define __INTMAX_FMTi__ "li" 2025-05-07T19:45:05.9435289Z #define __INTMAX_MAX__ 9223372036854775807L 2025-05-07T19:45:05.9435587Z #define __INTMAX_TYPE__ long int 2025-05-07T19:45:05.9435840Z #define __INTMAX_WIDTH__ 64 2025-05-07T19:45:05.9436097Z #define __INTPTR_FMTd__ "ld" 2025-05-07T19:45:05.9436355Z #define __INTPTR_FMTi__ "li" 2025-05-07T19:45:05.9436608Z #define __INTPTR_MAX__ 9223372036854775807L 2025-05-07T19:45:05.9436903Z #define __INTPTR_TYPE__ long int 2025-05-07T19:45:05.9437153Z #define __INTPTR_WIDTH__ 64 2025-05-07T19:45:05.9437406Z #define __INT_FAST16_FMTd__ "hd" 2025-05-07T19:45:05.9437657Z #define __INT_FAST16_FMTi__ "hi" 2025-05-07T19:45:05.9437921Z #define __INT_FAST16_MAX__ 32767 2025-05-07T19:45:05.9438172Z #define __INT_FAST16_TYPE__ short 2025-05-07T19:45:05.9438441Z #define __INT_FAST16_WIDTH__ 16 2025-05-07T19:45:05.9438778Z #define __INT_FAST32_FMTd__ "d" 2025-05-07T19:45:05.9439036Z #define __INT_FAST32_FMTi__ "i" 2025-05-07T19:45:05.9439302Z #define __INT_FAST32_MAX__ 2147483647 2025-05-07T19:45:05.9439567Z #define __INT_FAST32_TYPE__ int 2025-05-07T19:45:05.9439828Z #define __INT_FAST32_WIDTH__ 32 2025-05-07T19:45:05.9440077Z #define __INT_FAST64_FMTd__ "ld" 2025-05-07T19:45:05.9440342Z #define __INT_FAST64_FMTi__ "li" 2025-05-07T19:45:05.9440616Z #define __INT_FAST64_MAX__ 9223372036854775807L 2025-05-07T19:45:05.9440933Z #define __INT_FAST64_TYPE__ long int 2025-05-07T19:45:05.9441200Z #define __INT_FAST64_WIDTH__ 64 2025-05-07T19:45:05.9441463Z #define __INT_FAST8_FMTd__ 
"hhd" 2025-05-07T19:45:05.9441714Z #define __INT_FAST8_FMTi__ "hhi" 2025-05-07T19:45:05.9441979Z #define __INT_FAST8_MAX__ 127 2025-05-07T19:45:05.9442248Z #define __INT_FAST8_TYPE__ signed char 2025-05-07T19:45:05.9442522Z #define __INT_FAST8_WIDTH__ 8 2025-05-07T19:45:05.9442862Z #define __INT_LEAST16_FMTd__ "hd" 2025-05-07T19:45:05.9443301Z #define __INT_LEAST16_FMTi__ "hi" 2025-05-07T19:45:05.9443726Z #define __INT_LEAST16_MAX__ 32767 2025-05-07T19:45:05.9444007Z #define __INT_LEAST16_TYPE__ short 2025-05-07T19:45:05.9444302Z #define __INT_LEAST16_WIDTH__ 16 2025-05-07T19:45:05.9444571Z #define __INT_LEAST32_FMTd__ "d" 2025-05-07T19:45:05.9444861Z #define __INT_LEAST32_FMTi__ "i" 2025-05-07T19:45:05.9445139Z #define __INT_LEAST32_MAX__ 2147483647 2025-05-07T19:45:05.9445445Z #define __INT_LEAST32_TYPE__ int 2025-05-07T19:45:05.9445726Z #define __INT_LEAST32_WIDTH__ 32 2025-05-07T19:45:05.9445995Z #define __INT_LEAST64_FMTd__ "ld" 2025-05-07T19:45:05.9446285Z #define __INT_LEAST64_FMTi__ "li" 2025-05-07T19:45:05.9446582Z #define __INT_LEAST64_MAX__ 9223372036854775807L 2025-05-07T19:45:05.9446920Z #define __INT_LEAST64_TYPE__ long int 2025-05-07T19:45:05.9447207Z #define __INT_LEAST64_WIDTH__ 64 2025-05-07T19:45:05.9447492Z #define __INT_LEAST8_FMTd__ "hhd" 2025-05-07T19:45:05.9447768Z #define __INT_LEAST8_FMTi__ "hhi" 2025-05-07T19:45:05.9448057Z #define __INT_LEAST8_MAX__ 127 2025-05-07T19:45:05.9448353Z #define __INT_LEAST8_TYPE__ signed char 2025-05-07T19:45:05.9448652Z #define __INT_LEAST8_WIDTH__ 8 2025-05-07T19:45:05.9448927Z #define __INT_MAX__ 2147483647 2025-05-07T19:45:05.9449181Z #define __INT_WIDTH__ 32 2025-05-07T19:45:05.9449445Z #define __LDBL_DECIMAL_DIG__ 21 2025-05-07T19:45:05.9449761Z #define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L 2025-05-07T19:45:05.9450116Z #define __LDBL_DIG__ 18 2025-05-07T19:45:05.9450391Z #define __LDBL_EPSILON__ 1.08420217248550443401e-19L 2025-05-07T19:45:05.9450731Z #define __LDBL_HAS_DENORM__ 1 
2025-05-07T19:45:05.9450999Z #define __LDBL_HAS_INFINITY__ 1 2025-05-07T19:45:05.9451282Z #define __LDBL_HAS_QUIET_NAN__ 1 2025-05-07T19:45:05.9451565Z #define __LDBL_MANT_DIG__ 64 2025-05-07T19:45:05.9451826Z #define __LDBL_MAX_10_EXP__ 4932 2025-05-07T19:45:05.9452111Z #define __LDBL_MAX_EXP__ 16384 2025-05-07T19:45:05.9452398Z #define __LDBL_MAX__ 1.18973149535723176502e+4932L 2025-05-07T19:45:05.9452739Z #define __LDBL_MIN_10_EXP__ (-4931) 2025-05-07T19:45:05.9453028Z #define __LDBL_MIN_EXP__ (-16381) 2025-05-07T19:45:05.9453337Z #define __LDBL_MIN__ 3.36210314311209350626e-4932L 2025-05-07T19:45:05.9453653Z #define __LITTLE_ENDIAN__ 1 2025-05-07T19:45:05.9453921Z #define __LLONG_WIDTH__ 64 2025-05-07T19:45:05.9454194Z #define __LONG_LONG_MAX__ 9223372036854775807LL 2025-05-07T19:45:05.9454530Z #define __LONG_MAX__ 9223372036854775807L 2025-05-07T19:45:05.9454838Z #define __LONG_WIDTH__ 64 2025-05-07T19:45:05.9455081Z #define __LP64__ 1 2025-05-07T19:45:05.9455417Z #define __MMX__ 1 2025-05-07T19:45:05.9455622Z #define __NO_INLINE__ 1 2025-05-07T19:45:05.9455860Z #define __NO_MATH_INLINES 1 2025-05-07T19:45:05.9456099Z #define __OBJC_BOOL_IS_BOOL 0 2025-05-07T19:45:05.9456391Z #define __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES 3 2025-05-07T19:45:05.9456704Z #define __OPENCL_MEMORY_SCOPE_DEVICE 2 2025-05-07T19:45:05.9457005Z #define __OPENCL_MEMORY_SCOPE_SUB_GROUP 4 2025-05-07T19:45:05.9457307Z #define __OPENCL_MEMORY_SCOPE_WORK_GROUP 1 2025-05-07T19:45:05.9457699Z #define __OPENCL_MEMORY_SCOPE_WORK_ITEM 0 2025-05-07T19:45:05.9458007Z #define __ORDER_BIG_ENDIAN__ 4321 2025-05-07T19:45:05.9458277Z #define __ORDER_LITTLE_ENDIAN__ 1234 2025-05-07T19:45:05.9458567Z #define __ORDER_PDP_ENDIAN__ 3412 2025-05-07T19:45:05.9458818Z #define __PIC__ 2 2025-05-07T19:45:05.9459039Z #define __PIE__ 2 2025-05-07T19:45:05.9459247Z #define __POINTER_WIDTH__ 64 2025-05-07T19:45:05.9459516Z #define __PRAGMA_REDEFINE_EXTNAME 1 2025-05-07T19:45:05.9459787Z #define __PTRDIFF_FMTd__ "ld" 
2025-05-07T19:45:05.9460052Z #define __PTRDIFF_FMTi__ "li" 2025-05-07T19:45:05.9460327Z #define __PTRDIFF_MAX__ 9223372036854775807L 2025-05-07T19:45:05.9460611Z #define __PTRDIFF_TYPE__ long int 2025-05-07T19:45:05.9460883Z #define __PTRDIFF_WIDTH__ 64 2025-05-07T19:45:05.9461126Z #define __REGISTER_PREFIX__ 2025-05-07T19:45:05.9461377Z #define __SCHAR_MAX__ 127 2025-05-07T19:45:05.9461598Z #define __SEG_FS 1 2025-05-07T19:45:05.9461809Z #define __SEG_GS 1 2025-05-07T19:45:05.9462083Z #define __SHRT_MAX__ 32767 2025-05-07T19:45:05.9462330Z #define __SHRT_WIDTH__ 16 2025-05-07T19:45:05.9462566Z #define __SIG_ATOMIC_MAX__ 2147483647 2025-05-07T19:45:05.9462845Z #define __SIG_ATOMIC_WIDTH__ 32 2025-05-07T19:45:05.9463103Z #define __SIZEOF_DOUBLE__ 8 2025-05-07T19:45:05.9463342Z #define __SIZEOF_FLOAT128__ 16 2025-05-07T19:45:05.9463602Z #define __SIZEOF_FLOAT__ 4 2025-05-07T19:45:05.9463836Z #define __SIZEOF_INT128__ 16 2025-05-07T19:45:05.9464090Z #define __SIZEOF_INT__ 4 2025-05-07T19:45:05.9464328Z #define __SIZEOF_LONG_DOUBLE__ 16 2025-05-07T19:45:05.9464606Z #define __SIZEOF_LONG_LONG__ 8 2025-05-07T19:45:05.9464852Z #define __SIZEOF_LONG__ 8 2025-05-07T19:45:05.9465105Z #define __SIZEOF_POINTER__ 8 2025-05-07T19:45:05.9465351Z #define __SIZEOF_PTRDIFF_T__ 8 2025-05-07T19:45:05.9465608Z #define __SIZEOF_SHORT__ 2 2025-05-07T19:45:05.9465840Z #define __SIZEOF_SIZE_T__ 8 2025-05-07T19:45:05.9466115Z #define __SIZEOF_WCHAR_T__ 4 2025-05-07T19:45:05.9466414Z #define __SIZEOF_WINT_T__ 4 2025-05-07T19:45:05.9466681Z #define __SIZE_FMTX__ "lX" 2025-05-07T19:45:05.9466971Z #define __SIZE_FMTo__ "lo" 2025-05-07T19:45:05.9467953Z #define __SIZE_FMTu__ "lu" 2025-05-07T19:45:05.9468321Z #define __SIZE_FMTx__ "lx" 2025-05-07T19:45:05.9468606Z #define __SIZE_MAX__ 18446744073709551615UL 2025-05-07T19:45:05.9468977Z #define __SIZE_TYPE__ long unsigned int 2025-05-07T19:45:05.9469300Z #define __SIZE_WIDTH__ 64 2025-05-07T19:45:05.9469602Z #define __SSE2_MATH__ 1 
2025-05-07T19:45:05.9469861Z #define __SSE2__ 1 2025-05-07T19:45:05.9470139Z #define __SSE_MATH__ 1 2025-05-07T19:45:05.9470428Z #define __SSE__ 1 2025-05-07T19:45:05.9470675Z #define __STDC_HOSTED__ 1 2025-05-07T19:45:05.9470979Z #define __STDC_UTF_16__ 1 2025-05-07T19:45:05.9471249Z #define __STDC_UTF_32__ 1 2025-05-07T19:45:05.9471553Z #define __STDC_VERSION__ 201710L 2025-05-07T19:45:05.9471845Z #define __STDC__ 1 2025-05-07T19:45:05.9472131Z #define __UINT16_C_SUFFIX__ 2025-05-07T19:45:05.9472421Z #define __UINT16_FMTX__ "hX" 2025-05-07T19:45:05.9472746Z #define __UINT16_FMTo__ "ho" 2025-05-07T19:45:05.9473032Z #define __UINT16_FMTu__ "hu" 2025-05-07T19:45:05.9473343Z #define __UINT16_FMTx__ "hx" 2025-05-07T19:45:05.9473626Z #define __UINT16_MAX__ 65535 2025-05-07T19:45:05.9473951Z #define __UINT16_TYPE__ unsigned short 2025-05-07T19:45:05.9474299Z #define __UINT32_C_SUFFIX__ U 2025-05-07T19:45:05.9474586Z #define __UINT32_FMTX__ "X" 2025-05-07T19:45:05.9474893Z #define __UINT32_FMTo__ "o" 2025-05-07T19:45:05.9475167Z #define __UINT32_FMTu__ "u" 2025-05-07T19:45:05.9475451Z #define __UINT32_FMTx__ "x" 2025-05-07T19:45:05.9475707Z #define __UINT32_MAX__ 4294967295U 2025-05-07T19:45:05.9476004Z #define __UINT32_TYPE__ unsigned int 2025-05-07T19:45:05.9476289Z #define __UINT64_C_SUFFIX__ UL 2025-05-07T19:45:05.9476566Z #define __UINT64_FMTX__ "lX" 2025-05-07T19:45:05.9476821Z #define __UINT64_FMTo__ "lo" 2025-05-07T19:45:05.9477088Z #define __UINT64_FMTu__ "lu" 2025-05-07T19:45:05.9477356Z #define __UINT64_FMTx__ "lx" 2025-05-07T19:45:05.9477781Z #define __UINT64_MAX__ 18446744073709551615UL 2025-05-07T19:45:05.9478118Z #define __UINT64_TYPE__ long unsigned int 2025-05-07T19:45:05.9478420Z #define __UINT8_C_SUFFIX__ 2025-05-07T19:45:05.9478697Z #define __UINT8_FMTX__ "hhX" 2025-05-07T19:45:05.9478952Z #define __UINT8_FMTo__ "hho" 2025-05-07T19:45:05.9479222Z #define __UINT8_FMTu__ "hhu" 2025-05-07T19:45:05.9479475Z #define __UINT8_FMTx__ "hhx" 
2025-05-07T19:45:05.9479741Z #define __UINT8_MAX__ 255 2025-05-07T19:45:05.9479994Z #define __UINT8_TYPE__ unsigned char 2025-05-07T19:45:05.9480411Z #define __UINTMAX_C_SUFFIX__ UL 2025-05-07T19:45:05.9480794Z #define __UINTMAX_FMTX__ "lX" 2025-05-07T19:45:05.9481040Z #define __UINTMAX_FMTo__ "lo" 2025-05-07T19:45:05.9481299Z #define __UINTMAX_FMTu__ "lu" 2025-05-07T19:45:05.9481541Z #define __UINTMAX_FMTx__ "lx" 2025-05-07T19:45:05.9481815Z #define __UINTMAX_MAX__ 18446744073709551615UL 2025-05-07T19:45:05.9482120Z #define __UINTMAX_TYPE__ long unsigned int 2025-05-07T19:45:05.9482511Z #define __UINTMAX_WIDTH__ 64 2025-05-07T19:45:05.9482823Z #define __UINTPTR_FMTX__ "lX" 2025-05-07T19:45:05.9483258Z #define __UINTPTR_FMTo__ "lo" 2025-05-07T19:45:05.9483521Z #define __UINTPTR_FMTu__ "lu" 2025-05-07T19:45:05.9483828Z #define __UINTPTR_FMTx__ "lx" 2025-05-07T19:45:05.9484127Z #define __UINTPTR_MAX__ 18446744073709551615UL 2025-05-07T19:45:05.9484458Z #define __UINTPTR_TYPE__ long unsigned int 2025-05-07T19:45:05.9484780Z #define __UINTPTR_WIDTH__ 64 2025-05-07T19:45:05.9485049Z #define __UINT_FAST16_FMTX__ "hX" 2025-05-07T19:45:05.9485351Z #define __UINT_FAST16_FMTo__ "ho" 2025-05-07T19:45:05.9485630Z #define __UINT_FAST16_FMTu__ "hu" 2025-05-07T19:45:05.9485922Z #define __UINT_FAST16_FMTx__ "hx" 2025-05-07T19:45:05.9486196Z #define __UINT_FAST16_MAX__ 65535 2025-05-07T19:45:05.9486500Z #define __UINT_FAST16_TYPE__ unsigned short 2025-05-07T19:45:05.9486827Z #define __UINT_FAST32_FMTX__ "X" 2025-05-07T19:45:05.9487100Z #define __UINT_FAST32_FMTo__ "o" 2025-05-07T19:45:05.9487383Z #define __UINT_FAST32_FMTu__ "u" 2025-05-07T19:45:05.9487658Z #define __UINT_FAST32_FMTx__ "x" 2025-05-07T19:45:05.9487943Z #define __UINT_FAST32_MAX__ 4294967295U 2025-05-07T19:45:05.9488254Z #define __UINT_FAST32_TYPE__ unsigned int 2025-05-07T19:45:05.9488578Z #define __UINT_FAST64_FMTX__ "lX" 2025-05-07T19:45:05.9488855Z #define __UINT_FAST64_FMTo__ "lo" 2025-05-07T19:45:05.9489149Z 
#define __UINT_FAST64_FMTu__ "lu" 2025-05-07T19:45:05.9489424Z #define __UINT_FAST64_FMTx__ "lx" 2025-05-07T19:45:05.9489742Z #define __UINT_FAST64_MAX__ 18446744073709551615UL 2025-05-07T19:45:05.9490106Z #define __UINT_FAST64_TYPE__ long unsigned int 2025-05-07T19:45:05.9490425Z #define __UINT_FAST8_FMTX__ "hhX" 2025-05-07T19:45:05.9490713Z #define __UINT_FAST8_FMTo__ "hho" 2025-05-07T19:45:05.9490985Z #define __UINT_FAST8_FMTu__ "hhu" 2025-05-07T19:45:05.9491275Z #define __UINT_FAST8_FMTx__ "hhx" 2025-05-07T19:45:05.9491549Z #define __UINT_FAST8_MAX__ 255 2025-05-07T19:45:05.9491843Z #define __UINT_FAST8_TYPE__ unsigned char 2025-05-07T19:45:05.9492158Z #define __UINT_LEAST16_FMTX__ "hX" 2025-05-07T19:45:05.9492454Z #define __UINT_LEAST16_FMTo__ "ho" 2025-05-07T19:45:05.9492749Z #define __UINT_LEAST16_FMTu__ "hu" 2025-05-07T19:45:05.9493027Z #define __UINT_LEAST16_FMTx__ "hx" 2025-05-07T19:45:05.9493320Z #define __UINT_LEAST16_MAX__ 65535 2025-05-07T19:45:05.9493614Z #define __UINT_LEAST16_TYPE__ unsigned short 2025-05-07T19:45:05.9493942Z #define __UINT_LEAST32_FMTX__ "X" 2025-05-07T19:45:05.9494218Z #define __UINT_LEAST32_FMTo__ "o" 2025-05-07T19:45:05.9494529Z #define __UINT_LEAST32_FMTu__ "u" 2025-05-07T19:45:05.9494804Z #define __UINT_LEAST32_FMTx__ "x" 2025-05-07T19:45:05.9495104Z #define __UINT_LEAST32_MAX__ 4294967295U 2025-05-07T19:45:05.9495523Z #define __UINT_LEAST32_TYPE__ unsigned int 2025-05-07T19:45:05.9495850Z #define __UINT_LEAST64_FMTX__ "lX" 2025-05-07T19:45:05.9496127Z #define __UINT_LEAST64_FMTo__ "lo" 2025-05-07T19:45:05.9496417Z #define __UINT_LEAST64_FMTu__ "lu" 2025-05-07T19:45:05.9496691Z #define __UINT_LEAST64_FMTx__ "lx" 2025-05-07T19:45:05.9497085Z #define __UINT_LEAST64_MAX__ 18446744073709551615UL 2025-05-07T19:45:05.9497437Z #define __UINT_LEAST64_TYPE__ long unsigned int 2025-05-07T19:45:05.9497777Z #define __UINT_LEAST8_FMTX__ "hhX" 2025-05-07T19:45:05.9498071Z #define __UINT_LEAST8_FMTo__ "hho" 2025-05-07T19:45:05.9498349Z 
#define __UINT_LEAST8_FMTu__ "hhu" 2025-05-07T19:45:05.9498639Z #define __UINT_LEAST8_FMTx__ "hhx" 2025-05-07T19:45:05.9498914Z #define __UINT_LEAST8_MAX__ 255 2025-05-07T19:45:05.9499213Z #define __UINT_LEAST8_TYPE__ unsigned char 2025-05-07T19:45:05.9499540Z #define __USER_LABEL_PREFIX__ 2025-05-07T19:45:05.9500174Z #define __VERSION__ "Clang 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:45:05.9500795Z #define __WCHAR_MAX__ 2147483647 2025-05-07T19:45:05.9501077Z #define __WCHAR_TYPE__ int 2025-05-07T19:45:05.9501340Z #define __WCHAR_WIDTH__ 32 2025-05-07T19:45:05.9501585Z #define __WINT_MAX__ 4294967295U 2025-05-07T19:45:05.9501942Z #define __WINT_TYPE__ unsigned int 2025-05-07T19:45:05.9502214Z #define __WINT_UNSIGNED__ 1 2025-05-07T19:45:05.9502650Z #define __WINT_WIDTH__ 32 2025-05-07T19:45:05.9502886Z #define __amd64 1 2025-05-07T19:45:05.9503110Z #define __amd64__ 1 2025-05-07T19:45:05.9503320Z #define __clang__ 1 2025-05-07T19:45:05.9503594Z #define __clang_literal_encoding__ "UTF-8" 2025-05-07T19:45:05.9503895Z #define __clang_major__ 16 2025-05-07T19:45:05.9504154Z #define __clang_minor__ 0 2025-05-07T19:45:05.9504416Z #define __clang_patchlevel__ 6 2025-05-07T19:45:05.9505016Z #define __clang_version__ "16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:45:05.9505697Z #define __clang_wide_literal_encoding__ "UTF-32" 2025-05-07T19:45:05.9506026Z #define __code_model_small__ 1 2025-05-07T19:45:05.9506301Z #define __gnu_linux__ 1 2025-05-07T19:45:05.9506530Z #define __k8 1 2025-05-07T19:45:05.9506754Z #define __k8__ 1 2025-05-07T19:45:05.9506961Z #define __linux 1 2025-05-07T19:45:05.9507199Z #define __linux__ 1 2025-05-07T19:45:05.9507412Z #define __llvm__ 1 2025-05-07T19:45:05.9507642Z #define __pic__ 2 2025-05-07T19:45:05.9507864Z #define __pie__ 2 2025-05-07T19:45:05.9508127Z #define __seg_fs 
__attribute__((address_space(257))) 2025-05-07T19:45:05.9508519Z #define __seg_gs __attribute__((address_space(256))) 2025-05-07T19:45:05.9508847Z #define __tune_k8__ 1 2025-05-07T19:45:05.9509086Z #define __unix 1 2025-05-07T19:45:05.9509294Z #define __unix__ 1 2025-05-07T19:45:05.9509519Z #define __x86_64 1 2025-05-07T19:45:05.9509727Z #define __x86_64__ 1 2025-05-07T19:45:05.9509963Z #define linux 1 2025-05-07T19:45:05.9510171Z #define unix 1 2025-05-07T19:45:05.9510312Z 2025-05-07T19:45:06.0173725Z 2025-05-07T19:45:06.0174278Z [INFO] Printing out all preprocessor defines in the C++ compiler ... 2025-05-07T19:45:06.0174785Z + conda run -n build_binary c++ -dM -E -x c++ - 2025-05-07T19:45:06.0175026Z 2025-05-07T19:45:07.8536788Z #define _GNU_SOURCE 1 2025-05-07T19:45:07.8537776Z #define _LP64 1 2025-05-07T19:45:07.8538435Z #define __ATOMIC_ACQUIRE 2 2025-05-07T19:45:07.8539205Z #define __ATOMIC_ACQ_REL 4 2025-05-07T19:45:07.8539923Z #define __ATOMIC_CONSUME 1 2025-05-07T19:45:07.8540674Z #define __ATOMIC_RELAXED 0 2025-05-07T19:45:07.8541391Z #define __ATOMIC_RELEASE 3 2025-05-07T19:45:07.8542140Z #define __ATOMIC_SEQ_CST 5 2025-05-07T19:45:07.8542880Z #define __BIGGEST_ALIGNMENT__ 16 2025-05-07T19:45:07.8543826Z #define __BITINT_MAXWIDTH__ 8388608 2025-05-07T19:45:07.8544140Z #define __BOOL_WIDTH__ 8 2025-05-07T19:45:07.8544462Z #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:45:07.8544837Z #define __CHAR16_TYPE__ unsigned short 2025-05-07T19:45:07.8545152Z #define __CHAR32_TYPE__ unsigned int 2025-05-07T19:45:07.8545471Z #define __CHAR_BIT__ 8 2025-05-07T19:45:07.8545739Z #define __CLANG_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:45:07.8546097Z #define __CLANG_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:45:07.8546437Z #define __CLANG_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:45:07.8547163Z #define __CLANG_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:45:07.8547488Z #define __CLANG_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:45:07.8547835Z #define 
__CLANG_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:45:07.8548162Z #define __CLANG_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:45:07.8548525Z #define __CLANG_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:45:07.8548889Z #define __CLANG_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:45:07.8549223Z #define __CLANG_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:45:07.8549579Z #define __CONSTANT_CFSTRINGS__ 1 2025-05-07T19:45:07.8549975Z #define __DBL_DECIMAL_DIG__ 17 2025-05-07T19:45:07.8550299Z #define __DBL_DENORM_MIN__ 4.9406564584124654e-324 2025-05-07T19:45:07.8550616Z #define __DBL_DIG__ 15 2025-05-07T19:45:07.8550904Z #define __DBL_EPSILON__ 2.2204460492503131e-16 2025-05-07T19:45:07.8551214Z #define __DBL_HAS_DENORM__ 1 2025-05-07T19:45:07.8551503Z #define __DBL_HAS_INFINITY__ 1 2025-05-07T19:45:07.8551807Z #define __DBL_HAS_QUIET_NAN__ 1 2025-05-07T19:45:07.8552216Z #define __DBL_MANT_DIG__ 53 2025-05-07T19:45:07.8552517Z #define __DBL_MAX_10_EXP__ 308 2025-05-07T19:45:07.8552782Z #define __DBL_MAX_EXP__ 1024 2025-05-07T19:45:07.8553228Z #define __DBL_MAX__ 1.7976931348623157e+308 2025-05-07T19:45:07.8553539Z #define __DBL_MIN_10_EXP__ (-307) 2025-05-07T19:45:07.8553853Z #define __DBL_MIN_EXP__ (-1021) 2025-05-07T19:45:07.8554139Z #define __DBL_MIN__ 2.2250738585072014e-308 2025-05-07T19:45:07.8554497Z #define __DECIMAL_DIG__ __LDBL_DECIMAL_DIG__ 2025-05-07T19:45:07.8554807Z #define __DEPRECATED 1 2025-05-07T19:45:07.8555076Z #define __ELF__ 1 2025-05-07T19:45:07.8555330Z #define __EXCEPTIONS 1 2025-05-07T19:45:07.8555580Z #define __FINITE_MATH_ONLY__ 0 2025-05-07T19:45:07.8555882Z #define __FLOAT128__ 1 2025-05-07T19:45:07.8556132Z #define __FLT16_DECIMAL_DIG__ 5 2025-05-07T19:45:07.8556464Z #define __FLT16_DENORM_MIN__ 5.9604644775390625e-8F16 2025-05-07T19:45:07.8556790Z #define __FLT16_DIG__ 3 2025-05-07T19:45:07.8557072Z #define __FLT16_EPSILON__ 9.765625e-4F16 2025-05-07T19:45:07.8557387Z #define __FLT16_HAS_DENORM__ 1 2025-05-07T19:45:07.8557678Z #define __FLT16_HAS_INFINITY__ 1 
2025-05-07T19:45:07.8557958Z #define __FLT16_HAS_QUIET_NAN__ 1 2025-05-07T19:45:07.8558259Z #define __FLT16_MANT_DIG__ 11 2025-05-07T19:45:07.8558549Z #define __FLT16_MAX_10_EXP__ 4 2025-05-07T19:45:07.8558815Z #define __FLT16_MAX_EXP__ 16 2025-05-07T19:45:07.8559102Z #define __FLT16_MAX__ 6.5504e+4F16 2025-05-07T19:45:07.8559374Z #define __FLT16_MIN_10_EXP__ (-4) 2025-05-07T19:45:07.8559670Z #define __FLT16_MIN_EXP__ (-13) 2025-05-07T19:45:07.8559939Z #define __FLT16_MIN__ 6.103515625e-5F16 2025-05-07T19:45:07.8560254Z #define __FLT_DECIMAL_DIG__ 9 2025-05-07T19:45:07.8560533Z #define __FLT_DENORM_MIN__ 1.40129846e-45F 2025-05-07T19:45:07.8560848Z #define __FLT_DIG__ 6 2025-05-07T19:45:07.8561095Z #define __FLT_EPSILON__ 1.19209290e-7F 2025-05-07T19:45:07.8561403Z #define __FLT_HAS_DENORM__ 1 2025-05-07T19:45:07.8561687Z #define __FLT_HAS_INFINITY__ 1 2025-05-07T19:45:07.8561964Z #define __FLT_HAS_QUIET_NAN__ 1 2025-05-07T19:45:07.8562254Z #define __FLT_MANT_DIG__ 24 2025-05-07T19:45:07.8562511Z #define __FLT_MAX_10_EXP__ 38 2025-05-07T19:45:07.8562916Z #define __FLT_MAX_EXP__ 128 2025-05-07T19:45:07.8563362Z #define __FLT_MAX__ 3.40282347e+38F 2025-05-07T19:45:07.8563691Z #define __FLT_MIN_10_EXP__ (-37) 2025-05-07T19:45:07.8564035Z #define __FLT_MIN_EXP__ (-125) 2025-05-07T19:45:07.8564356Z #define __FLT_MIN__ 1.17549435e-38F 2025-05-07T19:45:07.8564657Z #define __FLT_RADIX__ 2 2025-05-07T19:45:07.8564944Z #define __FXSR__ 1 2025-05-07T19:45:07.8565223Z #define __GCC_ASM_FLAG_OUTPUTS__ 1 2025-05-07T19:45:07.8565533Z #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:45:07.8565880Z #define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:45:07.8566214Z #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:45:07.8566570Z #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:45:07.8566886Z #define __GCC_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:45:07.8567508Z #define __GCC_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:45:07.8567990Z #define __GCC_ATOMIC_LONG_LOCK_FREE 2 
2025-05-07T19:45:07.8568342Z #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:45:07.8568700Z #define __GCC_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:45:07.8569034Z #define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 2025-05-07T19:45:07.8569406Z #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:45:07.8569741Z #define __GCC_HAVE_DWARF2_CFI_ASM 1 2025-05-07T19:45:07.8570096Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 2025-05-07T19:45:07.8570448Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 2025-05-07T19:45:07.8570820Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 2025-05-07T19:45:07.8571169Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 2025-05-07T19:45:07.8571529Z #define __GLIBCXX_BITSIZE_INT_N_0 128 2025-05-07T19:45:07.8571845Z #define __GLIBCXX_TYPE_INT_N_0 __int128 2025-05-07T19:45:07.8572181Z #define __GNUC_GNU_INLINE__ 1 2025-05-07T19:45:07.8572491Z #define __GNUC_MINOR__ 2 2025-05-07T19:45:07.8572762Z #define __GNUC_PATCHLEVEL__ 1 2025-05-07T19:45:07.8573149Z #define __GNUC__ 4 2025-05-07T19:45:07.8573384Z #define __GNUG__ 4 2025-05-07T19:45:07.8573666Z #define __GXX_ABI_VERSION 1002 2025-05-07T19:45:07.8573965Z #define __GXX_EXPERIMENTAL_CXX0X__ 1 2025-05-07T19:45:07.8574298Z #define __GXX_RTTI 1 2025-05-07T19:45:07.8574549Z #define __GXX_WEAK__ 1 2025-05-07T19:45:07.8574833Z #define __INT16_C_SUFFIX__ 2025-05-07T19:45:07.8575106Z #define __INT16_FMTd__ "hd" 2025-05-07T19:45:07.8575408Z #define __INT16_FMTi__ "hi" 2025-05-07T19:45:07.8575699Z #define __INT16_MAX__ 32767 2025-05-07T19:45:07.8575967Z #define __INT16_TYPE__ short 2025-05-07T19:45:07.8576264Z #define __INT32_C_SUFFIX__ 2025-05-07T19:45:07.8576531Z #define __INT32_FMTd__ "d" 2025-05-07T19:45:07.8576820Z #define __INT32_FMTi__ "i" 2025-05-07T19:45:07.8577088Z #define __INT32_MAX__ 2147483647 2025-05-07T19:45:07.8577390Z #define __INT32_TYPE__ int 2025-05-07T19:45:07.8577656Z #define __INT64_C_SUFFIX__ L 2025-05-07T19:45:07.8577951Z #define __INT64_FMTd__ "ld" 2025-05-07T19:45:07.8578229Z #define 
__INT64_FMTi__ "li" 2025-05-07T19:45:07.8578536Z #define __INT64_MAX__ 9223372036854775807L 2025-05-07T19:45:07.8578880Z #define __INT64_TYPE__ long int 2025-05-07T19:45:07.8579166Z #define __INT8_C_SUFFIX__ 2025-05-07T19:45:07.8579459Z #define __INT8_FMTd__ "hhd" 2025-05-07T19:45:07.8579839Z #define __INT8_FMTi__ "hhi" 2025-05-07T19:45:07.8580112Z #define __INT8_MAX__ 127 2025-05-07T19:45:07.8580369Z #define __INT8_TYPE__ signed char 2025-05-07T19:45:07.8580676Z #define __INTMAX_C_SUFFIX__ L 2025-05-07T19:45:07.8580945Z #define __INTMAX_FMTd__ "ld" 2025-05-07T19:45:07.8581233Z #define __INTMAX_FMTi__ "li" 2025-05-07T19:45:07.8581510Z #define __INTMAX_MAX__ 9223372036854775807L 2025-05-07T19:45:07.8581830Z #define __INTMAX_TYPE__ long int 2025-05-07T19:45:07.8582124Z #define __INTMAX_WIDTH__ 64 2025-05-07T19:45:07.8582382Z #define __INTPTR_FMTd__ "ld" 2025-05-07T19:45:07.8582666Z #define __INTPTR_FMTi__ "li" 2025-05-07T19:45:07.8582941Z #define __INTPTR_MAX__ 9223372036854775807L 2025-05-07T19:45:07.8583305Z #define __INTPTR_TYPE__ long int 2025-05-07T19:45:07.8583581Z #define __INTPTR_WIDTH__ 64 2025-05-07T19:45:07.8583873Z #define __INT_FAST16_FMTd__ "hd" 2025-05-07T19:45:07.8584149Z #define __INT_FAST16_FMTi__ "hi" 2025-05-07T19:45:07.8584456Z #define __INT_FAST16_MAX__ 32767 2025-05-07T19:45:07.8584736Z #define __INT_FAST16_TYPE__ short 2025-05-07T19:45:07.8585057Z #define __INT_FAST16_WIDTH__ 16 2025-05-07T19:45:07.8585365Z #define __INT_FAST32_FMTd__ "d" 2025-05-07T19:45:07.8585641Z #define __INT_FAST32_FMTi__ "i" 2025-05-07T19:45:07.8585941Z #define __INT_FAST32_MAX__ 2147483647 2025-05-07T19:45:07.8586238Z #define __INT_FAST32_TYPE__ int 2025-05-07T19:45:07.8586543Z #define __INT_FAST32_WIDTH__ 32 2025-05-07T19:45:07.8586823Z #define __INT_FAST64_FMTd__ "ld" 2025-05-07T19:45:07.8587131Z #define __INT_FAST64_FMTi__ "li" 2025-05-07T19:45:07.8587434Z #define __INT_FAST64_MAX__ 9223372036854775807L 2025-05-07T19:45:07.8587793Z #define __INT_FAST64_TYPE__ long 
int 2025-05-07T19:45:07.8588163Z #define __INT_FAST64_WIDTH__ 64 2025-05-07T19:45:07.8588475Z #define __INT_FAST8_FMTd__ "hhd" 2025-05-07T19:45:07.8588783Z #define __INT_FAST8_FMTi__ "hhi" 2025-05-07T19:45:07.8589064Z #define __INT_FAST8_MAX__ 127 2025-05-07T19:45:07.8589369Z #define __INT_FAST8_TYPE__ signed char 2025-05-07T19:45:07.8589669Z #define __INT_FAST8_WIDTH__ 8 2025-05-07T19:45:07.8589974Z #define __INT_LEAST16_FMTd__ "hd" 2025-05-07T19:45:07.8590255Z #define __INT_LEAST16_FMTi__ "hi" 2025-05-07T19:45:07.8590550Z #define __INT_LEAST16_MAX__ 32767 2025-05-07T19:45:07.8590824Z #define __INT_LEAST16_TYPE__ short 2025-05-07T19:45:07.8591127Z #define __INT_LEAST16_WIDTH__ 16 2025-05-07T19:45:07.8591397Z #define __INT_LEAST32_FMTd__ "d" 2025-05-07T19:45:07.8591693Z #define __INT_LEAST32_FMTi__ "i" 2025-05-07T19:45:07.8591994Z #define __INT_LEAST32_MAX__ 2147483647 2025-05-07T19:45:07.8592283Z #define __INT_LEAST32_TYPE__ int 2025-05-07T19:45:07.8592581Z #define __INT_LEAST32_WIDTH__ 32 2025-05-07T19:45:07.8592848Z #define __INT_LEAST64_FMTd__ "ld" 2025-05-07T19:45:07.8593207Z #define __INT_LEAST64_FMTi__ "li" 2025-05-07T19:45:07.8593507Z #define __INT_LEAST64_MAX__ 9223372036854775807L 2025-05-07T19:45:07.8593856Z #define __INT_LEAST64_TYPE__ long int 2025-05-07T19:45:07.8594148Z #define __INT_LEAST64_WIDTH__ 64 2025-05-07T19:45:07.8594446Z #define __INT_LEAST8_FMTd__ "hhd" 2025-05-07T19:45:07.8594723Z #define __INT_LEAST8_FMTi__ "hhi" 2025-05-07T19:45:07.8595030Z #define __INT_LEAST8_MAX__ 127 2025-05-07T19:45:07.8595340Z #define __INT_LEAST8_TYPE__ signed char 2025-05-07T19:45:07.8595644Z #define __INT_LEAST8_WIDTH__ 8 2025-05-07T19:45:07.8595940Z #define __INT_MAX__ 2147483647 2025-05-07T19:45:07.8596206Z #define __INT_WIDTH__ 32 2025-05-07T19:45:07.8596482Z #define __LDBL_DECIMAL_DIG__ 21 2025-05-07T19:45:07.8596796Z #define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L 2025-05-07T19:45:07.8597162Z #define __LDBL_DIG__ 18 2025-05-07T19:45:07.8597439Z 
#define __LDBL_EPSILON__ 1.08420217248550443401e-19L 2025-05-07T19:45:07.8597799Z #define __LDBL_HAS_DENORM__ 1 2025-05-07T19:45:07.8598096Z #define __LDBL_HAS_INFINITY__ 1 2025-05-07T19:45:07.8598366Z #define __LDBL_HAS_QUIET_NAN__ 1 2025-05-07T19:45:07.8598667Z #define __LDBL_MANT_DIG__ 64 2025-05-07T19:45:07.8598932Z #define __LDBL_MAX_10_EXP__ 4932 2025-05-07T19:45:07.8599234Z #define __LDBL_MAX_EXP__ 16384 2025-05-07T19:45:07.8599524Z #define __LDBL_MAX__ 1.18973149535723176502e+4932L 2025-05-07T19:45:07.8599876Z #define __LDBL_MIN_10_EXP__ (-4931) 2025-05-07T19:45:07.8600163Z #define __LDBL_MIN_EXP__ (-16381) 2025-05-07T19:45:07.8600482Z #define __LDBL_MIN__ 3.36210314311209350626e-4932L 2025-05-07T19:45:07.8600799Z #define __LITTLE_ENDIAN__ 1 2025-05-07T19:45:07.8601080Z #define __LLONG_WIDTH__ 64 2025-05-07T19:45:07.8601382Z #define __LONG_LONG_MAX__ 9223372036854775807LL 2025-05-07T19:45:07.8601697Z #define __LONG_MAX__ 9223372036854775807L 2025-05-07T19:45:07.8602012Z #define __LONG_WIDTH__ 64 2025-05-07T19:45:07.8602252Z #define __LP64__ 1 2025-05-07T19:45:07.8602493Z #define __MMX__ 1 2025-05-07T19:45:07.8602809Z #define __NO_INLINE__ 1 2025-05-07T19:45:07.8603255Z #define __NO_MATH_INLINES 1 2025-05-07T19:45:07.8603534Z #define __OBJC_BOOL_IS_BOOL 0 2025-05-07T19:45:07.8603945Z #define __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES 3 2025-05-07T19:45:07.8604307Z #define __OPENCL_MEMORY_SCOPE_DEVICE 2 2025-05-07T19:45:07.8604670Z #define __OPENCL_MEMORY_SCOPE_SUB_GROUP 4 2025-05-07T19:45:07.8605038Z #define __OPENCL_MEMORY_SCOPE_WORK_GROUP 1 2025-05-07T19:45:07.8605394Z #define __OPENCL_MEMORY_SCOPE_WORK_ITEM 0 2025-05-07T19:45:07.8605745Z #define __ORDER_BIG_ENDIAN__ 4321 2025-05-07T19:45:07.8606047Z #define __ORDER_LITTLE_ENDIAN__ 1234 2025-05-07T19:45:07.8606379Z #define __ORDER_PDP_ENDIAN__ 3412 2025-05-07T19:45:07.8606659Z #define __PIC__ 2 2025-05-07T19:45:07.8606914Z #define __PIE__ 2 2025-05-07T19:45:07.8607154Z #define __POINTER_WIDTH__ 64 
2025-05-07T19:45:07.8607465Z #define __PRAGMA_REDEFINE_EXTNAME 1 2025-05-07T19:45:07.8607771Z #define __PTRDIFF_FMTd__ "ld" 2025-05-07T19:45:07.8608157Z #define __PTRDIFF_FMTi__ "li" 2025-05-07T19:45:07.8608487Z #define __PTRDIFF_MAX__ 9223372036854775807L 2025-05-07T19:45:07.8608813Z #define __PTRDIFF_TYPE__ long int 2025-05-07T19:45:07.8609141Z #define __PTRDIFF_WIDTH__ 64 2025-05-07T19:45:07.8609420Z #define __REGISTER_PREFIX__ 2025-05-07T19:45:07.8609721Z #define __SCHAR_MAX__ 127 2025-05-07T19:45:07.8609977Z #define __SEG_FS 1 2025-05-07T19:45:07.8610236Z #define __SEG_GS 1 2025-05-07T19:45:07.8610474Z #define __SHRT_MAX__ 32767 2025-05-07T19:45:07.8610774Z #define __SHRT_WIDTH__ 16 2025-05-07T19:45:07.8611050Z #define __SIG_ATOMIC_MAX__ 2147483647 2025-05-07T19:45:07.8611382Z #define __SIG_ATOMIC_WIDTH__ 32 2025-05-07T19:45:07.8611691Z #define __SIZEOF_DOUBLE__ 8 2025-05-07T19:45:07.8611970Z #define __SIZEOF_FLOAT128__ 16 2025-05-07T19:45:07.8612274Z #define __SIZEOF_FLOAT__ 4 2025-05-07T19:45:07.8612546Z #define __SIZEOF_INT128__ 16 2025-05-07T19:45:07.8612847Z #define __SIZEOF_INT__ 4 2025-05-07T19:45:07.8613116Z #define __SIZEOF_LONG_DOUBLE__ 16 2025-05-07T19:45:07.8613522Z #define __SIZEOF_LONG_LONG__ 8 2025-05-07T19:45:07.8613812Z #define __SIZEOF_LONG__ 8 2025-05-07T19:45:07.8614117Z #define __SIZEOF_POINTER__ 8 2025-05-07T19:45:07.8614412Z #define __SIZEOF_PTRDIFF_T__ 8 2025-05-07T19:45:07.8614736Z #define __SIZEOF_SHORT__ 2 2025-05-07T19:45:07.8615010Z #define __SIZEOF_SIZE_T__ 8 2025-05-07T19:45:07.8615428Z #define __SIZEOF_WCHAR_T__ 4 2025-05-07T19:45:07.8615724Z #define __SIZEOF_WINT_T__ 4 2025-05-07T19:45:07.8615986Z #define __SIZE_FMTX__ "lX" 2025-05-07T19:45:07.8616275Z #define __SIZE_FMTo__ "lo" 2025-05-07T19:45:07.8616529Z #define __SIZE_FMTu__ "lu" 2025-05-07T19:45:07.8616814Z #define __SIZE_FMTx__ "lx" 2025-05-07T19:45:07.8617085Z #define __SIZE_MAX__ 18446744073709551615UL 2025-05-07T19:45:07.8617430Z #define __SIZE_TYPE__ long unsigned int 
2025-05-07T19:45:07.8617730Z #define __SIZE_WIDTH__ 64 2025-05-07T19:45:07.8618014Z #define __SSE2_MATH__ 1 2025-05-07T19:45:07.8618260Z #define __SSE2__ 1 2025-05-07T19:45:07.8618528Z #define __SSE_MATH__ 1 2025-05-07T19:45:07.8618784Z #define __SSE__ 1 2025-05-07T19:45:07.8619037Z #define __STDCPP_DEFAULT_NEW_ALIGNMENT__ 16UL 2025-05-07T19:45:07.8619380Z #define __STDCPP_THREADS__ 1 2025-05-07T19:45:07.8619644Z #define __STDC_HOSTED__ 1 2025-05-07T19:45:07.8619919Z #define __STDC_UTF_16__ 1 2025-05-07T19:45:07.8620163Z #define __STDC_UTF_32__ 1 2025-05-07T19:45:07.8620428Z #define __STDC__ 1 2025-05-07T19:45:07.8620653Z #define __UINT16_C_SUFFIX__ 2025-05-07T19:45:07.8620936Z #define __UINT16_FMTX__ "hX" 2025-05-07T19:45:07.8621191Z #define __UINT16_FMTo__ "ho" 2025-05-07T19:45:07.8621471Z #define __UINT16_FMTu__ "hu" 2025-05-07T19:45:07.8621748Z #define __UINT16_FMTx__ "hx" 2025-05-07T19:45:07.8622000Z #define __UINT16_MAX__ 65535 2025-05-07T19:45:07.8622286Z #define __UINT16_TYPE__ unsigned short 2025-05-07T19:45:07.8622575Z #define __UINT32_C_SUFFIX__ U 2025-05-07T19:45:07.8622859Z #define __UINT32_FMTX__ "X" 2025-05-07T19:45:07.8623110Z #define __UINT32_FMTo__ "o" 2025-05-07T19:45:07.8623392Z #define __UINT32_FMTu__ "u" 2025-05-07T19:45:07.8623644Z #define __UINT32_FMTx__ "x" 2025-05-07T19:45:07.8623926Z #define __UINT32_MAX__ 4294967295U 2025-05-07T19:45:07.8624209Z #define __UINT32_TYPE__ unsigned int 2025-05-07T19:45:07.8624520Z #define __UINT64_C_SUFFIX__ UL 2025-05-07T19:45:07.8624812Z #define __UINT64_FMTX__ "lX" 2025-05-07T19:45:07.8625076Z #define __UINT64_FMTo__ "lo" 2025-05-07T19:45:07.8625370Z #define __UINT64_FMTu__ "lu" 2025-05-07T19:45:07.8625633Z #define __UINT64_FMTx__ "lx" 2025-05-07T19:45:07.8625934Z #define __UINT64_MAX__ 18446744073709551615UL 2025-05-07T19:45:07.8626254Z #define __UINT64_TYPE__ long unsigned int 2025-05-07T19:45:07.8626577Z #define __UINT8_C_SUFFIX__ 2025-05-07T19:45:07.8626837Z #define __UINT8_FMTX__ "hhX" 
2025-05-07T19:45:07.8627118Z #define __UINT8_FMTo__ "hho" 2025-05-07T19:45:07.8627373Z #define __UINT8_FMTu__ "hhu" 2025-05-07T19:45:07.8627660Z #define __UINT8_FMTx__ "hhx" 2025-05-07T19:45:07.8627940Z #define __UINT8_MAX__ 255 2025-05-07T19:45:07.8628303Z #define __UINT8_TYPE__ unsigned char 2025-05-07T19:45:07.8628628Z #define __UINTMAX_C_SUFFIX__ UL 2025-05-07T19:45:07.8628910Z #define __UINTMAX_FMTX__ "lX" 2025-05-07T19:45:07.8629219Z #define __UINTMAX_FMTo__ "lo" 2025-05-07T19:45:07.8629494Z #define __UINTMAX_FMTu__ "lu" 2025-05-07T19:45:07.8629793Z #define __UINTMAX_FMTx__ "lx" 2025-05-07T19:45:07.8630083Z #define __UINTMAX_MAX__ 18446744073709551615UL 2025-05-07T19:45:07.8630437Z #define __UINTMAX_TYPE__ long unsigned int 2025-05-07T19:45:07.8630742Z #define __UINTMAX_WIDTH__ 64 2025-05-07T19:45:07.8631039Z #define __UINTPTR_FMTX__ "lX" 2025-05-07T19:45:07.8631328Z #define __UINTPTR_FMTo__ "lo" 2025-05-07T19:45:07.8631587Z #define __UINTPTR_FMTu__ "lu" 2025-05-07T19:45:07.8631872Z #define __UINTPTR_FMTx__ "lx" 2025-05-07T19:45:07.8632152Z #define __UINTPTR_MAX__ 18446744073709551615UL 2025-05-07T19:45:07.8632503Z #define __UINTPTR_TYPE__ long unsigned int 2025-05-07T19:45:07.8632809Z #define __UINTPTR_WIDTH__ 64 2025-05-07T19:45:07.8633156Z #define __UINT_FAST16_FMTX__ "hX" 2025-05-07T19:45:07.8633441Z #define __UINT_FAST16_FMTo__ "ho" 2025-05-07T19:45:07.8633744Z #define __UINT_FAST16_FMTu__ "hu" 2025-05-07T19:45:07.8634020Z #define __UINT_FAST16_FMTx__ "hx" 2025-05-07T19:45:07.8634322Z #define __UINT_FAST16_MAX__ 65535 2025-05-07T19:45:07.8634641Z #define __UINT_FAST16_TYPE__ unsigned short 2025-05-07T19:45:07.8634952Z #define __UINT_FAST32_FMTX__ "X" 2025-05-07T19:45:07.8635257Z #define __UINT_FAST32_FMTo__ "o" 2025-05-07T19:45:07.8635531Z #define __UINT_FAST32_FMTu__ "u" 2025-05-07T19:45:07.8635819Z #define __UINT_FAST32_FMTx__ "x" 2025-05-07T19:45:07.8636093Z #define __UINT_FAST32_MAX__ 4294967295U 2025-05-07T19:45:07.8636416Z #define 
__UINT_FAST32_TYPE__ unsigned int 2025-05-07T19:45:07.8636721Z #define __UINT_FAST64_FMTX__ "lX" 2025-05-07T19:45:07.8637018Z #define __UINT_FAST64_FMTo__ "lo" 2025-05-07T19:45:07.8637289Z #define __UINT_FAST64_FMTu__ "lu" 2025-05-07T19:45:07.8637585Z #define __UINT_FAST64_FMTx__ "lx" 2025-05-07T19:45:07.8637911Z #define __UINT_FAST64_MAX__ 18446744073709551615UL 2025-05-07T19:45:07.8638253Z #define __UINT_FAST64_TYPE__ long unsigned int 2025-05-07T19:45:07.8638590Z #define __UINT_FAST8_FMTX__ "hhX" 2025-05-07T19:45:07.8638867Z #define __UINT_FAST8_FMTo__ "hho" 2025-05-07T19:45:07.8639163Z #define __UINT_FAST8_FMTu__ "hhu" 2025-05-07T19:45:07.8639435Z #define __UINT_FAST8_FMTx__ "hhx" 2025-05-07T19:45:07.8639740Z #define __UINT_FAST8_MAX__ 255 2025-05-07T19:45:07.8640019Z #define __UINT_FAST8_TYPE__ unsigned char 2025-05-07T19:45:07.8640354Z #define __UINT_LEAST16_FMTX__ "hX" 2025-05-07T19:45:07.8640676Z #define __UINT_LEAST16_FMTo__ "ho" 2025-05-07T19:45:07.8640958Z #define __UINT_LEAST16_FMTu__ "hu" 2025-05-07T19:45:07.8641258Z #define __UINT_LEAST16_FMTx__ "hx" 2025-05-07T19:45:07.8641535Z #define __UINT_LEAST16_MAX__ 65535 2025-05-07T19:45:07.8641858Z #define __UINT_LEAST16_TYPE__ unsigned short 2025-05-07T19:45:07.8642169Z #define __UINT_LEAST32_FMTX__ "X" 2025-05-07T19:45:07.8642480Z #define __UINT_LEAST32_FMTo__ "o" 2025-05-07T19:45:07.8642845Z #define __UINT_LEAST32_FMTu__ "u" 2025-05-07T19:45:07.8643333Z #define __UINT_LEAST32_FMTx__ "x" 2025-05-07T19:45:07.8643653Z #define __UINT_LEAST32_MAX__ 4294967295U 2025-05-07T19:45:07.8644027Z #define __UINT_LEAST32_TYPE__ unsigned int 2025-05-07T19:45:07.8644390Z #define __UINT_LEAST64_FMTX__ "lX" 2025-05-07T19:45:07.8644698Z #define __UINT_LEAST64_FMTo__ "lo" 2025-05-07T19:45:07.8645032Z #define __UINT_LEAST64_FMTu__ "lu" 2025-05-07T19:45:07.8645335Z #define __UINT_LEAST64_FMTx__ "lx" 2025-05-07T19:45:07.8645691Z #define __UINT_LEAST64_MAX__ 18446744073709551615UL 2025-05-07T19:45:07.8646076Z #define 
__UINT_LEAST64_TYPE__ long unsigned int 2025-05-07T19:45:07.8646458Z #define __UINT_LEAST8_FMTX__ "hhX" 2025-05-07T19:45:07.8646764Z #define __UINT_LEAST8_FMTo__ "hho" 2025-05-07T19:45:07.8647098Z #define __UINT_LEAST8_FMTu__ "hhu" 2025-05-07T19:45:07.8647432Z #define __UINT_LEAST8_FMTx__ "hhx" 2025-05-07T19:45:07.8647739Z #define __UINT_LEAST8_MAX__ 255 2025-05-07T19:45:07.8648152Z #define __UINT_LEAST8_TYPE__ unsigned char 2025-05-07T19:45:07.8648492Z #define __USER_LABEL_PREFIX__ 2025-05-07T19:45:07.8649180Z #define __VERSION__ "Clang 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:45:07.8649842Z #define __WCHAR_MAX__ 2147483647 2025-05-07T19:45:07.8650172Z #define __WCHAR_TYPE__ int 2025-05-07T19:45:07.8650452Z #define __WCHAR_WIDTH__ 32 2025-05-07T19:45:07.8650763Z #define __WINT_MAX__ 4294967295U 2025-05-07T19:45:07.8651080Z #define __WINT_TYPE__ unsigned int 2025-05-07T19:45:07.8651388Z #define __WINT_UNSIGNED__ 1 2025-05-07T19:45:07.8651688Z #define __WINT_WIDTH__ 32 2025-05-07T19:45:07.8651943Z #define __amd64 1 2025-05-07T19:45:07.8652196Z #define __amd64__ 1 2025-05-07T19:45:07.8652426Z #define __clang__ 1 2025-05-07T19:45:07.8652710Z #define __clang_literal_encoding__ "UTF-8" 2025-05-07T19:45:07.8653025Z #define __clang_major__ 16 2025-05-07T19:45:07.8653284Z #define __clang_minor__ 0 2025-05-07T19:45:07.8656531Z #define __clang_patchlevel__ 6 2025-05-07T19:45:07.8657271Z #define __clang_version__ "16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:45:07.8657935Z #define __clang_wide_literal_encoding__ "UTF-32" 2025-05-07T19:45:07.8658261Z #define __code_model_small__ 1 2025-05-07T19:45:07.8658548Z #define __cplusplus 201703L 2025-05-07T19:45:07.8658820Z #define __cpp_aggregate_bases 201603L 2025-05-07T19:45:07.8659144Z #define __cpp_aggregate_nsdmi 201304L 2025-05-07T19:45:07.8659443Z #define __cpp_alias_templates 200704L 
2025-05-07T19:45:07.8659758Z #define __cpp_aligned_new 201606L 2025-05-07T19:45:07.8660041Z #define __cpp_attributes 200809L 2025-05-07T19:45:07.8660356Z #define __cpp_binary_literals 201304L 2025-05-07T19:45:07.8660659Z #define __cpp_capture_star_this 201603L 2025-05-07T19:45:07.8660990Z #define __cpp_constexpr 201603L 2025-05-07T19:45:07.8661448Z #define __cpp_constexpr_in_decltype 201711L 2025-05-07T19:45:07.8661785Z #define __cpp_decltype 200707L 2025-05-07T19:45:07.8662059Z #define __cpp_decltype_auto 201304L 2025-05-07T19:45:07.8662335Z #define __cpp_deduction_guides 201703L 2025-05-07T19:45:07.8662672Z #define __cpp_delegating_constructors 200604L 2025-05-07T19:45:07.8662997Z #define __cpp_digit_separators 201309L 2025-05-07T19:45:07.8663329Z #define __cpp_enumerator_attributes 201411L 2025-05-07T19:45:07.8663640Z #define __cpp_exceptions 199711L 2025-05-07T19:45:07.8663954Z #define __cpp_fold_expressions 201603L 2025-05-07T19:45:07.8664279Z #define __cpp_generic_lambdas 201304L 2025-05-07T19:45:07.8664597Z #define __cpp_guaranteed_copy_elision 201606L 2025-05-07T19:45:07.8664938Z #define __cpp_hex_float 201603L 2025-05-07T19:45:07.8665212Z #define __cpp_if_constexpr 201606L 2025-05-07T19:45:07.8665539Z #define __cpp_impl_destroying_delete 201806L 2025-05-07T19:45:07.8665875Z #define __cpp_inheriting_constructors 201511L 2025-05-07T19:45:07.8666219Z #define __cpp_init_captures 201304L 2025-05-07T19:45:07.8666524Z #define __cpp_initializer_lists 200806L 2025-05-07T19:45:07.8666855Z #define __cpp_inline_variables 201606L 2025-05-07T19:45:07.8667284Z #define __cpp_lambdas 200907L 2025-05-07T19:45:07.8667785Z #define __cpp_named_character_escapes 202207L 2025-05-07T19:45:07.8668168Z #define __cpp_namespace_attributes 201411L 2025-05-07T19:45:07.8668540Z #define __cpp_nested_namespace_definitions 201411L 2025-05-07T19:45:07.8668955Z #define __cpp_noexcept_function_type 201510L 2025-05-07T19:45:07.8669305Z #define __cpp_nontype_template_args 201411L 
2025-05-07T19:45:07.8669707Z #define __cpp_nontype_template_parameter_auto 201606L 2025-05-07T19:45:07.8670076Z #define __cpp_nsdmi 200809L 2025-05-07T19:45:07.8670388Z #define __cpp_range_based_for 201603L 2025-05-07T19:45:07.8670701Z #define __cpp_raw_strings 200710L 2025-05-07T19:45:07.8671034Z #define __cpp_ref_qualifiers 200710L 2025-05-07T19:45:07.8671384Z #define __cpp_return_type_deduction 201304L 2025-05-07T19:45:07.8671710Z #define __cpp_rtti 199711L 2025-05-07T19:45:07.8672146Z #define __cpp_rvalue_references 200610L 2025-05-07T19:45:07.8672468Z #define __cpp_static_assert 201411L 2025-05-07T19:45:07.8672815Z #define __cpp_static_call_operator 202207L 2025-05-07T19:45:07.8673153Z #define __cpp_structured_bindings 201606L 2025-05-07T19:45:07.8673509Z #define __cpp_template_auto 201606L 2025-05-07T19:45:07.8673834Z #define __cpp_threadsafe_static_init 200806L 2025-05-07T19:45:07.8674203Z #define __cpp_unicode_characters 200704L 2025-05-07T19:45:07.8674557Z #define __cpp_unicode_literals 200710L 2025-05-07T19:45:07.8674885Z #define __cpp_user_defined_literals 200809L 2025-05-07T19:45:07.8675248Z #define __cpp_variable_templates 201304L 2025-05-07T19:45:07.8675583Z #define __cpp_variadic_templates 200704L 2025-05-07T19:45:07.8675940Z #define __cpp_variadic_using 201611L 2025-05-07T19:45:07.8676238Z #define __gnu_linux__ 1 2025-05-07T19:45:07.8676516Z #define __k8 1 2025-05-07T19:45:07.8676740Z #define __k8__ 1 2025-05-07T19:45:07.8676993Z #define __linux 1 2025-05-07T19:45:07.8677307Z #define __linux__ 1 2025-05-07T19:45:07.8677576Z #define __llvm__ 1 2025-05-07T19:45:07.8677810Z #define __pic__ 2 2025-05-07T19:45:07.8678067Z #define __pie__ 2 2025-05-07T19:45:07.8678338Z #define __private_extern__ extern 2025-05-07T19:45:07.8678660Z #define __seg_fs __attribute__((address_space(257))) 2025-05-07T19:45:07.8679052Z #define __seg_gs __attribute__((address_space(256))) 2025-05-07T19:45:07.8679379Z #define __tune_k8__ 1 2025-05-07T19:45:07.8679622Z #define 
__unix 1 2025-05-07T19:45:07.8679938Z #define __unix__ 1 2025-05-07T19:45:07.8680327Z #define __x86_64 1 2025-05-07T19:45:07.8680523Z #define __x86_64__ 1 2025-05-07T19:45:07.8680739Z #define linux 1 2025-05-07T19:45:07.8680931Z #define unix 1 2025-05-07T19:45:07.8681064Z 2025-05-07T19:45:07.9282750Z 2025-05-07T19:45:07.9283117Z + conda run -n build_binary c++ --version 2025-05-07T19:45:07.9283411Z 2025-05-07T19:45:09.7407461Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:45:09.7408237Z Target: x86_64-conda-linux-gnu 2025-05-07T19:45:09.7408538Z Thread model: posix 2025-05-07T19:45:09.7408855Z InstalledDir: /github/home/miniconda/envs/build_binary/bin 2025-05-07T19:45:09.7409504Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang++.cfg 2025-05-07T19:45:09.7409967Z 2025-05-07T19:45:09.7986678Z 2025-05-07T19:45:09.7987471Z [INFO] Printing the default version of the C standard used by the compiler ... 2025-05-07T19:45:09.7988194Z + conda run -n build_binary cc -dM -E - < /dev/null | grep __STDC_VERSION__ 2025-05-07T19:45:09.7991325Z 2025-05-07T19:45:11.6759475Z #define __STDC_VERSION__ 201710L 2025-05-07T19:45:11.6761417Z 2025-05-07T19:45:11.6762027Z [INFO] Printing the default version of the C++ standard used by the compiler ... 2025-05-07T19:45:11.6762781Z + conda run -n build_binary c++ -dM -E -x c++ - < /dev/null | grep __cplusplus 2025-05-07T19:45:11.6763144Z 2025-05-07T19:45:13.5763188Z #define __cplusplus 201703L 2025-05-07T19:45:13.5764193Z 2025-05-07T19:45:13.5764904Z [INSTALL] Successfully installed C/C++ compilers 2025-05-07T19:45:13.5864927Z ##[group]Run . $PRELUDE; install_build_tools $BUILD_ENV 2025-05-07T19:45:13.5865418Z . 
$PRELUDE; install_build_tools $BUILD_ENV 2025-05-07T19:45:13.5866315Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:45:13.5866645Z env: 2025-05-07T19:45:13.5866911Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:45:13.5867637Z BUILD_ENV: build_binary 2025-05-07T19:45:13.5867951Z BUILD_TARGET: genai 2025-05-07T19:45:13.5868307Z BUILD_VARIANT: cuda 2025-05-07T19:45:13.5868596Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:45:13.5868868Z ##[endgroup] 2025-05-07T19:45:14.0365209Z ################################################################################ 2025-05-07T19:45:14.0366205Z # Install Build Tools 2025-05-07T19:45:14.0366852Z # 2025-05-07T19:45:14.0378202Z # [2025-05-07T19:45:14.037Z] + install_build_tools build_binary 2025-05-07T19:45:14.0379850Z ################################################################################ 2025-05-07T19:45:14.0380634Z 2025-05-07T19:45:14.0396791Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:45:14.1232075Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:45:14.1244841Z [INSTALL] Installing build tools ... 
2025-05-07T19:45:14.1269647Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y auditwheel bazel cmake>=3.30 hypothesis jinja2 make ncurses ninja openblas patchelf rhash scikit-build wheel pyyaml 2025-05-07T19:45:14.8344060Z Channels: 2025-05-07T19:45:14.8344463Z - conda-forge 2025-05-07T19:45:14.8344713Z Platform: linux-64 2025-05-07T19:45:17.9327958Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:45:21.6190800Z Solving environment: \ | / - done 2025-05-07T19:45:21.6766555Z 2025-05-07T19:45:21.6767614Z ## Package Plan ## 2025-05-07T19:45:21.6767855Z 2025-05-07T19:45:21.6768194Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:45:21.6768552Z 2025-05-07T19:45:21.6768714Z added / updated specs: 2025-05-07T19:45:21.6769030Z - auditwheel 2025-05-07T19:45:21.6769264Z - bazel 2025-05-07T19:45:21.6769523Z - cmake[version='>=3.30'] 2025-05-07T19:45:21.6769799Z - hypothesis 2025-05-07T19:45:21.6770059Z - jinja2 2025-05-07T19:45:21.6770276Z - make 2025-05-07T19:45:21.6770517Z - ncurses 2025-05-07T19:45:21.6770767Z - ninja 2025-05-07T19:45:21.6770983Z - openblas 2025-05-07T19:45:21.6771238Z - patchelf 2025-05-07T19:45:21.6771461Z - pyyaml 2025-05-07T19:45:21.6771701Z - rhash 2025-05-07T19:45:21.6771921Z - scikit-build 2025-05-07T19:45:21.6772184Z - wheel 2025-05-07T19:45:21.6772307Z 2025-05-07T19:45:21.6772312Z 2025-05-07T19:45:21.6772443Z The following packages will be downloaded: 2025-05-07T19:45:21.6772708Z 2025-05-07T19:45:21.6772839Z package | build 2025-05-07T19:45:21.6773243Z ---------------------------|----------------- 2025-05-07T19:45:21.6773666Z alsa-lib-1.2.14 | hb9d3cd8_0 553 KB conda-forge 2025-05-07T19:45:21.6774161Z attrs-25.3.0 | pyh71513ae_0 56 KB conda-forge 2025-05-07T19:45:21.6774623Z auditwheel-6.2.0 | pyha804496_1 40 KB conda-forge 2025-05-07T19:45:21.6775103Z bazel-7.5.0 | h96810dc_2 47.4 MB conda-forge 2025-05-07T19:45:21.6775534Z c-ares-1.34.5 | hb9d3cd8_0 
202 KB conda-forge 2025-05-07T19:45:21.6775994Z cairo-1.18.0 | hbb29018_2 961 KB conda-forge 2025-05-07T19:45:21.6776453Z click-8.1.8 | pyh707e725_0 83 KB conda-forge 2025-05-07T19:45:21.6776882Z cmake-4.0.2 | h74e3db0_0 19.4 MB conda-forge 2025-05-07T19:45:21.6777427Z distro-1.9.0 | pyhd8ed1ab_1 41 KB conda-forge 2025-05-07T19:45:21.6778307Z exceptiongroup-1.2.2 | pyhd8ed1ab_1 20 KB conda-forge 2025-05-07T19:45:21.6778982Z font-ttf-dejavu-sans-mono-2.37| hab24e00_0 388 KB conda-forge 2025-05-07T19:45:21.6779569Z font-ttf-inconsolata-3.000 | h77eed37_0 94 KB conda-forge 2025-05-07T19:45:21.6780152Z font-ttf-source-code-pro-2.038| h77eed37_0 684 KB conda-forge 2025-05-07T19:45:21.6780791Z font-ttf-ubuntu-0.83 | h77eed37_3 1.5 MB conda-forge 2025-05-07T19:45:21.6781276Z fontconfig-2.15.0 | h7e30c49_1 259 KB conda-forge 2025-05-07T19:45:21.6781750Z fonts-conda-ecosystem-1 | 0 4 KB conda-forge 2025-05-07T19:45:21.6782263Z fonts-conda-forge-1 | 0 4 KB conda-forge 2025-05-07T19:45:21.6782713Z freetype-2.13.3 | ha770c72_1 168 KB conda-forge 2025-05-07T19:45:21.6783168Z giflib-5.2.2 | hd590300_0 75 KB conda-forge 2025-05-07T19:45:21.6783769Z graphite2-1.3.13 | h59595ed_1003 95 KB conda-forge 2025-05-07T19:45:21.6784203Z harfbuzz-9.0.0 | hfac3d4d_0 1.5 MB conda-forge 2025-05-07T19:45:21.6784672Z hypothesis-6.131.14 | pyha770c72_0 348 KB conda-forge 2025-05-07T19:45:21.6785157Z ijar-7.5.0 | h5888daf_0 114 KB conda-forge 2025-05-07T19:45:21.6785560Z jinja2-3.1.6 | pyhd8ed1ab_0 110 KB conda-forge 2025-05-07T19:45:21.6786033Z keyutils-1.6.1 | h166bdaf_0 115 KB conda-forge 2025-05-07T19:45:21.6786483Z krb5-1.21.3 | h659f571_0 1.3 MB conda-forge 2025-05-07T19:45:21.6786886Z lcms2-2.17 | h717163a_0 242 KB conda-forge 2025-05-07T19:45:21.6787318Z lerc-4.0.0 | h0aef613_1 258 KB conda-forge 2025-05-07T19:45:21.6787771Z libabseil-20250127.1 | cxx17_hbbce691_0 1.3 MB conda-forge 2025-05-07T19:45:21.6788276Z libcups-2.3.3 | h4637d8d_4 4.3 MB conda-forge 2025-05-07T19:45:21.6788704Z 
libcurl-8.13.0 | h332b0f4_0 428 KB conda-forge 2025-05-07T19:45:21.6789174Z libdeflate-1.23 | h86f0d12_0 71 KB conda-forge 2025-05-07T19:45:21.6789677Z libedit-3.1.20250104 | pl5321h7949ede_0 132 KB conda-forge 2025-05-07T19:45:21.6790124Z libev-4.33 | hd590300_2 110 KB conda-forge 2025-05-07T19:45:21.6790577Z libexpat-2.7.0 | h5888daf_0 73 KB conda-forge 2025-05-07T19:45:21.6791025Z libfreetype-2.13.3 | ha770c72_1 8 KB conda-forge 2025-05-07T19:45:21.6791524Z libfreetype6-2.13.3 | h48d6fc4_1 371 KB conda-forge 2025-05-07T19:45:21.6792022Z libgfortran-15.1.0 | h69a702a_2 34 KB conda-forge 2025-05-07T19:45:21.6792497Z libgfortran5-15.1.0 | hcea5267_2 1.5 MB conda-forge 2025-05-07T19:45:21.6792977Z libglib-2.84.0 | h2ff4ddf_0 3.8 MB conda-forge 2025-05-07T19:45:21.6793398Z libgrpc-1.71.0 | h8e591d7_1 7.6 MB conda-forge 2025-05-07T19:45:21.6793877Z libjpeg-turbo-3.1.0 | hb9d3cd8_0 614 KB conda-forge 2025-05-07T19:45:21.6794322Z liblzma-5.8.1 | hb9d3cd8_1 110 KB conda-forge 2025-05-07T19:45:21.6794805Z liblzma-devel-5.8.1 | hb9d3cd8_1 431 KB conda-forge 2025-05-07T19:45:21.6795285Z libnghttp2-1.64.0 | h161d5f1_0 632 KB conda-forge 2025-05-07T19:45:21.6795714Z libnsl-2.0.1 | hd590300_0 33 KB conda-forge 2025-05-07T19:45:21.6796188Z libopenblas-0.3.29 |pthreads_h94d23a6_0 5.6 MB conda-forge 2025-05-07T19:45:21.6796644Z libpng-1.6.47 | h943b412_0 282 KB conda-forge 2025-05-07T19:45:21.6797204Z libprotobuf-5.29.3 | h501fc15_1 3.2 MB conda-forge 2025-05-07T19:45:21.6797632Z libre2-11-2024.07.02 | hba17884_3 205 KB conda-forge 2025-05-07T19:45:21.6798058Z libsqlite-3.49.2 | hee588c1_0 895 KB conda-forge 2025-05-07T19:45:21.6798479Z libssh2-1.11.1 | hcf80075_0 298 KB conda-forge 2025-05-07T19:45:21.6798893Z libtiff-4.7.0 | hd9ff511_4 419 KB conda-forge 2025-05-07T19:45:21.6799337Z libuuid-2.38.1 | h0b41bf4_0 33 KB conda-forge 2025-05-07T19:45:21.6799744Z libuv-1.50.0 | hb9d3cd8_0 870 KB conda-forge 2025-05-07T19:45:21.6800194Z libwebp-base-1.5.0 | h851e524_0 420 KB 
conda-forge 2025-05-07T19:45:21.6800649Z libxcb-1.17.0 | h8a09558_0 387 KB conda-forge 2025-05-07T19:45:21.6801138Z libzlib-1.3.1 | hb9d3cd8_2 60 KB conda-forge 2025-05-07T19:45:21.6801574Z make-4.4.1 | hb9d3cd8_2 501 KB conda-forge 2025-05-07T19:45:21.6802003Z markupsafe-3.0.2 | py311h2dc5d0c_1 25 KB conda-forge 2025-05-07T19:45:21.6802472Z ncurses-6.5 | h2d0b736_3 871 KB conda-forge 2025-05-07T19:45:21.6802960Z ninja-1.12.1 | hff21bea_1 158 KB conda-forge 2025-05-07T19:45:21.6803634Z openblas-0.3.29 |pthreads_h6ec200e_0 5.8 MB conda-forge 2025-05-07T19:45:21.6804151Z openjdk-23.0.1 | h4c11d01_0 181.3 MB conda-forge 2025-05-07T19:45:21.6804621Z packaging-25.0 | pyh29332c3_1 61 KB conda-forge 2025-05-07T19:45:21.6805102Z patchelf-0.18.0 | h3f2d84a_2 133 KB conda-forge 2025-05-07T19:45:21.6805526Z pcre2-10.44 | hc749103_2 934 KB conda-forge 2025-05-07T19:45:21.6805977Z pixman-0.46.0 | h29eaf8c_0 389 KB conda-forge 2025-05-07T19:45:21.6806456Z pthread-stubs-0.4 | hb9d3cd8_1002 8 KB conda-forge 2025-05-07T19:45:21.6806974Z pyelftools-0.32 | pyh707e725_1 146 KB conda-forge 2025-05-07T19:45:21.6807484Z python-3.11.11 |h9e4cc4f_2_cpython 29.2 MB conda-forge 2025-05-07T19:45:21.6807952Z pyyaml-6.0.2 | py311h2dc5d0c_2 208 KB conda-forge 2025-05-07T19:45:21.6808428Z re2-2024.07.02 | h9925aae_3 26 KB conda-forge 2025-05-07T19:45:21.6808863Z rhash-1.4.5 | hb9d3cd8_0 183 KB conda-forge 2025-05-07T19:45:21.6809359Z scikit-build-0.18.1 | pyhae55e72_2 114 KB conda-forge 2025-05-07T19:45:21.6809872Z singlejar-7.5.0 | h0e684df_1 122 KB conda-forge 2025-05-07T19:45:21.6810385Z sortedcontainers-2.4.0 | pyhd8ed1ab_1 28 KB conda-forge 2025-05-07T19:45:21.6810901Z sqlite-3.49.2 | h9eae976_0 840 KB conda-forge 2025-05-07T19:45:21.6811334Z tk-8.6.13 |noxft_h4845f30_101 3.2 MB conda-forge 2025-05-07T19:45:21.6811799Z tomli-2.2.1 | pyhd8ed1ab_1 19 KB conda-forge 2025-05-07T19:45:21.6812239Z wheel-0.45.1 | pyhd8ed1ab_1 61 KB conda-forge 2025-05-07T19:45:21.6812726Z xorg-libice-1.1.2 | 
hb9d3cd8_0 57 KB conda-forge 2025-05-07T19:45:21.6813219Z xorg-libsm-1.2.6 | he73a12e_0 27 KB conda-forge 2025-05-07T19:45:21.6813693Z xorg-libx11-1.8.12 | h4f16b4b_0 816 KB conda-forge 2025-05-07T19:45:21.6814193Z xorg-libxau-1.0.12 | hb9d3cd8_0 14 KB conda-forge 2025-05-07T19:45:21.6814684Z xorg-libxdmcp-1.1.5 | hb9d3cd8_0 19 KB conda-forge 2025-05-07T19:45:21.6815382Z xorg-libxext-1.3.6 | hb9d3cd8_0 49 KB conda-forge 2025-05-07T19:45:21.6815869Z xorg-libxfixes-6.0.1 | hb9d3cd8_0 19 KB conda-forge 2025-05-07T19:45:21.6816321Z xorg-libxi-1.8.2 | hb9d3cd8_0 46 KB conda-forge 2025-05-07T19:45:21.6816805Z xorg-libxrandr-1.5.4 | hb9d3cd8_0 29 KB conda-forge 2025-05-07T19:45:21.6817279Z xorg-libxrender-0.9.12 | hb9d3cd8_0 32 KB conda-forge 2025-05-07T19:45:21.6817778Z xorg-libxt-1.3.1 | hb9d3cd8_0 371 KB conda-forge 2025-05-07T19:45:21.6818222Z xorg-libxtst-1.2.5 | hb9d3cd8_3 32 KB conda-forge 2025-05-07T19:45:21.6818663Z xz-5.8.1 | hbcc6ac9_1 23 KB conda-forge 2025-05-07T19:45:21.6819103Z xz-gpl-tools-5.8.1 | hbcc6ac9_1 33 KB conda-forge 2025-05-07T19:45:21.6819605Z xz-tools-5.8.1 | hb9d3cd8_1 94 KB conda-forge 2025-05-07T19:45:21.6820031Z yaml-0.2.5 | h7f98852_2 87 KB conda-forge 2025-05-07T19:45:21.6820420Z zlib-1.3.1 | hb9d3cd8_2 90 KB conda-forge 2025-05-07T19:45:21.6820837Z zstd-1.5.7 | hb8e6e7a_2 554 KB conda-forge 2025-05-07T19:45:21.6821227Z ------------------------------------------------------------ 2025-05-07T19:45:21.6821609Z Total: 336.5 MB 2025-05-07T19:45:21.6821824Z 2025-05-07T19:45:21.6821987Z The following NEW packages will be INSTALLED: 2025-05-07T19:45:21.6822216Z 2025-05-07T19:45:21.6822427Z alsa-lib conda-forge/linux-64::alsa-lib-1.2.14-hb9d3cd8_0 2025-05-07T19:45:21.6822891Z attrs conda-forge/noarch::attrs-25.3.0-pyh71513ae_0 2025-05-07T19:45:21.6823353Z auditwheel conda-forge/noarch::auditwheel-6.2.0-pyha804496_1 2025-05-07T19:45:21.6823845Z bazel conda-forge/linux-64::bazel-7.5.0-h96810dc_2 2025-05-07T19:45:21.6824293Z c-ares 
conda-forge/linux-64::c-ares-1.34.5-hb9d3cd8_0 2025-05-07T19:45:21.6824714Z cairo conda-forge/linux-64::cairo-1.18.0-hbb29018_2 2025-05-07T19:45:21.6825162Z click conda-forge/noarch::click-8.1.8-pyh707e725_0 2025-05-07T19:45:21.6825578Z cmake conda-forge/linux-64::cmake-4.0.2-h74e3db0_0 2025-05-07T19:45:21.6826035Z distro conda-forge/noarch::distro-1.9.0-pyhd8ed1ab_1 2025-05-07T19:45:21.6826575Z exceptiongroup conda-forge/noarch::exceptiongroup-1.2.2-pyhd8ed1ab_1 2025-05-07T19:45:21.6827180Z font-ttf-dejavu-s~ conda-forge/noarch::font-ttf-dejavu-sans-mono-2.37-hab24e00_0 2025-05-07T19:45:21.6827831Z font-ttf-inconsol~ conda-forge/noarch::font-ttf-inconsolata-3.000-h77eed37_0 2025-05-07T19:45:21.6828428Z font-ttf-source-c~ conda-forge/noarch::font-ttf-source-code-pro-2.038-h77eed37_0 2025-05-07T19:45:21.6829000Z font-ttf-ubuntu conda-forge/noarch::font-ttf-ubuntu-0.83-h77eed37_3 2025-05-07T19:45:21.6829501Z fontconfig conda-forge/linux-64::fontconfig-2.15.0-h7e30c49_1 2025-05-07T19:45:21.6829979Z fonts-conda-ecosy~ conda-forge/noarch::fonts-conda-ecosystem-1-0 2025-05-07T19:45:21.6830465Z fonts-conda-forge conda-forge/noarch::fonts-conda-forge-1-0 2025-05-07T19:45:21.6830912Z freetype conda-forge/linux-64::freetype-2.13.3-ha770c72_1 2025-05-07T19:45:21.6831344Z giflib conda-forge/linux-64::giflib-5.2.2-hd590300_0 2025-05-07T19:45:21.6831792Z graphite2 conda-forge/linux-64::graphite2-1.3.13-h59595ed_1003 2025-05-07T19:45:21.6832240Z harfbuzz conda-forge/linux-64::harfbuzz-9.0.0-hfac3d4d_0 2025-05-07T19:45:21.6832714Z hypothesis conda-forge/noarch::hypothesis-6.131.14-pyha770c72_0 2025-05-07T19:45:21.6833147Z ijar conda-forge/linux-64::ijar-7.5.0-h5888daf_0 2025-05-07T19:45:21.6833554Z jinja2 conda-forge/noarch::jinja2-3.1.6-pyhd8ed1ab_0 2025-05-07T19:45:21.6834049Z keyutils conda-forge/linux-64::keyutils-1.6.1-h166bdaf_0 2025-05-07T19:45:21.6834456Z krb5 conda-forge/linux-64::krb5-1.21.3-h659f571_0 2025-05-07T19:45:21.6834864Z lcms2 
conda-forge/linux-64::lcms2-2.17-h717163a_0 2025-05-07T19:45:21.6835241Z lerc conda-forge/linux-64::lerc-4.0.0-h0aef613_1 2025-05-07T19:45:21.6835703Z libabseil conda-forge/linux-64::libabseil-20250127.1-cxx17_hbbce691_0 2025-05-07T19:45:21.6836176Z libcups conda-forge/linux-64::libcups-2.3.3-h4637d8d_4 2025-05-07T19:45:21.6836612Z libcurl conda-forge/linux-64::libcurl-8.13.0-h332b0f4_0 2025-05-07T19:45:21.6837069Z libdeflate conda-forge/linux-64::libdeflate-1.23-h86f0d12_0 2025-05-07T19:45:21.6837543Z libedit conda-forge/linux-64::libedit-3.1.20250104-pl5321h7949ede_0 2025-05-07T19:45:21.6838092Z libev conda-forge/linux-64::libev-4.33-hd590300_2 2025-05-07T19:45:21.6838502Z libexpat conda-forge/linux-64::libexpat-2.7.0-h5888daf_0 2025-05-07T19:45:21.6838978Z libfreetype conda-forge/linux-64::libfreetype-2.13.3-ha770c72_1 2025-05-07T19:45:21.6839482Z libfreetype6 conda-forge/linux-64::libfreetype6-2.13.3-h48d6fc4_1 2025-05-07T19:45:21.6839969Z libgfortran conda-forge/linux-64::libgfortran-15.1.0-h69a702a_2 2025-05-07T19:45:21.6840470Z libgfortran5 conda-forge/linux-64::libgfortran5-15.1.0-hcea5267_2 2025-05-07T19:45:21.6840929Z libglib conda-forge/linux-64::libglib-2.84.0-h2ff4ddf_0 2025-05-07T19:45:21.6841364Z libgrpc conda-forge/linux-64::libgrpc-1.71.0-h8e591d7_1 2025-05-07T19:45:21.6841840Z libjpeg-turbo conda-forge/linux-64::libjpeg-turbo-3.1.0-hb9d3cd8_0 2025-05-07T19:45:21.6842301Z liblzma conda-forge/linux-64::liblzma-5.8.1-hb9d3cd8_1 2025-05-07T19:45:21.6842845Z liblzma-devel conda-forge/linux-64::liblzma-devel-5.8.1-hb9d3cd8_1 2025-05-07T19:45:21.6843523Z libnghttp2 conda-forge/linux-64::libnghttp2-1.64.0-h161d5f1_0 2025-05-07T19:45:21.6844001Z libnsl conda-forge/linux-64::libnsl-2.0.1-hd590300_0 2025-05-07T19:45:21.6844526Z libopenblas conda-forge/linux-64::libopenblas-0.3.29-pthreads_h94d23a6_0 2025-05-07T19:45:21.6845043Z libpng conda-forge/linux-64::libpng-1.6.47-h943b412_0 2025-05-07T19:45:21.6845533Z libprotobuf 
conda-forge/linux-64::libprotobuf-5.29.3-h501fc15_1 2025-05-07T19:45:21.6846031Z libre2-11 conda-forge/linux-64::libre2-11-2024.07.02-hba17884_3 2025-05-07T19:45:21.6846536Z libsqlite conda-forge/linux-64::libsqlite-3.49.2-hee588c1_0 2025-05-07T19:45:21.6847020Z libssh2 conda-forge/linux-64::libssh2-1.11.1-hcf80075_0 2025-05-07T19:45:21.6847465Z libtiff conda-forge/linux-64::libtiff-4.7.0-hd9ff511_4 2025-05-07T19:45:21.6847907Z libuv conda-forge/linux-64::libuv-1.50.0-hb9d3cd8_0 2025-05-07T19:45:21.6848385Z libwebp-base conda-forge/linux-64::libwebp-base-1.5.0-h851e524_0 2025-05-07T19:45:21.6848881Z libxcb conda-forge/linux-64::libxcb-1.17.0-h8a09558_0 2025-05-07T19:45:21.6849315Z make conda-forge/linux-64::make-4.4.1-hb9d3cd8_2 2025-05-07T19:45:21.6849808Z markupsafe conda-forge/linux-64::markupsafe-3.0.2-py311h2dc5d0c_1 2025-05-07T19:45:21.6850351Z ninja conda-forge/linux-64::ninja-1.12.1-hff21bea_1 2025-05-07T19:45:21.6850868Z openblas conda-forge/linux-64::openblas-0.3.29-pthreads_h6ec200e_0 2025-05-07T19:45:21.6851424Z openjdk conda-forge/linux-64::openjdk-23.0.1-h4c11d01_0 2025-05-07T19:45:21.6851953Z packaging conda-forge/noarch::packaging-25.0-pyh29332c3_1 2025-05-07T19:45:21.6852455Z patchelf conda-forge/linux-64::patchelf-0.18.0-h3f2d84a_2 2025-05-07T19:45:21.6852938Z pcre2 conda-forge/linux-64::pcre2-10.44-hc749103_2 2025-05-07T19:45:21.6853471Z pixman conda-forge/linux-64::pixman-0.46.0-h29eaf8c_0 2025-05-07T19:45:21.6854028Z pthread-stubs conda-forge/linux-64::pthread-stubs-0.4-hb9d3cd8_1002 2025-05-07T19:45:21.6854591Z pyelftools conda-forge/noarch::pyelftools-0.32-pyh707e725_1 2025-05-07T19:45:21.6855140Z pyyaml conda-forge/linux-64::pyyaml-6.0.2-py311h2dc5d0c_2 2025-05-07T19:45:21.6855899Z re2 conda-forge/linux-64::re2-2024.07.02-h9925aae_3 2025-05-07T19:45:21.6856336Z rhash conda-forge/linux-64::rhash-1.4.5-hb9d3cd8_0 2025-05-07T19:45:21.6856871Z scikit-build conda-forge/noarch::scikit-build-0.18.1-pyhae55e72_2 2025-05-07T19:45:21.6857405Z 
singlejar conda-forge/linux-64::singlejar-7.5.0-h0e684df_1 2025-05-07T19:45:21.6857991Z sortedcontainers conda-forge/noarch::sortedcontainers-2.4.0-pyhd8ed1ab_1 2025-05-07T19:45:21.6858549Z tomli conda-forge/noarch::tomli-2.2.1-pyhd8ed1ab_1 2025-05-07T19:45:21.6859268Z xorg-libice conda-forge/linux-64::xorg-libice-1.1.2-hb9d3cd8_0 2025-05-07T19:45:21.6859814Z xorg-libsm conda-forge/linux-64::xorg-libsm-1.2.6-he73a12e_0 2025-05-07T19:45:21.6860330Z xorg-libx11 conda-forge/linux-64::xorg-libx11-1.8.12-h4f16b4b_0 2025-05-07T19:45:21.6860889Z xorg-libxau conda-forge/linux-64::xorg-libxau-1.0.12-hb9d3cd8_0 2025-05-07T19:45:21.6861476Z xorg-libxdmcp conda-forge/linux-64::xorg-libxdmcp-1.1.5-hb9d3cd8_0 2025-05-07T19:45:21.6862028Z xorg-libxext conda-forge/linux-64::xorg-libxext-1.3.6-hb9d3cd8_0 2025-05-07T19:45:21.6862617Z xorg-libxfixes conda-forge/linux-64::xorg-libxfixes-6.0.1-hb9d3cd8_0 2025-05-07T19:45:21.6863160Z xorg-libxi conda-forge/linux-64::xorg-libxi-1.8.2-hb9d3cd8_0 2025-05-07T19:45:21.6863733Z xorg-libxrandr conda-forge/linux-64::xorg-libxrandr-1.5.4-hb9d3cd8_0 2025-05-07T19:45:21.6864351Z xorg-libxrender conda-forge/linux-64::xorg-libxrender-0.9.12-hb9d3cd8_0 2025-05-07T19:45:21.6864913Z xorg-libxt conda-forge/linux-64::xorg-libxt-1.3.1-hb9d3cd8_0 2025-05-07T19:45:21.6865467Z xorg-libxtst conda-forge/linux-64::xorg-libxtst-1.2.5-hb9d3cd8_3 2025-05-07T19:45:21.6866005Z xz-gpl-tools conda-forge/linux-64::xz-gpl-tools-5.8.1-hbcc6ac9_1 2025-05-07T19:45:21.6866535Z xz-tools conda-forge/linux-64::xz-tools-5.8.1-hb9d3cd8_1 2025-05-07T19:45:21.6867013Z yaml conda-forge/linux-64::yaml-0.2.5-h7f98852_2 2025-05-07T19:45:21.6867422Z 2025-05-07T19:45:21.6867552Z The following packages will be UPDATED: 2025-05-07T19:45:21.6867811Z 2025-05-07T19:45:21.6868112Z libuuid pkgs/main::libuuid-1.41.5-h5eee18b_0 --> conda-forge::libuuid-2.38.1-h0b41bf4_0 2025-05-07T19:45:21.6868685Z libzlib 1.2.13-h4ab18f5_6 --> 1.3.1-hb9d3cd8_2 2025-05-07T19:45:21.6869273Z ncurses 
pkgs/main::ncurses-6.4-h6a678d5_0 --> conda-forge::ncurses-6.5-h2d0b736_3 2025-05-07T19:45:21.6870007Z python pkgs/main::python-3.11.11-he870216_0 --> conda-forge::python-3.11.11-h9e4cc4f_2_cpython 2025-05-07T19:45:21.6870714Z sqlite pkgs/main::sqlite-3.45.3-h5eee18b_0 --> conda-forge::sqlite-3.49.2-h9eae976_0 2025-05-07T19:45:21.6871447Z wheel pkgs/main/linux-64::wheel-0.45.1-py31~ --> conda-forge/noarch::wheel-0.45.1-pyhd8ed1ab_1 2025-05-07T19:45:21.6872122Z xz pkgs/main::xz-5.6.4-h5eee18b_1 --> conda-forge::xz-5.8.1-hbcc6ac9_1 2025-05-07T19:45:21.6872604Z zlib 1.2.13-h4ab18f5_6 --> 1.3.1-hb9d3cd8_2 2025-05-07T19:45:21.6873040Z zstd 1.5.6-ha6fb4c9_0 --> 1.5.7-hb8e6e7a_2 2025-05-07T19:45:21.6873301Z 2025-05-07T19:45:21.6873541Z The following packages will be SUPERSEDED by a higher-priority channel: 2025-05-07T19:45:21.6873917Z 2025-05-07T19:45:21.6874167Z tk pkgs/main::tk-8.6.14-h39e8969_0 --> conda-forge::tk-8.6.13-noxft_h4845f30_101 2025-05-07T19:45:21.6874527Z 2025-05-07T19:45:21.6874548Z 2025-05-07T19:45:21.6874677Z 2025-05-07T19:45:21.6874868Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:45:21.6875277Z openjdk-23.0.1 | 181.3 MB | | 0% 2025-05-07T19:45:21.6875980Z bazel-7.5.0 | 47.4 MB | | 0% 2025-05-07T19:45:21.6876576Z python-3.11.11 | 29.2 MB | | 0% 2025-05-07T19:45:21.6884741Z cmake-4.0.2 | 19.4 MB | | 0% 2025-05-07T19:45:21.6903431Z libgrpc-1.71.0 | 7.6 MB | | 0% 2025-05-07T19:45:21.6904595Z openblas-0.3.29 | 5.8 MB | | 0% 2025-05-07T19:45:21.6905164Z libopenblas-0.3.29 | 5.6 MB | | 0% 2025-05-07T19:45:21.6905735Z libcups-2.3.3 | 4.3 MB | | 0% 2025-05-07T19:45:21.6916072Z libglib-2.84.0 | 3.8 MB | | 0% 2025-05-07T19:45:21.6917959Z libprotobuf-5.29.3 | 3.2 MB | | 0%
2025-05-07T19:45:21.6919607Z tk-8.6.13 | 3.2 MB | | 0% 2025-05-07T19:45:21.6921360Z font-ttf-ubuntu-0.83 | 1.5 MB | | 0% 2025-05-07T19:45:21.6922061Z harfbuzz-9.0.0 | 1.5 MB | | 0% 2025-05-07T19:45:21.6923101Z libgfortran5-15.1.0 | 1.5 MB | | 0% 2025-05-07T19:45:21.6923759Z krb5-1.21.3 | 1.3 MB | | 0%
2025-05-07T19:45:21.6924489Z libabseil-20250127.1 | 1.3 MB | | 0% 2025-05-07T19:45:21.6925227Z cairo-1.18.0 | 961 KB | | 0% 2025-05-07T19:45:21.6925894Z pcre2-10.44 | 934 KB | | 0%
2025-05-07T19:45:21.6926614Z libsqlite-3.49.2 | 895 KB | | 0% 2025-05-07T19:45:21.8353774Z ... (more hidden) ... 2025-05-07T19:45:21.8471958Z libgrpc-1.71.0 | 7.6 MB | | 0% 2025-05-07T19:45:21.9356646Z cmake-4.0.2 | 19.4 MB | | 0% 2025-05-07T19:45:21.9435697Z libgrpc-1.71.0 | 7.6 MB | 2 | 3% 2025-05-07T19:45:21.9641577Z cmake-4.0.2 | 19.4 MB | 1 | 1% 2025-05-07T19:45:21.9767654Z bazel-7.5.0 | 47.4 MB | | 0% 2025-05-07T19:45:22.0125741Z python-3.11.11 | 29.2 MB | | 0% 2025-05-07T19:45:22.0362477Z openjdk-23.0.1 | 181.3 MB | | 0% 2025-05-07T19:45:22.0509054Z libgrpc-1.71.0 | 7.6 MB | #######6 | 77%
2025-05-07T19:45:22.0644933Z cmake-4.0.2 | 19.4 MB | #6 | 17%  2025-05-07T19:45:22.0645745Z 2025-05-07T19:45:22.0768092Z bazel-7.5.0 | 47.4 MB | #5 | 16%  2025-05-07T19:45:22.0768363Z 2025-05-07T19:45:22.0768368Z 2025-05-07T19:45:22.1142260Z python-3.11.11 | 29.2 MB | #7 | 17%  2025-05-07T19:45:22.1494419Z openjdk-23.0.1 | 181.3 MB | 2 | 3% 2025-05-07T19:45:22.1494703Z 2025-05-07T19:45:22.1494717Z 2025-05-07T19:45:22.1494720Z 2025-05-07T19:45:22.1496235Z 2025-05-07T19:45:22.1504910Z libgrpc-1.71.0 | 7.6 MB | ########## | 100%  2025-05-07T19:45:22.1505225Z 2025-05-07T19:45:22.1505238Z 2025-05-07T19:45:22.1505245Z 2025-05-07T19:45:22.1644861Z cmake-4.0.2 | 19.4 MB | ####1 | 41%  2025-05-07T19:45:22.1645153Z 2025-05-07T19:45:22.1768911Z bazel-7.5.0 | 47.4 MB | ###4 | 35%  2025-05-07T19:45:22.1769190Z 2025-05-07T19:45:22.1769194Z 2025-05-07T19:45:22.2036096Z python-3.11.11 | 29.2 MB | ###7 | 38%  2025-05-07T19:45:22.2036410Z 2025-05-07T19:45:22.2036415Z 2025-05-07T19:45:22.2036420Z 2025-05-07T19:45:22.2036425Z 2025-05-07T19:45:22.2036428Z 2025-05-07T19:45:22.2145758Z openblas-0.3.29 | 5.8 MB | | 0%  2025-05-07T19:45:22.2510491Z openjdk-23.0.1 | 181.3 MB | 5 | 5% 2025-05-07T19:45:22.2510974Z 2025-05-07T19:45:22.2511050Z 2025-05-07T19:45:22.2511056Z 2025-05-07T19:45:22.2774004Z cmake-4.0.2 | 19.4 MB | ###### | 61%  2025-05-07T19:45:22.2774838Z 2025-05-07T19:45:22.2774853Z 2025-05-07T19:45:22.3103489Z python-3.11.11 | 29.2 MB | #####4 | 55%  2025-05-07T19:45:22.3103782Z 2025-05-07T19:45:22.3142888Z bazel-7.5.0 | 47.4 MB | ####8 | 49%  2025-05-07T19:45:22.3168662Z openjdk-23.0.1 | 181.3 MB | 7 | 8% 2025-05-07T19:45:22.3169458Z 2025-05-07T19:45:22.3169495Z 2025-05-07T19:45:22.3169507Z 2025-05-07T19:45:22.3169518Z 2025-05-07T19:45:22.3169529Z 2025-05-07T19:45:22.3515144Z openblas-0.3.29 | 5.8 MB | ###### | 61%  2025-05-07T19:45:22.3516027Z 2025-05-07T19:45:22.3516040Z 2025-05-07T19:45:22.3516051Z 2025-05-07T19:45:22.3775250Z cmake-4.0.2 | 19.4 MB | ########3 | 84%  
2025-05-07T19:45:22.3776022Z 2025-05-07T19:45:22.3776026Z 2025-05-07T19:45:22.4143347Z python-3.11.11 | 29.2 MB | #######1 | 71%  2025-05-07T19:45:22.4288203Z openjdk-23.0.1 | 181.3 MB | # | 10% 2025-05-07T19:45:22.4288478Z 2025-05-07T19:45:22.4684953Z bazel-7.5.0 | 47.4 MB | ###### | 61%  2025-05-07T19:45:22.4685253Z 2025-05-07T19:45:22.4685258Z 2025-05-07T19:45:22.4685262Z 2025-05-07T19:45:22.4685504Z 2025-05-07T19:45:22.4685509Z 2025-05-07T19:45:22.4685783Z openblas-0.3.29 | 5.8 MB | ########## | 100%  2025-05-07T19:45:22.4686062Z 2025-05-07T19:45:22.4686066Z 2025-05-07T19:45:22.4686070Z 2025-05-07T19:45:22.4686073Z 2025-05-07T19:45:22.4686077Z 2025-05-07T19:45:22.4775296Z openblas-0.3.29 | 5.8 MB | ########## | 100%  2025-05-07T19:45:22.4775624Z 2025-05-07T19:45:22.4775629Z 2025-05-07T19:45:22.5144715Z python-3.11.11 | 29.2 MB | #########1 | 92%  2025-05-07T19:45:22.5190871Z openjdk-23.0.1 | 181.3 MB | #3 | 13% 2025-05-07T19:45:22.5191257Z 2025-05-07T19:45:22.5191437Z 2025-05-07T19:45:22.5191445Z 2025-05-07T19:45:22.5191481Z 2025-05-07T19:45:22.5191486Z 2025-05-07T19:45:22.5191501Z 2025-05-07T19:45:22.5288803Z libopenblas-0.3.29 | 5.6 MB | | 0%  2025-05-07T19:45:22.5289466Z 2025-05-07T19:45:22.6145911Z bazel-7.5.0 | 47.4 MB | #######4 | 74%  2025-05-07T19:45:22.6290812Z openjdk-23.0.1 | 181.3 MB | #6 | 16% 2025-05-07T19:45:22.6291084Z 2025-05-07T19:45:22.6688562Z bazel-7.5.0 | 47.4 MB | ########7 | 87%  2025-05-07T19:45:22.6688844Z 2025-05-07T19:45:22.6688849Z 2025-05-07T19:45:22.6688852Z 2025-05-07T19:45:22.6688857Z 2025-05-07T19:45:22.6688865Z 2025-05-07T19:45:22.6688873Z 2025-05-07T19:45:22.6689148Z libopenblas-0.3.29 | 5.6 MB | ########## | 100%  2025-05-07T19:45:22.6689459Z 2025-05-07T19:45:22.6689463Z 2025-05-07T19:45:22.6689469Z 2025-05-07T19:45:22.6689473Z 2025-05-07T19:45:22.6689476Z 2025-05-07T19:45:22.6689479Z 2025-05-07T19:45:22.7132254Z libopenblas-0.3.29 | 5.6 MB | ########## | 100%  2025-05-07T19:45:22.7132604Z 2025-05-07T19:45:22.7132608Z 
2025-05-07T19:45:22.7132612Z 2025-05-07T19:45:22.7132616Z 2025-05-07T19:45:22.7132636Z 2025-05-07T19:45:22.7132639Z 2025-05-07T19:45:22.7132643Z 2025-05-07T19:45:22.7144110Z libcups-2.3.3 | 4.3 MB | | 0%  2025-05-07T19:45:22.7368329Z openjdk-23.0.1 | 181.3 MB | #9 | 19% 2025-05-07T19:45:22.7369121Z 2025-05-07T19:45:22.7369135Z 2025-05-07T19:45:22.7369146Z 2025-05-07T19:45:22.8106035Z cmake-4.0.2 | 19.4 MB | ########## | 100%  2025-05-07T19:45:22.8106323Z 2025-05-07T19:45:22.8106437Z 2025-05-07T19:45:22.8106441Z 2025-05-07T19:45:22.8106445Z 2025-05-07T19:45:22.8106566Z 2025-05-07T19:45:22.8106574Z 2025-05-07T19:45:22.8106580Z 2025-05-07T19:45:22.8106585Z 2025-05-07T19:45:22.8146291Z libglib-2.84.0 | 3.8 MB | | 0%  2025-05-07T19:45:22.8305984Z openjdk-23.0.1 | 181.3 MB | ##3 | 24% 2025-05-07T19:45:22.8306506Z 2025-05-07T19:45:22.8306569Z 2025-05-07T19:45:22.8306574Z 2025-05-07T19:45:22.8306602Z 2025-05-07T19:45:22.8306607Z 2025-05-07T19:45:22.8306631Z 2025-05-07T19:45:22.8306696Z 2025-05-07T19:45:22.8307041Z libcups-2.3.3 | 4.3 MB | ########## | 100%  2025-05-07T19:45:22.8307330Z 2025-05-07T19:45:22.8307335Z 2025-05-07T19:45:22.8307338Z 2025-05-07T19:45:22.8307342Z 2025-05-07T19:45:22.8307346Z 2025-05-07T19:45:22.8307349Z 2025-05-07T19:45:22.8307353Z 2025-05-07T19:45:22.8907330Z libcups-2.3.3 | 4.3 MB | ########## | 100%  2025-05-07T19:45:22.8907639Z 2025-05-07T19:45:22.8907684Z 2025-05-07T19:45:22.8907688Z 2025-05-07T19:45:22.8907867Z 2025-05-07T19:45:22.8907876Z 2025-05-07T19:45:22.8907882Z 2025-05-07T19:45:22.8907887Z 2025-05-07T19:45:22.8907892Z 2025-05-07T19:45:22.8907896Z 2025-05-07T19:45:22.9094337Z libprotobuf-5.29.3 | 3.2 MB | | 0%  2025-05-07T19:45:22.9094685Z 2025-05-07T19:45:22.9094690Z 2025-05-07T19:45:22.9148947Z python-3.11.11 | 29.2 MB | ########## | 100%  2025-05-07T19:45:22.9325481Z openjdk-23.0.1 | 181.3 MB | ##7 | 28% 2025-05-07T19:45:22.9325769Z 2025-05-07T19:45:22.9325773Z 2025-05-07T19:45:22.9326009Z 2025-05-07T19:45:22.9326015Z 
2025-05-07T19:45:22.9326018Z 2025-05-07T19:45:22.9326022Z 2025-05-07T19:45:22.9326025Z 2025-05-07T19:45:22.9326044Z 2025-05-07T19:45:22.9326362Z libglib-2.84.0 | 3.8 MB | ########## | 100%  2025-05-07T19:45:22.9326649Z 2025-05-07T19:45:22.9326652Z 2025-05-07T19:45:22.9326656Z 2025-05-07T19:45:22.9326659Z 2025-05-07T19:45:22.9326663Z 2025-05-07T19:45:22.9326666Z 2025-05-07T19:45:22.9326670Z 2025-05-07T19:45:22.9326673Z 2025-05-07T19:45:22.9519028Z libglib-2.84.0 | 3.8 MB | ########## | 100%  2025-05-07T19:45:22.9519385Z 2025-05-07T19:45:22.9519390Z 2025-05-07T19:45:22.9519395Z 2025-05-07T19:45:22.9519398Z 2025-05-07T19:45:22.9519402Z 2025-05-07T19:45:22.9519405Z 2025-05-07T19:45:22.9519409Z 2025-05-07T19:45:22.9519413Z 2025-05-07T19:45:22.9519416Z 2025-05-07T19:45:22.9519616Z 2025-05-07T19:45:22.9871936Z tk-8.6.13 | 3.2 MB | | 0%  2025-05-07T19:45:22.9872235Z 2025-05-07T19:45:22.9872428Z 2025-05-07T19:45:22.9872438Z 2025-05-07T19:45:22.9872443Z 2025-05-07T19:45:22.9872447Z 2025-05-07T19:45:22.9872452Z 2025-05-07T19:45:22.9872456Z 2025-05-07T19:45:22.9872461Z 2025-05-07T19:45:22.9965619Z 2025-05-07T19:45:22.9966730Z libprotobuf-5.29.3 | 3.2 MB | ########## | 100%  2025-05-07T19:45:22.9968059Z 2025-05-07T19:45:22.9968073Z 2025-05-07T19:45:22.9968084Z 2025-05-07T19:45:22.9968094Z 2025-05-07T19:45:22.9968105Z 2025-05-07T19:45:22.9968115Z 2025-05-07T19:45:22.9968126Z 2025-05-07T19:45:22.9968136Z 2025-05-07T19:45:22.9968147Z 2025-05-07T19:45:22.9968157Z 2025-05-07T19:45:22.9968167Z 2025-05-07T19:45:22.9988554Z font-ttf-ubuntu-0.83 | 1.5 MB | 1 | 1%  2025-05-07T19:45:22.9988931Z 2025-05-07T19:45:22.9988986Z 2025-05-07T19:45:22.9988990Z 2025-05-07T19:45:22.9988994Z 2025-05-07T19:45:23.0295036Z libgrpc-1.71.0 | 7.6 MB | ########## | 100%  2025-05-07T19:45:23.0295395Z 2025-05-07T19:45:23.0295401Z 2025-05-07T19:45:23.0295406Z 2025-05-07T19:45:23.0295411Z 2025-05-07T19:45:23.0295415Z 2025-05-07T19:45:23.0295419Z 2025-05-07T19:45:23.0295424Z 2025-05-07T19:45:23.0295428Z 
2025-05-07T19:45:23.0295433Z 2025-05-07T19:45:23.0295437Z 2025-05-07T19:45:23.0295442Z 2025-05-07T19:45:23.0320651Z font-ttf-ubuntu-0.83 | 1.5 MB | ########## | 100%  2025-05-07T19:45:23.0321683Z 2025-05-07T19:45:23.0321696Z 2025-05-07T19:45:23.0321707Z 2025-05-07T19:45:23.0321717Z 2025-05-07T19:45:23.0321727Z 2025-05-07T19:45:23.0321737Z 2025-05-07T19:45:23.0321747Z 2025-05-07T19:45:23.0321758Z 2025-05-07T19:45:23.0321768Z 2025-05-07T19:45:23.0321778Z 2025-05-07T19:45:23.0473740Z tk-8.6.13 | 3.2 MB | ########## | 100%  2025-05-07T19:45:23.0474099Z 2025-05-07T19:45:23.0474225Z 2025-05-07T19:45:23.0474233Z 2025-05-07T19:45:23.0474238Z 2025-05-07T19:45:23.0474260Z 2025-05-07T19:45:23.0474354Z 2025-05-07T19:45:23.0474361Z 2025-05-07T19:45:23.0474366Z 2025-05-07T19:45:23.0474379Z 2025-05-07T19:45:23.0474383Z 2025-05-07T19:45:23.0474416Z 2025-05-07T19:45:23.0474561Z 2025-05-07T19:45:23.0546664Z harfbuzz-9.0.0 | 1.5 MB | 1 | 1%  2025-05-07T19:45:23.0564708Z openjdk-23.0.1 | 181.3 MB | ###1 | 31% 2025-05-07T19:45:23.0565006Z 2025-05-07T19:45:23.0565011Z 2025-05-07T19:45:23.0565014Z 2025-05-07T19:45:23.0565018Z 2025-05-07T19:45:23.0565022Z 2025-05-07T19:45:23.0565025Z 2025-05-07T19:45:23.0796719Z libopenblas-0.3.29 | 5.6 MB | ########## | 100%  2025-05-07T19:45:23.0797254Z 2025-05-07T19:45:23.0797308Z 2025-05-07T19:45:23.0797313Z 2025-05-07T19:45:23.0797386Z 2025-05-07T19:45:23.0797392Z 2025-05-07T19:45:23.0797396Z 2025-05-07T19:45:23.0797412Z 2025-05-07T19:45:23.0797416Z 2025-05-07T19:45:23.0797419Z 2025-05-07T19:45:23.0797423Z 2025-05-07T19:45:23.0797711Z 2025-05-07T19:45:23.0797716Z 2025-05-07T19:45:23.0797720Z 2025-05-07T19:45:23.0797723Z 2025-05-07T19:45:23.0799013Z krb5-1.21.3 | 1.3 MB | 1 | 1%  2025-05-07T19:45:23.0799315Z 2025-05-07T19:45:23.0799319Z 2025-05-07T19:45:23.0799335Z 2025-05-07T19:45:23.0799338Z 2025-05-07T19:45:23.0799342Z 2025-05-07T19:45:23.0799345Z 2025-05-07T19:45:23.0799349Z 2025-05-07T19:45:23.0799353Z 2025-05-07T19:45:23.0799356Z 
2025-05-07T19:45:23.0799359Z 2025-05-07T19:45:23.0799363Z 2025-05-07T19:45:23.0799366Z 2025-05-07T19:45:23.0799370Z 2025-05-07T19:45:23.1058845Z libgfortran5-15.1.0 | 1.5 MB | 1 | 1%  2025-05-07T19:45:23.1059285Z 2025-05-07T19:45:23.1059504Z 2025-05-07T19:45:23.1059564Z 2025-05-07T19:45:23.1059569Z 2025-05-07T19:45:23.1059767Z 2025-05-07T19:45:23.1060034Z 2025-05-07T19:45:23.1060042Z 2025-05-07T19:45:23.1060048Z 2025-05-07T19:45:23.1060055Z 2025-05-07T19:45:23.1060085Z 2025-05-07T19:45:23.1060088Z 2025-05-07T19:45:23.1060092Z 2025-05-07T19:45:23.1266958Z harfbuzz-9.0.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:23.1267544Z 2025-05-07T19:45:23.1267550Z 2025-05-07T19:45:23.1267553Z 2025-05-07T19:45:23.1267557Z 2025-05-07T19:45:23.1267560Z 2025-05-07T19:45:23.1267564Z 2025-05-07T19:45:23.1267568Z 2025-05-07T19:45:23.1267572Z 2025-05-07T19:45:23.1267576Z 2025-05-07T19:45:23.1267580Z 2025-05-07T19:45:23.1267584Z 2025-05-07T19:45:23.1267612Z 2025-05-07T19:45:23.1267615Z 2025-05-07T19:45:23.1267620Z 2025-05-07T19:45:23.1347769Z krb5-1.21.3 | 1.3 MB | ########## | 100%  2025-05-07T19:45:23.1348123Z 2025-05-07T19:45:23.1348130Z 2025-05-07T19:45:23.1348135Z 2025-05-07T19:45:23.1348140Z 2025-05-07T19:45:23.1348146Z 2025-05-07T19:45:23.1348187Z 2025-05-07T19:45:23.1348192Z 2025-05-07T19:45:23.1348196Z 2025-05-07T19:45:23.1348201Z 2025-05-07T19:45:23.1348229Z 2025-05-07T19:45:23.1348233Z 2025-05-07T19:45:23.1348236Z 2025-05-07T19:45:23.1348240Z 2025-05-07T19:45:23.1546808Z libgfortran5-15.1.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:23.1561428Z openjdk-23.0.1 | 181.3 MB | ###5 | 35% 2025-05-07T19:45:23.1561772Z 2025-05-07T19:45:23.1561841Z 2025-05-07T19:45:23.1561848Z 2025-05-07T19:45:23.1561851Z 2025-05-07T19:45:23.1561922Z 2025-05-07T19:45:23.1633680Z openblas-0.3.29 | 5.8 MB | ########## | 100%  2025-05-07T19:45:23.1634012Z 2025-05-07T19:45:23.1634017Z 2025-05-07T19:45:23.1634020Z 2025-05-07T19:45:23.1634024Z 2025-05-07T19:45:23.1634027Z 
2025-05-07T19:45:23.1634031Z 2025-05-07T19:45:23.1634035Z 2025-05-07T19:45:23.1634039Z 2025-05-07T19:45:23.1634042Z 2025-05-07T19:45:23.1634045Z 2025-05-07T19:45:23.1634049Z 2025-05-07T19:45:23.1634068Z 2025-05-07T19:45:23.1634072Z 2025-05-07T19:45:23.1634075Z 2025-05-07T19:45:23.1635033Z 2025-05-07T19:45:23.1637472Z libabseil-20250127.1 | 1.3 MB | 1 | 1%  2025-05-07T19:45:23.1637815Z 2025-05-07T19:45:23.1637830Z 2025-05-07T19:45:23.1637833Z 2025-05-07T19:45:23.1637837Z 2025-05-07T19:45:23.1637840Z 2025-05-07T19:45:23.1637844Z 2025-05-07T19:45:23.1637860Z 2025-05-07T19:45:23.1637864Z 2025-05-07T19:45:23.1637867Z 2025-05-07T19:45:23.1637870Z 2025-05-07T19:45:23.1637874Z 2025-05-07T19:45:23.1637877Z 2025-05-07T19:45:23.1637880Z 2025-05-07T19:45:23.1637884Z 2025-05-07T19:45:23.1637887Z 2025-05-07T19:45:23.1638191Z 2025-05-07T19:45:23.1971125Z cairo-1.18.0 | 961 KB | 1 | 2%  2025-05-07T19:45:23.1971480Z 2025-05-07T19:45:23.1971484Z 2025-05-07T19:45:23.1971488Z 2025-05-07T19:45:23.1971492Z 2025-05-07T19:45:23.1971495Z 2025-05-07T19:45:23.1971512Z 2025-05-07T19:45:23.1971516Z 2025-05-07T19:45:23.1971519Z 2025-05-07T19:45:23.1971523Z 2025-05-07T19:45:23.1971736Z 2025-05-07T19:45:23.1971740Z 2025-05-07T19:45:23.1971744Z 2025-05-07T19:45:23.1971747Z 2025-05-07T19:45:23.1971751Z 2025-05-07T19:45:23.1971754Z 2025-05-07T19:45:23.1971772Z 2025-05-07T19:45:23.1977072Z 2025-05-07T19:45:23.1984007Z pcre2-10.44 | 934 KB | 1 | 2%  2025-05-07T19:45:23.1984923Z 2025-05-07T19:45:23.1984935Z 2025-05-07T19:45:23.1984946Z 2025-05-07T19:45:23.1984956Z 2025-05-07T19:45:23.1984983Z 2025-05-07T19:45:23.1984994Z 2025-05-07T19:45:23.1985004Z 2025-05-07T19:45:23.1985014Z 2025-05-07T19:45:23.1985024Z 2025-05-07T19:45:23.1985034Z 2025-05-07T19:45:23.1985044Z 2025-05-07T19:45:23.1985054Z 2025-05-07T19:45:23.1985064Z 2025-05-07T19:45:23.1985074Z 2025-05-07T19:45:23.1985084Z 2025-05-07T19:45:23.1985095Z 2025-05-07T19:45:23.2146421Z cairo-1.18.0 | 961 KB | ########## | 100%  
2025-05-07T19:45:23.2146987Z 2025-05-07T19:45:23.2146992Z 2025-05-07T19:45:23.2147003Z 2025-05-07T19:45:23.2147007Z 2025-05-07T19:45:23.2147010Z 2025-05-07T19:45:23.2147014Z 2025-05-07T19:45:23.2147017Z 2025-05-07T19:45:23.2147021Z 2025-05-07T19:45:23.2147024Z 2025-05-07T19:45:23.2147027Z 2025-05-07T19:45:23.2147031Z 2025-05-07T19:45:23.2147034Z 2025-05-07T19:45:23.2147038Z 2025-05-07T19:45:23.2147041Z 2025-05-07T19:45:23.2147045Z 2025-05-07T19:45:23.2290373Z libabseil-20250127.1 | 1.3 MB | ########## | 100%  2025-05-07T19:45:23.2290746Z 2025-05-07T19:45:23.2290751Z 2025-05-07T19:45:23.2290755Z 2025-05-07T19:45:23.2290758Z 2025-05-07T19:45:23.2290761Z 2025-05-07T19:45:23.2290765Z 2025-05-07T19:45:23.2290768Z 2025-05-07T19:45:23.2290771Z 2025-05-07T19:45:23.2290775Z 2025-05-07T19:45:23.2290778Z 2025-05-07T19:45:23.2290782Z 2025-05-07T19:45:23.2290798Z 2025-05-07T19:45:23.2290813Z 2025-05-07T19:45:23.2290817Z 2025-05-07T19:45:23.2290820Z 2025-05-07T19:45:23.2290824Z 2025-05-07T19:45:23.2290968Z 2025-05-07T19:45:23.2350916Z pcre2-10.44 | 934 KB | ########## | 100%  2025-05-07T19:45:23.2351280Z 2025-05-07T19:45:23.2351285Z 2025-05-07T19:45:23.2351289Z 2025-05-07T19:45:23.2351292Z 2025-05-07T19:45:23.2351296Z 2025-05-07T19:45:23.2351299Z 2025-05-07T19:45:23.2351302Z 2025-05-07T19:45:23.2351306Z 2025-05-07T19:45:23.2351309Z 2025-05-07T19:45:23.2351313Z 2025-05-07T19:45:23.2351317Z 2025-05-07T19:45:23.2351320Z 2025-05-07T19:45:23.2351324Z 2025-05-07T19:45:23.2351327Z 2025-05-07T19:45:23.2351330Z 2025-05-07T19:45:23.2351334Z 2025-05-07T19:45:23.2351337Z 2025-05-07T19:45:23.2351341Z 2025-05-07T19:45:23.2612464Z libsqlite-3.49.2 | 895 KB | 1 | 2%  2025-05-07T19:45:23.2612831Z 2025-05-07T19:45:23.2612836Z 2025-05-07T19:45:23.2612853Z 2025-05-07T19:45:23.2612856Z 2025-05-07T19:45:23.2612860Z 2025-05-07T19:45:23.2612863Z 2025-05-07T19:45:23.2612867Z 2025-05-07T19:45:23.2612877Z 2025-05-07T19:45:23.2612893Z 2025-05-07T19:45:23.2612897Z 2025-05-07T19:45:23.2612900Z 
2025-05-07T19:45:23.2612904Z 2025-05-07T19:45:23.2612907Z 2025-05-07T19:45:23.2612911Z 2025-05-07T19:45:23.2612914Z 2025-05-07T19:45:23.2612918Z 2025-05-07T19:45:23.2612921Z 2025-05-07T19:45:23.2612924Z 2025-05-07T19:45:23.2612928Z 2025-05-07T19:45:23.2617284Z ... (more hidden) ... 2025-05-07T19:45:23.2617596Z 2025-05-07T19:45:23.2617600Z 2025-05-07T19:45:23.2617604Z 2025-05-07T19:45:23.2617607Z 2025-05-07T19:45:23.2617611Z 2025-05-07T19:45:23.2617623Z 2025-05-07T19:45:23.2617626Z 2025-05-07T19:45:23.2617630Z 2025-05-07T19:45:23.2617633Z 2025-05-07T19:45:23.2617637Z 2025-05-07T19:45:23.2617640Z 2025-05-07T19:45:23.2617644Z 2025-05-07T19:45:23.2617647Z 2025-05-07T19:45:23.2617651Z 2025-05-07T19:45:23.2617659Z 2025-05-07T19:45:23.2617663Z 2025-05-07T19:45:23.2617666Z 2025-05-07T19:45:23.2617670Z 2025-05-07T19:45:23.2767370Z libsqlite-3.49.2 | 895 KB | ########## | 100%  2025-05-07T19:45:23.2908304Z openjdk-23.0.1 | 181.3 MB | ###8 | 38% 2025-05-07T19:45:23.2908698Z 2025-05-07T19:45:23.2908893Z 2025-05-07T19:45:23.2908901Z 2025-05-07T19:45:23.2908906Z 2025-05-07T19:45:23.2908911Z 2025-05-07T19:45:23.2908915Z 2025-05-07T19:45:23.2908920Z 2025-05-07T19:45:23.2908925Z 2025-05-07T19:45:23.2908929Z 2025-05-07T19:45:23.2908934Z 2025-05-07T19:45:23.2908938Z 2025-05-07T19:45:23.2908944Z 2025-05-07T19:45:23.2908949Z 2025-05-07T19:45:23.2908953Z 2025-05-07T19:45:23.2908958Z 2025-05-07T19:45:23.2909004Z 2025-05-07T19:45:23.2909008Z 2025-05-07T19:45:23.2909013Z 2025-05-07T19:45:23.2909017Z 2025-05-07T19:45:23.3181912Z ... (more hidden) ... 
2025-05-07T19:45:23.3182621Z 2025-05-07T19:45:23.3182627Z 2025-05-07T19:45:23.3182632Z 2025-05-07T19:45:23.3182637Z 2025-05-07T19:45:23.3182664Z 2025-05-07T19:45:23.3182691Z 2025-05-07T19:45:23.3182695Z 2025-05-07T19:45:23.3769557Z libcups-2.3.3 | 4.3 MB | ########## | 100%  2025-05-07T19:45:23.4187835Z openjdk-23.0.1 | 181.3 MB | ####2 | 42% 2025-05-07T19:45:23.4188151Z 2025-05-07T19:45:23.4188500Z bazel-7.5.0 | 47.4 MB | ########## | 100%  2025-05-07T19:45:23.4188778Z 2025-05-07T19:45:23.4773433Z bazel-7.5.0 | 47.4 MB | ########## | 100%  2025-05-07T19:45:23.5891940Z openjdk-23.0.1 | 181.3 MB | ####6 | 47% 2025-05-07T19:45:23.6912950Z openjdk-23.0.1 | 181.3 MB | ##### | 50% 2025-05-07T19:45:23.7887299Z openjdk-23.0.1 | 181.3 MB | #####3 | 54% 2025-05-07T19:45:23.7887625Z 2025-05-07T19:45:23.7887630Z 2025-05-07T19:45:23.7887634Z 2025-05-07T19:45:23.7887637Z 2025-05-07T19:45:23.7887641Z 2025-05-07T19:45:23.7887681Z 2025-05-07T19:45:23.7887686Z 2025-05-07T19:45:23.7887690Z 2025-05-07T19:45:23.8029174Z libglib-2.84.0 | 3.8 MB | ########## | 100%  2025-05-07T19:45:23.9208583Z openjdk-23.0.1 | 181.3 MB | #####7 | 57% 2025-05-07T19:45:24.0208968Z openjdk-23.0.1 | 181.3 MB | ###### | 61% 2025-05-07T19:45:24.1209838Z openjdk-23.0.1 | 181.3 MB | ######4 | 65% 2025-05-07T19:45:24.2210220Z openjdk-23.0.1 | 181.3 MB | ######9 | 69% 2025-05-07T19:45:24.3340458Z openjdk-23.0.1 | 181.3 MB | #######3 | 74% 2025-05-07T19:45:24.3340830Z 2025-05-07T19:45:24.3340933Z 2025-05-07T19:45:24.3340942Z 2025-05-07T19:45:24.3340974Z 2025-05-07T19:45:24.3340980Z 2025-05-07T19:45:24.3340986Z 2025-05-07T19:45:24.3341022Z 2025-05-07T19:45:24.3341026Z 2025-05-07T19:45:24.3341030Z 2025-05-07T19:45:24.3343166Z libprotobuf-5.29.3 | 3.2 MB | ########## | 100%  2025-05-07T19:45:24.3343501Z 2025-05-07T19:45:24.3343532Z 2025-05-07T19:45:24.3343536Z 2025-05-07T19:45:24.3343541Z 2025-05-07T19:45:24.3343545Z 2025-05-07T19:45:24.3343569Z 2025-05-07T19:45:24.3343573Z 2025-05-07T19:45:24.3343576Z 
2025-05-07T19:45:24.3343580Z 2025-05-07T19:45:24.3361716Z libprotobuf-5.29.3 | 3.2 MB | ########## | 100%  2025-05-07T19:45:24.4029385Z openjdk-23.0.1 | 181.3 MB | #######7 | 78% 2025-05-07T19:45:24.4029674Z 2025-05-07T19:45:24.4029863Z 2025-05-07T19:45:24.4029872Z 2025-05-07T19:45:24.4029876Z 2025-05-07T19:45:24.4029881Z 2025-05-07T19:45:24.4029886Z 2025-05-07T19:45:24.4029893Z 2025-05-07T19:45:24.4029897Z 2025-05-07T19:45:24.4029932Z 2025-05-07T19:45:24.4029937Z 2025-05-07T19:45:24.4029942Z 2025-05-07T19:45:24.4031611Z font-ttf-ubuntu-0.83 | 1.5 MB | ########## | 100%  2025-05-07T19:45:24.4031951Z 2025-05-07T19:45:24.4031967Z 2025-05-07T19:45:24.4031971Z 2025-05-07T19:45:24.4031975Z 2025-05-07T19:45:24.4031993Z 2025-05-07T19:45:24.4032015Z 2025-05-07T19:45:24.4032019Z 2025-05-07T19:45:24.4032022Z 2025-05-07T19:45:24.4032026Z 2025-05-07T19:45:24.4032268Z 2025-05-07T19:45:24.4032273Z 2025-05-07T19:45:24.4419164Z font-ttf-ubuntu-0.83 | 1.5 MB | ########## | 100%  2025-05-07T19:45:24.5564931Z openjdk-23.0.1 | 181.3 MB | ########1 | 82% 2025-05-07T19:45:24.6565516Z openjdk-23.0.1 | 181.3 MB | ########5 | 86% 2025-05-07T19:45:24.7566087Z openjdk-23.0.1 | 181.3 MB | ########9 | 90% 2025-05-07T19:45:24.8209707Z openjdk-23.0.1 | 181.3 MB | #########3 | 94% 2025-05-07T19:45:24.8209992Z 2025-05-07T19:45:24.8209997Z 2025-05-07T19:45:24.8210001Z 2025-05-07T19:45:24.8210005Z 2025-05-07T19:45:24.8210022Z 2025-05-07T19:45:24.8210025Z 2025-05-07T19:45:24.8210029Z 2025-05-07T19:45:24.8210033Z 2025-05-07T19:45:24.8210036Z 2025-05-07T19:45:24.8210040Z 2025-05-07T19:45:24.8213285Z tk-8.6.13 | 3.2 MB | ########## | 100%  2025-05-07T19:45:24.8213764Z 2025-05-07T19:45:24.8213769Z 2025-05-07T19:45:24.8213787Z 2025-05-07T19:45:24.8213802Z 2025-05-07T19:45:24.8213805Z 2025-05-07T19:45:24.8213809Z 2025-05-07T19:45:24.8213813Z 2025-05-07T19:45:24.8213828Z 2025-05-07T19:45:24.8213832Z 2025-05-07T19:45:24.8213835Z 2025-05-07T19:45:24.8566300Z tk-8.6.13 | 3.2 MB | ########## | 100%  
2025-05-07T19:45:24.9331412Z openjdk-23.0.1 | 181.3 MB | #########8 | 99% 2025-05-07T19:45:24.9331712Z 2025-05-07T19:45:24.9331716Z 2025-05-07T19:45:24.9331720Z 2025-05-07T19:45:24.9331723Z 2025-05-07T19:45:24.9331727Z 2025-05-07T19:45:24.9331730Z 2025-05-07T19:45:24.9331734Z 2025-05-07T19:45:24.9331737Z 2025-05-07T19:45:24.9331755Z 2025-05-07T19:45:24.9331758Z 2025-05-07T19:45:24.9331762Z 2025-05-07T19:45:24.9331765Z 2025-05-07T19:45:24.9332530Z harfbuzz-9.0.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:24.9332830Z 2025-05-07T19:45:24.9332849Z 2025-05-07T19:45:24.9332863Z 2025-05-07T19:45:24.9332880Z 2025-05-07T19:45:24.9332883Z 2025-05-07T19:45:24.9332896Z 2025-05-07T19:45:24.9332899Z 2025-05-07T19:45:24.9332902Z 2025-05-07T19:45:24.9332906Z 2025-05-07T19:45:24.9332909Z 2025-05-07T19:45:24.9332913Z 2025-05-07T19:45:24.9332917Z 2025-05-07T19:45:25.0438701Z harfbuzz-9.0.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:25.0439050Z 2025-05-07T19:45:25.0439055Z 2025-05-07T19:45:25.0439059Z 2025-05-07T19:45:25.0439062Z 2025-05-07T19:45:25.0439066Z 2025-05-07T19:45:25.0439069Z 2025-05-07T19:45:25.0439072Z 2025-05-07T19:45:25.0439076Z 2025-05-07T19:45:25.0439079Z 2025-05-07T19:45:25.0439097Z 2025-05-07T19:45:25.0439101Z 2025-05-07T19:45:25.0439104Z 2025-05-07T19:45:25.0439107Z 2025-05-07T19:45:25.0439359Z 2025-05-07T19:45:25.0439673Z krb5-1.21.3 | 1.3 MB | ########## | 100%  2025-05-07T19:45:25.0439975Z 2025-05-07T19:45:25.0439992Z 2025-05-07T19:45:25.0439995Z 2025-05-07T19:45:25.0439999Z 2025-05-07T19:45:25.0440003Z 2025-05-07T19:45:25.0440013Z 2025-05-07T19:45:25.0440025Z 2025-05-07T19:45:25.0440028Z 2025-05-07T19:45:25.0440031Z 2025-05-07T19:45:25.0440035Z 2025-05-07T19:45:25.0440038Z 2025-05-07T19:45:25.0440041Z 2025-05-07T19:45:25.0440045Z 2025-05-07T19:45:25.0440048Z 2025-05-07T19:45:25.1370170Z krb5-1.21.3 | 1.3 MB | ########## | 100%  2025-05-07T19:45:25.1370504Z 2025-05-07T19:45:25.1370508Z 2025-05-07T19:45:25.1370512Z 2025-05-07T19:45:25.1370516Z 
2025-05-07T19:45:25.1370519Z 2025-05-07T19:45:25.1370523Z 2025-05-07T19:45:25.1370526Z 2025-05-07T19:45:25.1370530Z 2025-05-07T19:45:25.1370533Z 2025-05-07T19:45:25.1370551Z 2025-05-07T19:45:25.1370555Z 2025-05-07T19:45:25.1370558Z 2025-05-07T19:45:25.1370562Z 2025-05-07T19:45:25.1372428Z libgfortran5-15.1.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:25.1372760Z 2025-05-07T19:45:25.1372774Z 2025-05-07T19:45:25.1372778Z 2025-05-07T19:45:25.1372793Z 2025-05-07T19:45:25.1372998Z 2025-05-07T19:45:25.1373002Z 2025-05-07T19:45:25.1373006Z 2025-05-07T19:45:25.1373009Z 2025-05-07T19:45:25.1373013Z 2025-05-07T19:45:25.1373016Z 2025-05-07T19:45:25.1373019Z 2025-05-07T19:45:25.1373022Z 2025-05-07T19:45:25.1373026Z 2025-05-07T19:45:25.2144131Z libgfortran5-15.1.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:25.2144493Z 2025-05-07T19:45:25.2144511Z 2025-05-07T19:45:25.2144515Z 2025-05-07T19:45:25.2144519Z 2025-05-07T19:45:25.2144522Z 2025-05-07T19:45:25.2144526Z 2025-05-07T19:45:25.2144529Z 2025-05-07T19:45:25.2144532Z 2025-05-07T19:45:25.2144536Z 2025-05-07T19:45:25.2144539Z 2025-05-07T19:45:25.2144543Z 2025-05-07T19:45:25.2144546Z 2025-05-07T19:45:25.2144550Z 2025-05-07T19:45:25.2144553Z 2025-05-07T19:45:25.2144556Z 2025-05-07T19:45:25.2144560Z 2025-05-07T19:45:25.2144868Z cairo-1.18.0 | 961 KB | ########## | 100%  2025-05-07T19:45:25.2145384Z 2025-05-07T19:45:25.2145393Z 2025-05-07T19:45:25.2145397Z 2025-05-07T19:45:25.2145401Z 2025-05-07T19:45:25.2145404Z 2025-05-07T19:45:25.2145407Z 2025-05-07T19:45:25.2145411Z 2025-05-07T19:45:25.2145414Z 2025-05-07T19:45:25.2145418Z 2025-05-07T19:45:25.2145421Z 2025-05-07T19:45:25.2145425Z 2025-05-07T19:45:25.2145428Z 2025-05-07T19:45:25.2145432Z 2025-05-07T19:45:25.2145435Z 2025-05-07T19:45:25.2145439Z 2025-05-07T19:45:25.2145442Z 2025-05-07T19:45:25.7691522Z cairo-1.18.0 | 961 KB | ########## | 100%  2025-05-07T19:45:25.7692493Z 2025-05-07T19:45:25.7692507Z 2025-05-07T19:45:25.7692542Z 2025-05-07T19:45:25.7692553Z 
2025-05-07T19:45:25.7699929Z libabseil-20250127.1 | 1.3 MB | ########## | 100% 2025-05-07T19:45:25.7721259Z libabseil-20250127.1 | 1.3 MB | ########## | 100% 2025-05-07T19:45:25.8063018Z python-3.11.11 | 29.2 MB | ########## | 100% 2025-05-07T19:45:25.8064430Z libsqlite-3.49.2 | 895 KB | ########## | 100%
2025-05-07T19:45:25.9844812Z libsqlite-3.49.2 | 895 KB | ########## | 100% 2025-05-07T19:45:25.9846084Z pcre2-10.44 | 934 KB | ########## | 100% 2025-05-07T19:45:26.4846733Z pcre2-10.44 | 934 KB | ########## | 100% 2025-05-07T19:45:27.0587582Z cmake-4.0.2 | 19.4 MB | ########## | 100% 2025-05-07T19:45:27.2697469Z openjdk-23.0.1 | 181.3 MB | ########## | 100%
2025-05-07T19:45:27.2699395Z ... (more hidden) ... 2025-05-07T19:45:27.6484792Z ... (more hidden) ... 2025-05-07T19:45:28.7663692Z bazel-7.5.0 | 47.4 MB | ########## | 100% 2025-05-07T19:45:28.7668600Z openjdk-23.0.1 | 181.3 MB | ########## | 100%
2025-05-07T19:45:28.7711345Z done 2025-05-07T19:45:29.0872261Z Preparing transaction: | / - done 2025-05-07T19:45:32.7707543Z Verifying transaction: | / - \ | / - \ | / - \ | / - \ | / - \ | / - 
\ | / - \ | / done 2025-05-07T19:45:35.4944224Z Executing transaction: \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / done 2025-05-07T19:45:35.9074516Z [INSTALL] Adding symlink librhash.so.0, which is needed by CMake ... 2025-05-07T19:45:37.7527855Z + ln -s /github/home/miniconda/envs/build_binary/lib/librhash.so /github/home/miniconda/envs/build_binary/lib/librhash.so.0 2025-05-07T19:45:37.7528560Z 2025-05-07T19:45:37.7544700Z 2025-05-07T19:45:37.7566562Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install build 2025-05-07T19:45:40.0933335Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:45:40.0934878Z 2025-05-07T19:45:40.0934975Z Collecting build 2025-05-07T19:45:40.0935345Z Downloading build-1.2.2.post1-py3-none-any.whl.metadata (6.5 kB) 2025-05-07T19:45:40.0936448Z Requirement already satisfied: packaging>=19.1 in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from build) (25.0) 2025-05-07T19:45:40.0937194Z Collecting pyproject_hooks (from build) 2025-05-07T19:45:40.0937644Z Downloading pyproject_hooks-1.2.0-py3-none-any.whl.metadata (1.3 kB) 2025-05-07T19:45:40.0938139Z Downloading build-1.2.2.post1-py3-none-any.whl (22 kB) 2025-05-07T19:45:40.0938594Z Downloading pyproject_hooks-1.2.0-py3-none-any.whl (10 kB) 2025-05-07T19:45:40.0939032Z Installing collected packages: pyproject_hooks, build 2025-05-07T19:45:40.0939312Z 2025-05-07T19:45:40.0939509Z Successfully installed build-1.2.2.post1 pyproject_hooks-1.2.0 2025-05-07T19:45:40.0939816Z 2025-05-07T19:45:41.9824315Z /github/home/miniconda/envs/build_binary/bin/make 2025-05-07T19:45:41.9824670Z 2025-05-07T19:45:42.0409252Z [CHECK] Binary make 
found in PATH 2025-05-07T19:45:43.8454011Z /github/home/miniconda/envs/build_binary/bin/cmake 2025-05-07T19:45:43.8454342Z 2025-05-07T19:45:43.9173093Z [CHECK] Binary cmake found in PATH 2025-05-07T19:45:45.7422688Z /github/home/miniconda/envs/build_binary/bin/ninja 2025-05-07T19:45:45.7423557Z 2025-05-07T19:45:45.8007099Z [CHECK] Binary ninja found in PATH 2025-05-07T19:45:47.7086461Z [CHECK] Python (sub-)package 'click' found ... 2025-05-07T19:45:49.7517387Z [CHECK] Python (sub-)package 'hypothesis' found ... 2025-05-07T19:45:51.6813330Z [CHECK] Python (sub-)package 'jinja2' found ... 2025-05-07T19:45:53.6694823Z [CHECK] Python (sub-)package 'skbuild' found ... 2025-05-07T19:45:55.5276806Z [CHECK] Python (sub-)package 'wheel' found ... 2025-05-07T19:45:55.5277236Z [INSTALL] Successfully installed all the build tools 2025-05-07T19:45:55.5357852Z ##[group]Run . $PRELUDE; install_cuda $BUILD_ENV 12.8.0 2025-05-07T19:45:55.5358340Z . $PRELUDE; install_cuda $BUILD_ENV 12.8.0 2025-05-07T19:45:55.5359039Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:45:55.5359401Z env: 2025-05-07T19:45:55.5359683Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:45:55.5360033Z BUILD_ENV: build_binary 2025-05-07T19:45:55.5360328Z BUILD_TARGET: genai 2025-05-07T19:45:55.5360624Z BUILD_VARIANT: cuda 2025-05-07T19:45:55.5360911Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:45:55.5361193Z ##[endgroup] 2025-05-07T19:45:55.9875002Z ################################################################################ 2025-05-07T19:45:55.9876004Z # Install CUDA 2025-05-07T19:45:55.9876591Z # 2025-05-07T19:45:55.9887959Z # [2025-05-07T19:45:55.988Z] + install_cuda build_binary 12.8.0 2025-05-07T19:45:55.9888394Z ################################################################################ 2025-05-07T19:45:55.9888716Z 2025-05-07T19:45:55.9906584Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:45:56.0764237Z [CHECK] Network does not appear to be 
blocked. 2025-05-07T19:45:56.0765286Z [SETUP] Cleaning up Conda packages ... 2025-05-07T19:45:56.0767790Z + conda clean --packages --tarball -y 2025-05-07T19:45:56.0768042Z 2025-05-07T19:45:56.6669653Z Will remove 147 (628.1 MB) tarball(s). 2025-05-07T19:45:56.6670601Z Will remove 21 (102.9 MB) package(s). 2025-05-07T19:45:56.7252973Z 2025-05-07T19:45:56.7258329Z + conda clean --all -y 2025-05-07T19:45:56.7258841Z 2025-05-07T19:45:57.3525647Z There are no unused tarball(s) to remove. 2025-05-07T19:45:57.3526012Z Will remove 1 index cache(s). 2025-05-07T19:45:57.3526325Z There are no unused package(s) to remove. 2025-05-07T19:45:57.3526648Z There are no tempfile(s) to remove. 2025-05-07T19:45:57.3527084Z There are no logfile(s) to remove. 2025-05-07T19:45:57.4093741Z 2025-05-07T19:45:57.4103264Z [INSTALL] Installing CUDA 12.8.0 ... 2025-05-07T19:45:57.4128256Z [EXEC] [ATTEMPT 0/3] + conda install --force-reinstall -n build_binary -c conda-forge --override-channels -y cuda=12.8.0 2025-05-07T19:45:58.2605919Z Channels: 2025-05-07T19:45:58.2606573Z - conda-forge 2025-05-07T19:45:58.2607188Z Platform: linux-64 2025-05-07T19:46:08.0925928Z Collecting package metadata (repodata.json): - \ | / - \ | / - \ | / - \ | / - \ done 2025-05-07T19:46:09.6486863Z Solving environment: / - \ | done 2025-05-07T19:46:09.7846238Z 2025-05-07T19:46:09.7846556Z ## Package Plan ## 2025-05-07T19:46:09.7846741Z 2025-05-07T19:46:09.7847097Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:46:09.7847607Z 2025-05-07T19:46:09.7847795Z added / updated specs: 2025-05-07T19:46:09.7848768Z - cuda=12.8.0 2025-05-07T19:46:09.7848904Z 2025-05-07T19:46:09.7848927Z 2025-05-07T19:46:09.7849065Z The following packages will be downloaded: 2025-05-07T19:46:09.7849307Z 2025-05-07T19:46:09.7849446Z package | build 2025-05-07T19:46:09.7849779Z ---------------------------|----------------- 2025-05-07T19:46:09.7850171Z attr-2.5.1 | h166bdaf_1 69 KB conda-forge 
2025-05-07T19:46:09.7850621Z binutils-2.40 | h4852527_7 31 KB conda-forge 2025-05-07T19:46:09.7851080Z c-compiler-1.5.2 | h0b41bf4_0 6 KB conda-forge 2025-05-07T19:46:09.7851698Z cuda-12.8.0 | ha804496_0 26 KB conda-forge 2025-05-07T19:46:09.7852150Z cuda-cccl_linux-64-12.8.55 | ha770c72_1 1.0 MB conda-forge 2025-05-07T19:46:09.7852708Z cuda-command-line-tools-12.8.0| ha770c72_0 20 KB conda-forge 2025-05-07T19:46:09.7853232Z cuda-compiler-12.8.0 | hbad6d8a_0 20 KB conda-forge 2025-05-07T19:46:09.7853751Z cuda-crt-dev_linux-64-12.8.61| ha770c72_1 90 KB conda-forge 2025-05-07T19:46:09.7854583Z cuda-crt-tools-12.8.61 | ha770c72_1 27 KB conda-forge 2025-05-07T19:46:09.7855191Z cuda-cudart-12.8.57 | h5888daf_1 22 KB conda-forge 2025-05-07T19:46:09.7855684Z cuda-cudart-dev-12.8.57 | h5888daf_1 23 KB conda-forge 2025-05-07T19:46:09.7856236Z cuda-cudart-dev_linux-64-12.8.57| h3f2d84a_1 377 KB conda-forge 2025-05-07T19:46:09.7856783Z cuda-cudart-static-12.8.57 | h5888daf_1 22 KB conda-forge 2025-05-07T19:46:09.7857355Z cuda-cudart-static_linux-64-12.8.57| h3f2d84a_1 950 KB conda-forge 2025-05-07T19:46:09.7857924Z cuda-cudart_linux-64-12.8.57| h3f2d84a_1 188 KB conda-forge 2025-05-07T19:46:09.7858434Z cuda-cuobjdump-12.8.55 | hbd13f7d_0 227 KB conda-forge 2025-05-07T19:46:09.7859050Z cuda-cupti-12.8.57 | hbd13f7d_0 1.8 MB conda-forge 2025-05-07T19:46:09.7859629Z cuda-cupti-dev-12.8.57 | h5888daf_0 4.0 MB conda-forge 2025-05-07T19:46:09.7860129Z cuda-cuxxfilt-12.8.55 | hbd13f7d_0 211 KB conda-forge 2025-05-07T19:46:09.7860624Z cuda-driver-dev-12.8.57 | h5888daf_1 22 KB conda-forge 2025-05-07T19:46:09.7861131Z cuda-driver-dev_linux-64-12.8.90| h3f2d84a_1 36 KB conda-forge 2025-05-07T19:46:09.7861636Z cuda-gdb-12.8.55 | h50b4baa_0 353 KB conda-forge 2025-05-07T19:46:09.7862086Z cuda-libraries-12.8.0 | ha770c72_0 20 KB conda-forge 2025-05-07T19:46:09.7862605Z cuda-libraries-dev-12.8.0 | ha770c72_0 20 KB conda-forge 2025-05-07T19:46:09.7863076Z cuda-nsight-12.8.55 | h7938cbb_0 
113.2 MB conda-forge 2025-05-07T19:46:09.7863538Z cuda-nvcc-12.8.61 | hcdd1206_0 23 KB conda-forge 2025-05-07T19:46:09.7864025Z cuda-nvcc-dev_linux-64-12.8.61| he91c749_1 12.7 MB conda-forge 2025-05-07T19:46:09.7864508Z cuda-nvcc-impl-12.8.61 | h85509e4_1 25 KB conda-forge 2025-05-07T19:46:09.7865003Z cuda-nvcc-tools-12.8.61 | he02047a_1 24.5 MB conda-forge 2025-05-07T19:46:09.7865474Z cuda-nvcc_linux-64-12.8.61 | h04802cd_0 25 KB conda-forge 2025-05-07T19:46:09.7866085Z cuda-nvdisasm-12.8.55 | hbd13f7d_0 4.9 MB conda-forge 2025-05-07T19:46:09.7866539Z cuda-nvml-dev-12.8.55 | hbd13f7d_0 134 KB conda-forge 2025-05-07T19:46:09.7867024Z cuda-nvprof-12.8.57 | hbd13f7d_0 2.5 MB conda-forge 2025-05-07T19:46:09.7868109Z cuda-nvprune-12.8.55 | hbd13f7d_0 68 KB conda-forge 2025-05-07T19:46:09.7868590Z cuda-nvrtc-12.8.61 | hbd13f7d_0 63.1 MB conda-forge 2025-05-07T19:46:09.7869107Z cuda-nvrtc-dev-12.8.61 | h5888daf_0 34 KB conda-forge 2025-05-07T19:46:09.7869596Z cuda-nvtx-12.8.55 | hbd13f7d_0 31 KB conda-forge 2025-05-07T19:46:09.7870123Z cuda-nvvm-dev_linux-64-12.8.61| ha770c72_1 25 KB conda-forge 2025-05-07T19:46:09.7870676Z cuda-nvvm-impl-12.8.61 | he02047a_1 20.8 MB conda-forge 2025-05-07T19:46:09.7871177Z cuda-nvvm-tools-12.8.61 | he02047a_1 23.5 MB conda-forge 2025-05-07T19:46:09.7892002Z cuda-nvvp-12.8.57 | hbd13f7d_0 112.4 MB conda-forge 2025-05-07T19:46:09.7892518Z cuda-opencl-12.8.55 | hbd13f7d_0 29 KB conda-forge 2025-05-07T19:46:09.7893012Z cuda-opencl-dev-12.8.55 | h5888daf_0 95 KB conda-forge 2025-05-07T19:46:09.7893548Z cuda-profiler-api-12.8.55 | h7938cbb_0 22 KB conda-forge 2025-05-07T19:46:09.7894046Z cuda-runtime-12.8.0 | ha804496_0 20 KB conda-forge 2025-05-07T19:46:09.7894558Z cuda-sanitizer-api-12.8.55 | hbd13f7d_0 8.8 MB conda-forge 2025-05-07T19:46:09.7895447Z cuda-toolkit-12.8.0 | ha804496_0 20 KB conda-forge 2025-05-07T19:46:09.7895894Z cuda-tools-12.8.0 | ha770c72_0 19 KB conda-forge 2025-05-07T19:46:09.7896354Z cuda-version-12.8 | h5d125a7_3 21 
KB conda-forge 2025-05-07T19:46:09.7896832Z cuda-visual-tools-12.8.0 | ha770c72_0 20 KB conda-forge 2025-05-07T19:46:09.7897330Z cxx-compiler-1.5.2 | hf52228f_0 6 KB conda-forge 2025-05-07T19:46:09.7897758Z dbus-1.13.6 | h5008d03_3 604 KB conda-forge 2025-05-07T19:46:09.7898172Z expat-2.7.0 | h5888daf_0 137 KB conda-forge 2025-05-07T19:46:09.7898585Z gcc-11.4.0 | h602e360_13 49 KB conda-forge 2025-05-07T19:46:09.7898999Z gds-tools-1.13.0.11 | h5888daf_0 37.9 MB conda-forge 2025-05-07T19:46:09.7899437Z gmp-6.3.0 | hac33072_2 449 KB conda-forge 2025-05-07T19:46:09.7899821Z gxx-11.4.0 | h602e360_13 49 KB conda-forge 2025-05-07T19:46:09.7900238Z libcap-2.75 | h39aace5_0 118 KB conda-forge 2025-05-07T19:46:09.7900672Z libcublas-12.8.3.14 | h9ab20c4_0 460.2 MB conda-forge 2025-05-07T19:46:09.7901150Z libcublas-dev-12.8.3.14 | h9ab20c4_0 89 KB conda-forge 2025-05-07T19:46:09.7901616Z libcufft-11.3.3.41 | hbd13f7d_0 147.4 MB conda-forge 2025-05-07T19:46:09.7902072Z libcufft-dev-11.3.3.41 | h5888daf_0 33 KB conda-forge 2025-05-07T19:46:09.7902547Z libcufile-1.13.0.11 | h12f29b5_0 939 KB conda-forge 2025-05-07T19:46:09.7903021Z libcufile-dev-1.13.0.11 | h5888daf_0 35 KB conda-forge 2025-05-07T19:46:09.7903485Z libcurand-10.3.9.55 | hbd13f7d_0 43.6 MB conda-forge 2025-05-07T19:46:09.7903962Z libcurand-dev-10.3.9.55 | h5888daf_0 265 KB conda-forge 2025-05-07T19:46:09.7904433Z libcusolver-11.7.2.55 | h9ab20c4_0 156.9 MB conda-forge 2025-05-07T19:46:09.7904927Z libcusolver-dev-11.7.2.55 | h9ab20c4_0 59 KB conda-forge 2025-05-07T19:46:09.7905423Z libcusparse-12.5.7.53 | hbd13f7d_0 164.9 MB conda-forge 2025-05-07T19:46:09.7906099Z libcusparse-dev-12.5.7.53 | h5888daf_0 51 KB conda-forge 2025-05-07T19:46:09.7906577Z libgcrypt-lib-1.11.0 | hb9d3cd8_2 572 KB conda-forge 2025-05-07T19:46:09.7907004Z libglvnd-1.7.0 | ha4b6fd6_2 129 KB conda-forge 2025-05-07T19:46:09.7907447Z libgpg-error-1.55 | h3f2d84a_0 305 KB conda-forge 2025-05-07T19:46:09.7907858Z libnl-3.11.0 | hb9d3cd8_0 724 KB 
conda-forge 2025-05-07T19:46:09.7908278Z libnpp-12.3.3.65 | hbd13f7d_0 130.6 MB conda-forge 2025-05-07T19:46:09.7908719Z libnpp-dev-12.3.3.65 | h5888daf_0 443 KB conda-forge 2025-05-07T19:46:09.7909136Z libnuma-2.0.18 | h4ab18f5_2 42 KB conda-forge 2025-05-07T19:46:09.7909572Z libnvfatbin-12.8.55 | hbd13f7d_0 793 KB conda-forge 2025-05-07T19:46:09.7910030Z libnvfatbin-dev-12.8.55 | h5888daf_0 26 KB conda-forge 2025-05-07T19:46:09.7910503Z libnvjitlink-12.8.61 | hbd13f7d_0 28.7 MB conda-forge 2025-05-07T19:46:09.7910958Z libnvjitlink-dev-12.8.61 | h5888daf_0 25 KB conda-forge 2025-05-07T19:46:09.7911423Z libnvjpeg-12.3.5.57 | h97fd463_0 3.0 MB conda-forge 2025-05-07T19:46:09.7911876Z libnvjpeg-dev-12.3.5.57 | ha770c72_0 31 KB conda-forge 2025-05-07T19:46:09.7912308Z libopengl-1.7.0 | ha4b6fd6_2 50 KB conda-forge 2025-05-07T19:46:09.7912818Z libsystemd0-257.4 | h4e0b6ca_1 477 KB conda-forge 2025-05-07T19:46:09.7913241Z libudev1-257.4 | hbe16f8c_1 141 KB conda-forge 2025-05-07T19:46:09.7913679Z libxkbcommon-1.7.0 | h2c5496b_1 579 KB conda-forge 2025-05-07T19:46:09.7914119Z libxkbfile-1.1.0 | h166bdaf_1 111 KB conda-forge 2025-05-07T19:46:09.7914518Z lz4-c-1.10.0 | h5888daf_1 163 KB conda-forge 2025-05-07T19:46:09.7914965Z nsight-compute-2025.1.0.14 | hb5ebaad_0 320.6 MB conda-forge 2025-05-07T19:46:09.7915394Z nspr-4.36 | h5888daf_0 225 KB conda-forge 2025-05-07T19:46:09.7915784Z nss-3.111 | h159eef7_0 1.9 MB conda-forge 2025-05-07T19:46:09.7916163Z ocl-icd-2.3.3 | hb9d3cd8_0 104 KB conda-forge 2025-05-07T19:46:09.7916614Z opencl-headers-2024.10.24 | h5888daf_0 53 KB conda-forge 2025-05-07T19:46:09.7917079Z rdma-core-57.0 | h5888daf_0 1.2 MB conda-forge 2025-05-07T19:46:09.7917486Z wayland-1.23.1 | h3e06ad9_0 314 KB conda-forge 2025-05-07T19:46:09.7917913Z xcb-util-0.4.1 | hb711507_2 19 KB conda-forge 2025-05-07T19:46:09.7918345Z xcb-util-cursor-0.1.5 | hb9d3cd8_0 20 KB conda-forge 2025-05-07T19:46:09.7918808Z xcb-util-image-0.4.0 | hb711507_2 24 KB conda-forge 
2025-05-07T19:46:09.7919260Z xcb-util-keysyms-0.4.1 | hb711507_0 14 KB conda-forge 2025-05-07T19:46:09.7919743Z xcb-util-renderutil-0.3.10 | hb711507_0 17 KB conda-forge 2025-05-07T19:46:09.7920206Z xcb-util-wm-0.4.2 | hb711507_0 50 KB conda-forge 2025-05-07T19:46:09.7920645Z xkeyboard-config-2.44 | hb9d3cd8_0 384 KB conda-forge 2025-05-07T19:46:09.7921138Z xorg-libxcomposite-0.4.6 | hb9d3cd8_2 13 KB conda-forge 2025-05-07T19:46:09.7921609Z xorg-libxdamage-1.1.6 | hb9d3cd8_0 13 KB conda-forge 2025-05-07T19:46:09.7922034Z ------------------------------------------------------------ 2025-05-07T19:46:09.7922507Z Total: 1.86 GB 2025-05-07T19:46:09.7922984Z 2025-05-07T19:46:09.7923123Z The following NEW packages will be INSTALLED: 2025-05-07T19:46:09.7923362Z 2025-05-07T19:46:09.7923580Z attr conda-forge/linux-64::attr-2.5.1-h166bdaf_1 2025-05-07T19:46:09.7924029Z binutils conda-forge/linux-64::binutils-2.40-h4852527_7 2025-05-07T19:46:09.7924534Z c-compiler conda-forge/linux-64::c-compiler-1.5.2-h0b41bf4_0 2025-05-07T19:46:09.7925011Z cuda conda-forge/noarch::cuda-12.8.0-ha804496_0 2025-05-07T19:46:09.7925508Z cuda-cccl_linux-64 conda-forge/noarch::cuda-cccl_linux-64-12.8.55-ha770c72_1 2025-05-07T19:46:09.7926168Z cuda-command-line~ conda-forge/linux-64::cuda-command-line-tools-12.8.0-ha770c72_0 2025-05-07T19:46:09.7926785Z cuda-compiler conda-forge/noarch::cuda-compiler-12.8.0-hbad6d8a_0 2025-05-07T19:46:09.7927383Z cuda-crt-dev_linu~ conda-forge/noarch::cuda-crt-dev_linux-64-12.8.61-ha770c72_1 2025-05-07T19:46:09.7927989Z cuda-crt-tools conda-forge/linux-64::cuda-crt-tools-12.8.61-ha770c72_1 2025-05-07T19:46:09.7928535Z cuda-cudart conda-forge/linux-64::cuda-cudart-12.8.57-h5888daf_1 2025-05-07T19:46:09.7929103Z cuda-cudart-dev conda-forge/linux-64::cuda-cudart-dev-12.8.57-h5888daf_1 2025-05-07T19:46:09.7929725Z cuda-cudart-dev_l~ conda-forge/noarch::cuda-cudart-dev_linux-64-12.8.57-h3f2d84a_1 2025-05-07T19:46:09.7930387Z cuda-cudart-static 
conda-forge/linux-64::cuda-cudart-static-12.8.57-h5888daf_1 2025-05-07T19:46:09.7931066Z cuda-cudart-stati~ conda-forge/noarch::cuda-cudart-static_linux-64-12.8.57-h3f2d84a_1 2025-05-07T19:46:09.7931805Z cuda-cudart_linux~ conda-forge/noarch::cuda-cudart_linux-64-12.8.57-h3f2d84a_1 2025-05-07T19:46:09.7932423Z cuda-cuobjdump conda-forge/linux-64::cuda-cuobjdump-12.8.55-hbd13f7d_0 2025-05-07T19:46:09.7932974Z cuda-cupti conda-forge/linux-64::cuda-cupti-12.8.57-hbd13f7d_0 2025-05-07T19:46:09.7933519Z cuda-cupti-dev conda-forge/linux-64::cuda-cupti-dev-12.8.57-h5888daf_0 2025-05-07T19:46:09.7934093Z cuda-cuxxfilt conda-forge/linux-64::cuda-cuxxfilt-12.8.55-hbd13f7d_0 2025-05-07T19:46:09.7934666Z cuda-driver-dev conda-forge/linux-64::cuda-driver-dev-12.8.57-h5888daf_1 2025-05-07T19:46:09.7935405Z cuda-driver-dev_l~ conda-forge/noarch::cuda-driver-dev_linux-64-12.8.90-h3f2d84a_1 2025-05-07T19:46:09.7935928Z cuda-gdb conda-forge/linux-64::cuda-gdb-12.8.55-h50b4baa_0 2025-05-07T19:46:09.7936425Z cuda-libraries conda-forge/linux-64::cuda-libraries-12.8.0-ha770c72_0 2025-05-07T19:46:09.7936991Z cuda-libraries-dev conda-forge/linux-64::cuda-libraries-dev-12.8.0-ha770c72_0 2025-05-07T19:46:09.7937529Z cuda-nsight conda-forge/linux-64::cuda-nsight-12.8.55-h7938cbb_0 2025-05-07T19:46:09.7938026Z cuda-nvcc conda-forge/linux-64::cuda-nvcc-12.8.61-hcdd1206_0 2025-05-07T19:46:09.7938538Z cuda-nvcc-dev_lin~ conda-forge/noarch::cuda-nvcc-dev_linux-64-12.8.61-he91c749_1 2025-05-07T19:46:09.7939105Z cuda-nvcc-impl conda-forge/linux-64::cuda-nvcc-impl-12.8.61-h85509e4_1 2025-05-07T19:46:09.7939651Z cuda-nvcc-tools conda-forge/linux-64::cuda-nvcc-tools-12.8.61-he02047a_1 2025-05-07T19:46:09.7940190Z cuda-nvcc_linux-64 conda-forge/linux-64::cuda-nvcc_linux-64-12.8.61-h04802cd_0 2025-05-07T19:46:09.7940720Z cuda-nvdisasm conda-forge/linux-64::cuda-nvdisasm-12.8.55-hbd13f7d_0 2025-05-07T19:46:09.7941306Z cuda-nvml-dev conda-forge/linux-64::cuda-nvml-dev-12.8.55-hbd13f7d_0 
2025-05-07T19:46:09.7942080Z cuda-nvprof conda-forge/linux-64::cuda-nvprof-12.8.57-hbd13f7d_0 2025-05-07T19:46:09.7942710Z cuda-nvprune conda-forge/linux-64::cuda-nvprune-12.8.55-hbd13f7d_0 2025-05-07T19:46:09.7943226Z cuda-nvrtc conda-forge/linux-64::cuda-nvrtc-12.8.61-hbd13f7d_0 2025-05-07T19:46:09.7943769Z cuda-nvrtc-dev conda-forge/linux-64::cuda-nvrtc-dev-12.8.61-h5888daf_0 2025-05-07T19:46:09.7944285Z cuda-nvtx conda-forge/linux-64::cuda-nvtx-12.8.55-hbd13f7d_0 2025-05-07T19:46:09.7944921Z cuda-nvvm-dev_lin~ conda-forge/noarch::cuda-nvvm-dev_linux-64-12.8.61-ha770c72_1 2025-05-07T19:46:09.7945521Z cuda-nvvm-impl conda-forge/linux-64::cuda-nvvm-impl-12.8.61-he02047a_1 2025-05-07T19:46:09.7946078Z cuda-nvvm-tools conda-forge/linux-64::cuda-nvvm-tools-12.8.61-he02047a_1 2025-05-07T19:46:09.7946617Z cuda-nvvp conda-forge/linux-64::cuda-nvvp-12.8.57-hbd13f7d_0 2025-05-07T19:46:09.7947108Z cuda-opencl conda-forge/linux-64::cuda-opencl-12.8.55-hbd13f7d_0 2025-05-07T19:46:09.7947663Z cuda-opencl-dev conda-forge/linux-64::cuda-opencl-dev-12.8.55-h5888daf_0 2025-05-07T19:46:09.7948275Z cuda-profiler-api conda-forge/linux-64::cuda-profiler-api-12.8.55-h7938cbb_0 2025-05-07T19:46:09.7948832Z cuda-runtime conda-forge/noarch::cuda-runtime-12.8.0-ha804496_0 2025-05-07T19:46:09.7949411Z cuda-sanitizer-api conda-forge/linux-64::cuda-sanitizer-api-12.8.55-hbd13f7d_0 2025-05-07T19:46:09.7949979Z cuda-toolkit conda-forge/noarch::cuda-toolkit-12.8.0-ha804496_0 2025-05-07T19:46:09.7950489Z cuda-tools conda-forge/linux-64::cuda-tools-12.8.0-ha770c72_0 2025-05-07T19:46:09.7950995Z cuda-version conda-forge/noarch::cuda-version-12.8-h5d125a7_3 2025-05-07T19:46:09.7951542Z cuda-visual-tools conda-forge/linux-64::cuda-visual-tools-12.8.0-ha770c72_0 2025-05-07T19:46:09.7952118Z cxx-compiler conda-forge/linux-64::cxx-compiler-1.5.2-hf52228f_0 2025-05-07T19:46:09.7952578Z dbus conda-forge/linux-64::dbus-1.13.6-h5008d03_3 2025-05-07T19:46:09.7953186Z expat 
conda-forge/linux-64::expat-2.7.0-h5888daf_0 2025-05-07T19:46:09.7953679Z gcc conda-forge/linux-64::gcc-11.4.0-h602e360_13 2025-05-07T19:46:09.7954127Z gds-tools conda-forge/linux-64::gds-tools-1.13.0.11-h5888daf_0 2025-05-07T19:46:09.7954590Z gmp conda-forge/linux-64::gmp-6.3.0-hac33072_2 2025-05-07T19:46:09.7954985Z gxx conda-forge/linux-64::gxx-11.4.0-h602e360_13 2025-05-07T19:46:09.7955418Z libcap conda-forge/linux-64::libcap-2.75-h39aace5_0 2025-05-07T19:46:09.7955896Z libcublas conda-forge/linux-64::libcublas-12.8.3.14-h9ab20c4_0 2025-05-07T19:46:09.7956456Z libcublas-dev conda-forge/linux-64::libcublas-dev-12.8.3.14-h9ab20c4_0 2025-05-07T19:46:09.7957007Z libcufft conda-forge/linux-64::libcufft-11.3.3.41-hbd13f7d_0 2025-05-07T19:46:09.7957528Z libcufft-dev conda-forge/linux-64::libcufft-dev-11.3.3.41-h5888daf_0 2025-05-07T19:46:09.7958074Z libcufile conda-forge/linux-64::libcufile-1.13.0.11-h12f29b5_0 2025-05-07T19:46:09.7958611Z libcufile-dev conda-forge/linux-64::libcufile-dev-1.13.0.11-h5888daf_0 2025-05-07T19:46:09.7959164Z libcurand conda-forge/linux-64::libcurand-10.3.9.55-hbd13f7d_0 2025-05-07T19:46:09.7959710Z libcurand-dev conda-forge/linux-64::libcurand-dev-10.3.9.55-h5888daf_0 2025-05-07T19:46:09.7960264Z libcusolver conda-forge/linux-64::libcusolver-11.7.2.55-h9ab20c4_0 2025-05-07T19:46:09.7960864Z libcusolver-dev conda-forge/linux-64::libcusolver-dev-11.7.2.55-h9ab20c4_0 2025-05-07T19:46:09.7961437Z libcusparse conda-forge/linux-64::libcusparse-12.5.7.53-hbd13f7d_0 2025-05-07T19:46:09.7962029Z libcusparse-dev conda-forge/linux-64::libcusparse-dev-12.5.7.53-h5888daf_0 2025-05-07T19:46:09.7962724Z libgcrypt-lib conda-forge/linux-64::libgcrypt-lib-1.11.0-hb9d3cd8_2 2025-05-07T19:46:09.7963238Z libglvnd conda-forge/linux-64::libglvnd-1.7.0-ha4b6fd6_2 2025-05-07T19:46:09.7963751Z libgpg-error conda-forge/linux-64::libgpg-error-1.55-h3f2d84a_0 2025-05-07T19:46:09.7964232Z libnl conda-forge/linux-64::libnl-3.11.0-hb9d3cd8_0 2025-05-07T19:46:09.7964694Z 
libnpp conda-forge/linux-64::libnpp-12.3.3.65-hbd13f7d_0 2025-05-07T19:46:09.7965203Z libnpp-dev conda-forge/linux-64::libnpp-dev-12.3.3.65-h5888daf_0 2025-05-07T19:46:09.7965691Z libnuma conda-forge/linux-64::libnuma-2.0.18-h4ab18f5_2 2025-05-07T19:46:09.7966270Z libnvfatbin conda-forge/linux-64::libnvfatbin-12.8.55-hbd13f7d_0 2025-05-07T19:46:09.7966827Z libnvfatbin-dev conda-forge/linux-64::libnvfatbin-dev-12.8.55-h5888daf_0 2025-05-07T19:46:09.7967606Z libnvjitlink conda-forge/linux-64::libnvjitlink-12.8.61-hbd13f7d_0 2025-05-07T19:46:09.7968203Z libnvjitlink-dev conda-forge/linux-64::libnvjitlink-dev-12.8.61-h5888daf_0 2025-05-07T19:46:09.7968761Z libnvjpeg conda-forge/linux-64::libnvjpeg-12.3.5.57-h97fd463_0 2025-05-07T19:46:09.7969325Z libnvjpeg-dev conda-forge/linux-64::libnvjpeg-dev-12.3.5.57-ha770c72_0 2025-05-07T19:46:09.7969932Z libopengl conda-forge/linux-64::libopengl-1.7.0-ha4b6fd6_2 2025-05-07T19:46:09.7970454Z libsystemd0 conda-forge/linux-64::libsystemd0-257.4-h4e0b6ca_1 2025-05-07T19:46:09.7970968Z libudev1 conda-forge/linux-64::libudev1-257.4-hbe16f8c_1 2025-05-07T19:46:09.7971473Z libxkbcommon conda-forge/linux-64::libxkbcommon-1.7.0-h2c5496b_1 2025-05-07T19:46:09.7972014Z libxkbfile conda-forge/linux-64::libxkbfile-1.1.0-h166bdaf_1 2025-05-07T19:46:09.7972470Z lz4-c conda-forge/linux-64::lz4-c-1.10.0-h5888daf_1 2025-05-07T19:46:09.7972998Z nsight-compute conda-forge/linux-64::nsight-compute-2025.1.0.14-hb5ebaad_0 2025-05-07T19:46:09.7973526Z nspr conda-forge/linux-64::nspr-4.36-h5888daf_0 2025-05-07T19:46:09.7973924Z nss conda-forge/linux-64::nss-3.111-h159eef7_0 2025-05-07T19:46:09.7974358Z ocl-icd conda-forge/linux-64::ocl-icd-2.3.3-hb9d3cd8_0 2025-05-07T19:46:09.7975024Z opencl-headers conda-forge/linux-64::opencl-headers-2024.10.24-h5888daf_0 2025-05-07T19:46:09.7975589Z rdma-core conda-forge/linux-64::rdma-core-57.0-h5888daf_0 2025-05-07T19:46:09.7976078Z wayland conda-forge/linux-64::wayland-1.23.1-h3e06ad9_0 2025-05-07T19:46:09.7976541Z 
xcb-util conda-forge/linux-64::xcb-util-0.4.1-hb711507_2 2025-05-07T19:46:09.7977080Z xcb-util-cursor conda-forge/linux-64::xcb-util-cursor-0.1.5-hb9d3cd8_0 2025-05-07T19:46:09.7977641Z xcb-util-image conda-forge/linux-64::xcb-util-image-0.4.0-hb711507_2 2025-05-07T19:46:09.7978227Z xcb-util-keysyms conda-forge/linux-64::xcb-util-keysyms-0.4.1-hb711507_0 2025-05-07T19:46:09.7978848Z xcb-util-renderut~ conda-forge/linux-64::xcb-util-renderutil-0.3.10-hb711507_0 2025-05-07T19:46:09.7979408Z xcb-util-wm conda-forge/linux-64::xcb-util-wm-0.4.2-hb711507_0 2025-05-07T19:46:09.7979960Z xkeyboard-config conda-forge/linux-64::xkeyboard-config-2.44-hb9d3cd8_0 2025-05-07T19:46:09.7980569Z xorg-libxcomposite conda-forge/linux-64::xorg-libxcomposite-0.4.6-hb9d3cd8_2 2025-05-07T19:46:09.7981195Z xorg-libxdamage conda-forge/linux-64::xorg-libxdamage-1.1.6-hb9d3cd8_0 2025-05-07T19:46:09.7981538Z 2025-05-07T19:46:09.7981583Z 2025-05-07T19:46:09.7981604Z 2025-05-07T19:46:09.7981757Z Downloading and Extracting Packages: ...working... 2025-05-07T19:46:09.7982149Z libcublas-12.8.3.14 | 460.2 MB | | 0% 2025-05-07T19:46:09.7982415Z 2025-05-07T19:46:09.7982749Z nsight-compute-2025. 
| 320.6 MB | | 0%  2025-05-07T19:46:09.7983010Z 2025-05-07T19:46:09.7983014Z 2025-05-07T19:46:09.7983267Z libcusparse-12.5.7.5 | 164.9 MB | | 0%  2025-05-07T19:46:09.7983538Z 2025-05-07T19:46:09.7983542Z 2025-05-07T19:46:09.7983546Z 2025-05-07T19:46:09.7983786Z libcusolver-11.7.2.5 | 156.9 MB | | 0%  2025-05-07T19:46:09.7984080Z 2025-05-07T19:46:09.7984083Z 2025-05-07T19:46:09.7984087Z 2025-05-07T19:46:09.7984090Z 2025-05-07T19:46:09.7990212Z libcufft-11.3.3.41 | 147.4 MB | | 0%  2025-05-07T19:46:09.7991034Z 2025-05-07T19:46:09.7991070Z 2025-05-07T19:46:09.7991080Z 2025-05-07T19:46:09.7991090Z 2025-05-07T19:46:09.7991101Z 2025-05-07T19:46:09.7991790Z libnpp-12.3.3.65 | 130.6 MB | | 0%  2025-05-07T19:46:09.7992856Z 2025-05-07T19:46:09.7992867Z 2025-05-07T19:46:09.7992878Z 2025-05-07T19:46:09.7992888Z 2025-05-07T19:46:09.7992898Z 2025-05-07T19:46:09.7992909Z 2025-05-07T19:46:09.7993683Z cuda-nsight-12.8.55 | 113.2 MB | | 0%  2025-05-07T19:46:09.7994540Z 2025-05-07T19:46:09.7994551Z 2025-05-07T19:46:09.7994561Z 2025-05-07T19:46:09.7994571Z 2025-05-07T19:46:09.7994582Z 2025-05-07T19:46:09.7994592Z 2025-05-07T19:46:09.7994602Z 2025-05-07T19:46:09.7995356Z cuda-nvvp-12.8.57 | 112.4 MB | | 0%  2025-05-07T19:46:09.7996187Z 2025-05-07T19:46:09.7996198Z 2025-05-07T19:46:09.7996208Z 2025-05-07T19:46:09.7996219Z 2025-05-07T19:46:09.7996242Z 2025-05-07T19:46:09.7996254Z 2025-05-07T19:46:09.7996264Z 2025-05-07T19:46:09.7996274Z 2025-05-07T19:46:09.7997075Z cuda-nvrtc-12.8.61 | 63.1 MB | | 0%  2025-05-07T19:46:09.7998379Z 2025-05-07T19:46:09.7998398Z 2025-05-07T19:46:09.7998408Z 2025-05-07T19:46:09.7998434Z 2025-05-07T19:46:09.7998445Z 2025-05-07T19:46:09.7998455Z 2025-05-07T19:46:09.7998465Z 2025-05-07T19:46:09.7998475Z 2025-05-07T19:46:09.7998486Z 2025-05-07T19:46:09.7998972Z libcurand-10.3.9.55 | 43.6 MB | | 0%  2025-05-07T19:46:09.7999270Z 2025-05-07T19:46:09.7999273Z 2025-05-07T19:46:09.7999277Z 2025-05-07T19:46:09.7999280Z 2025-05-07T19:46:09.7999283Z 
2025-05-07T19:46:09.7999287Z 2025-05-07T19:46:09.7999290Z 2025-05-07T19:46:09.7999294Z 2025-05-07T19:46:09.7999297Z 2025-05-07T19:46:09.7999301Z 2025-05-07T19:46:09.7999581Z gds-tools-1.13.0.11 | 37.9 MB | | 0%  2025-05-07T19:46:09.7999967Z 2025-05-07T19:46:09.7999972Z 2025-05-07T19:46:09.7999975Z 2025-05-07T19:46:09.7999979Z 2025-05-07T19:46:09.7999982Z 2025-05-07T19:46:09.7999986Z 2025-05-07T19:46:09.7999989Z 2025-05-07T19:46:09.7999993Z 2025-05-07T19:46:09.7999996Z 2025-05-07T19:46:09.8000000Z 2025-05-07T19:46:09.8000008Z 2025-05-07T19:46:09.8000305Z libnvjitlink-12.8.61 | 28.7 MB | | 0%  2025-05-07T19:46:09.8000618Z 2025-05-07T19:46:09.8000622Z 2025-05-07T19:46:09.8000625Z 2025-05-07T19:46:09.8000628Z 2025-05-07T19:46:09.8000632Z 2025-05-07T19:46:09.8000635Z 2025-05-07T19:46:09.8000639Z 2025-05-07T19:46:09.8000642Z 2025-05-07T19:46:09.8000646Z 2025-05-07T19:46:09.8000649Z 2025-05-07T19:46:09.8000653Z 2025-05-07T19:46:09.8000667Z 2025-05-07T19:46:09.8000947Z cuda-nvcc-tools-12.8 | 24.5 MB | | 0%  2025-05-07T19:46:09.8001267Z 2025-05-07T19:46:09.8001270Z 2025-05-07T19:46:09.8001274Z 2025-05-07T19:46:09.8001281Z 2025-05-07T19:46:09.8001284Z 2025-05-07T19:46:09.8001288Z 2025-05-07T19:46:09.8001292Z 2025-05-07T19:46:09.8001295Z 2025-05-07T19:46:09.8001298Z 2025-05-07T19:46:09.8001316Z 2025-05-07T19:46:09.8001320Z 2025-05-07T19:46:09.8001323Z 2025-05-07T19:46:09.8001326Z 2025-05-07T19:46:09.8001652Z cuda-nvvm-tools-12.8 | 23.5 MB | | 0%  2025-05-07T19:46:09.8001983Z 2025-05-07T19:46:09.8001987Z 2025-05-07T19:46:09.8001990Z 2025-05-07T19:46:09.8001994Z 2025-05-07T19:46:09.8001997Z 2025-05-07T19:46:09.8002001Z 2025-05-07T19:46:09.8002004Z 2025-05-07T19:46:09.8002008Z 2025-05-07T19:46:09.8002011Z 2025-05-07T19:46:09.8002015Z 2025-05-07T19:46:09.8002018Z 2025-05-07T19:46:09.8002022Z 2025-05-07T19:46:09.8002025Z 2025-05-07T19:46:09.8002028Z 2025-05-07T19:46:09.8002316Z cuda-nvvm-impl-12.8. 
| 20.8 MB | | 0%  2025-05-07T19:46:09.8002734Z 2025-05-07T19:46:09.8002738Z 2025-05-07T19:46:09.8002745Z 2025-05-07T19:46:09.8002749Z 2025-05-07T19:46:09.8002753Z 2025-05-07T19:46:09.8002756Z 2025-05-07T19:46:09.8002760Z 2025-05-07T19:46:09.8002763Z 2025-05-07T19:46:09.8002766Z 2025-05-07T19:46:09.8002770Z 2025-05-07T19:46:09.8002773Z 2025-05-07T19:46:09.8002777Z 2025-05-07T19:46:09.8002846Z 2025-05-07T19:46:09.8002850Z 2025-05-07T19:46:09.8002853Z 2025-05-07T19:46:09.8003168Z cuda-nvcc-dev_linux- | 12.7 MB | | 0%  2025-05-07T19:46:09.8003493Z 2025-05-07T19:46:09.8003497Z 2025-05-07T19:46:09.8003500Z 2025-05-07T19:46:09.8003503Z 2025-05-07T19:46:09.8003507Z 2025-05-07T19:46:09.8003511Z 2025-05-07T19:46:09.8003514Z 2025-05-07T19:46:09.8003517Z 2025-05-07T19:46:09.8003521Z 2025-05-07T19:46:09.8003525Z 2025-05-07T19:46:09.8003528Z 2025-05-07T19:46:09.8003531Z 2025-05-07T19:46:09.8003535Z 2025-05-07T19:46:09.8003551Z 2025-05-07T19:46:09.8003554Z 2025-05-07T19:46:09.8003557Z 2025-05-07T19:46:09.8003876Z cuda-sanitizer-api-1 | 8.8 MB | | 0%  2025-05-07T19:46:09.8004215Z 2025-05-07T19:46:09.8004219Z 2025-05-07T19:46:09.8004222Z 2025-05-07T19:46:09.8004226Z 2025-05-07T19:46:09.8004229Z 2025-05-07T19:46:09.8004233Z 2025-05-07T19:46:09.8004249Z 2025-05-07T19:46:09.8004257Z 2025-05-07T19:46:09.8004260Z 2025-05-07T19:46:09.8004263Z 2025-05-07T19:46:09.8004267Z 2025-05-07T19:46:09.8004270Z 2025-05-07T19:46:09.8004274Z 2025-05-07T19:46:09.8004277Z 2025-05-07T19:46:09.8004281Z 2025-05-07T19:46:09.8004284Z 2025-05-07T19:46:09.8004287Z 2025-05-07T19:46:09.8004598Z cuda-nvdisasm-12.8.5 | 4.9 MB | | 0%  2025-05-07T19:46:09.8004936Z 2025-05-07T19:46:09.8004940Z 2025-05-07T19:46:09.8004943Z 2025-05-07T19:46:09.8004947Z 2025-05-07T19:46:09.8004950Z 2025-05-07T19:46:09.8004953Z 2025-05-07T19:46:09.8004957Z 2025-05-07T19:46:09.8004960Z 2025-05-07T19:46:09.8004964Z 2025-05-07T19:46:09.8005047Z 2025-05-07T19:46:09.8005051Z 2025-05-07T19:46:09.8005055Z 2025-05-07T19:46:09.8005058Z 
2025-05-07T19:46:09.8005061Z 2025-05-07T19:46:09.8005065Z 2025-05-07T19:46:09.8005068Z 2025-05-07T19:46:09.8005071Z 2025-05-07T19:46:09.8005075Z 2025-05-07T19:46:09.8005412Z cuda-cupti-dev-12.8. | 4.0 MB | | 0%  2025-05-07T19:46:09.8005743Z 2025-05-07T19:46:09.8005747Z 2025-05-07T19:46:09.8005750Z 2025-05-07T19:46:09.8005753Z 2025-05-07T19:46:09.8005757Z 2025-05-07T19:46:09.8005760Z 2025-05-07T19:46:09.8005764Z 2025-05-07T19:46:09.8005768Z 2025-05-07T19:46:09.8005771Z 2025-05-07T19:46:09.8005775Z 2025-05-07T19:46:09.8005778Z 2025-05-07T19:46:09.8005781Z 2025-05-07T19:46:09.8005793Z 2025-05-07T19:46:09.8005796Z 2025-05-07T19:46:09.8005800Z 2025-05-07T19:46:09.8005803Z 2025-05-07T19:46:09.8005806Z 2025-05-07T19:46:09.8005810Z 2025-05-07T19:46:09.8005813Z 2025-05-07T19:46:09.8950341Z ... (more hidden) ... 2025-05-07T19:46:09.9003959Z libcublas-12.8.3.14 | 460.2 MB | | 1% 2025-05-07T19:46:09.9005282Z 2025-05-07T19:46:09.9005303Z 2025-05-07T19:46:09.9005322Z 2025-05-07T19:46:09.9036627Z libcusolver-11.7.2.5 | 156.9 MB | 3 | 3%  2025-05-07T19:46:09.9037141Z 2025-05-07T19:46:09.9037146Z 2025-05-07T19:46:09.9084094Z libcusparse-12.5.7.5 | 164.9 MB | 1 | 2%  2025-05-07T19:46:09.9085194Z 2025-05-07T19:46:09.9153781Z nsight-compute-2025. | 320.6 MB | | 0%  2025-05-07T19:46:09.9154221Z 2025-05-07T19:46:09.9154228Z 2025-05-07T19:46:09.9154234Z 2025-05-07T19:46:09.9154240Z 2025-05-07T19:46:09.9950521Z libcufft-11.3.3.41 | 147.4 MB | | 0%  2025-05-07T19:46:10.0004661Z libcublas-12.8.3.14 | 460.2 MB | 2 | 2% 2025-05-07T19:46:10.0004945Z 2025-05-07T19:46:10.0004951Z 2025-05-07T19:46:10.0004964Z 2025-05-07T19:46:10.0038634Z libcusolver-11.7.2.5 | 156.9 MB | 7 | 7%  2025-05-07T19:46:10.0038958Z 2025-05-07T19:46:10.0039054Z 2025-05-07T19:46:10.0084354Z libcusparse-12.5.7.5 | 164.9 MB | 4 | 5%  2025-05-07T19:46:10.0084648Z 2025-05-07T19:46:10.0156867Z nsight-compute-2025. 
| 320.6 MB | 2 | 2%  2025-05-07T19:46:10.0158090Z 2025-05-07T19:46:10.0158120Z 2025-05-07T19:46:10.0158130Z 2025-05-07T19:46:10.0158141Z 2025-05-07T19:46:10.1007457Z libcufft-11.3.3.41 | 147.4 MB | 7 | 7%  2025-05-07T19:46:10.1008619Z 2025-05-07T19:46:10.1008627Z 2025-05-07T19:46:10.1008633Z 2025-05-07T19:46:10.1040019Z libcusolver-11.7.2.5 | 156.9 MB | #1 | 11%  2025-05-07T19:46:10.1040321Z 2025-05-07T19:46:10.1040325Z 2025-05-07T19:46:10.1047095Z libcusparse-12.5.7.5 | 164.9 MB | 7 | 8%  2025-05-07T19:46:10.1085273Z libcublas-12.8.3.14 | 460.2 MB | 3 | 3% 2025-05-07T19:46:10.1086076Z 2025-05-07T19:46:10.1157256Z nsight-compute-2025. | 320.6 MB | 4 | 4%  2025-05-07T19:46:10.1157536Z 2025-05-07T19:46:10.1157541Z 2025-05-07T19:46:10.1157556Z 2025-05-07T19:46:10.1157560Z 2025-05-07T19:46:10.2010044Z libcufft-11.3.3.41 | 147.4 MB | #1 | 11%  2025-05-07T19:46:10.2010904Z 2025-05-07T19:46:10.2010918Z 2025-05-07T19:46:10.2010968Z 2025-05-07T19:46:10.2041421Z libcusolver-11.7.2.5 | 156.9 MB | #5 | 15%  2025-05-07T19:46:10.2042302Z 2025-05-07T19:46:10.2042316Z 2025-05-07T19:46:10.2048164Z libcusparse-12.5.7.5 | 164.9 MB | # | 11%  2025-05-07T19:46:10.2168672Z libcublas-12.8.3.14 | 460.2 MB | 4 | 4% 2025-05-07T19:46:10.2168979Z 2025-05-07T19:46:10.2168985Z 2025-05-07T19:46:10.2168991Z 2025-05-07T19:46:10.2168996Z 2025-05-07T19:46:10.2169310Z libcufft-11.3.3.41 | 147.4 MB | #5 | 16%  2025-05-07T19:46:10.2169920Z 2025-05-07T19:46:10.3009301Z nsight-compute-2025. 
| 320.6 MB | 5 | 6%  2025-05-07T19:46:10.3009608Z 2025-05-07T19:46:10.3009853Z 2025-05-07T19:46:10.3009858Z 2025-05-07T19:46:10.3044083Z libcusolver-11.7.2.5 | 156.9 MB | #9 | 19%  2025-05-07T19:46:10.3044384Z 2025-05-07T19:46:10.3044389Z 2025-05-07T19:46:10.3047344Z libcusparse-12.5.7.5 | 164.9 MB | #3 | 13%  2025-05-07T19:46:10.3172051Z libcublas-12.8.3.14 | 460.2 MB | 5 | 6% 2025-05-07T19:46:10.3172371Z 2025-05-07T19:46:10.3172376Z 2025-05-07T19:46:10.3172380Z 2025-05-07T19:46:10.3172384Z 2025-05-07T19:46:10.3188180Z libcufft-11.3.3.41 | 147.4 MB | #9 | 20%  2025-05-07T19:46:10.3188577Z 2025-05-07T19:46:10.4043448Z nsight-compute-2025. | 320.6 MB | 7 | 7%  2025-05-07T19:46:10.4043806Z 2025-05-07T19:46:10.4043811Z 2025-05-07T19:46:10.4043817Z 2025-05-07T19:46:10.4044115Z libcusolver-11.7.2.5 | 156.9 MB | ##3 | 23%  2025-05-07T19:46:10.4044404Z 2025-05-07T19:46:10.4044881Z 2025-05-07T19:46:10.4050323Z libcusparse-12.5.7.5 | 164.9 MB | #6 | 16%  2025-05-07T19:46:10.4189190Z libcublas-12.8.3.14 | 460.2 MB | 6 | 7% 2025-05-07T19:46:10.4189495Z 2025-05-07T19:46:10.4258701Z nsight-compute-2025. | 320.6 MB | 9 | 9%  2025-05-07T19:46:10.4258996Z 2025-05-07T19:46:10.4259146Z 2025-05-07T19:46:10.4259305Z 2025-05-07T19:46:10.4259348Z 2025-05-07T19:46:10.5045060Z libcufft-11.3.3.41 | 147.4 MB | ##4 | 24%  2025-05-07T19:46:10.5045367Z 2025-05-07T19:46:10.5050592Z 2025-05-07T19:46:10.5050846Z libcusparse-12.5.7.5 | 164.9 MB | #9 | 19%  2025-05-07T19:46:10.5078977Z libcublas-12.8.3.14 | 460.2 MB | 8 | 8% 2025-05-07T19:46:10.5079271Z 2025-05-07T19:46:10.5079277Z 2025-05-07T19:46:10.5079293Z 2025-05-07T19:46:10.5191365Z libcusolver-11.7.2.5 | 156.9 MB | ##7 | 27%  2025-05-07T19:46:10.5191682Z 2025-05-07T19:46:10.5283787Z nsight-compute-2025. 
| 320.6 MB | # | 11%  2025-05-07T19:46:10.5284104Z 2025-05-07T19:46:10.5284108Z 2025-05-07T19:46:10.5284128Z 2025-05-07T19:46:10.5284132Z 2025-05-07T19:46:10.6046137Z libcufft-11.3.3.41 | 147.4 MB | ##8 | 28%  2025-05-07T19:46:10.6046461Z 2025-05-07T19:46:10.6046466Z 2025-05-07T19:46:10.6052659Z libcusparse-12.5.7.5 | 164.9 MB | ##2 | 22%  2025-05-07T19:46:10.6085113Z libcublas-12.8.3.14 | 460.2 MB | 9 | 10% 2025-05-07T19:46:10.6085900Z 2025-05-07T19:46:10.6085915Z 2025-05-07T19:46:10.6085926Z 2025-05-07T19:46:10.6192123Z libcusolver-11.7.2.5 | 156.9 MB | ###1 | 31%  2025-05-07T19:46:10.6192443Z 2025-05-07T19:46:10.6285156Z nsight-compute-2025. | 320.6 MB | #2 | 13%  2025-05-07T19:46:10.6285437Z 2025-05-07T19:46:10.6285442Z 2025-05-07T19:46:10.6285458Z 2025-05-07T19:46:10.6285471Z 2025-05-07T19:46:10.7055031Z libcufft-11.3.3.41 | 147.4 MB | ###2 | 33%  2025-05-07T19:46:10.7056308Z libcublas-12.8.3.14 | 460.2 MB | # | 11% 2025-05-07T19:46:10.7057143Z 2025-05-07T19:46:10.7057193Z 2025-05-07T19:46:10.7084740Z libcusparse-12.5.7.5 | 164.9 MB | ##5 | 25%  2025-05-07T19:46:10.7085046Z 2025-05-07T19:46:10.7085051Z 2025-05-07T19:46:10.7085055Z 2025-05-07T19:46:10.7254212Z libcusolver-11.7.2.5 | 156.9 MB | ###4 | 35%  2025-05-07T19:46:10.7255452Z 2025-05-07T19:46:10.7310861Z nsight-compute-2025. | 320.6 MB | #4 | 15%  2025-05-07T19:46:10.7311185Z 2025-05-07T19:46:10.7311190Z 2025-05-07T19:46:10.7311193Z 2025-05-07T19:46:10.7311197Z 2025-05-07T19:46:10.8058316Z libcufft-11.3.3.41 | 147.4 MB | ###6 | 37%  2025-05-07T19:46:10.8059615Z libcublas-12.8.3.14 | 460.2 MB | #2 | 12% 2025-05-07T19:46:10.8060352Z 2025-05-07T19:46:10.8060450Z 2025-05-07T19:46:10.8115532Z libcusparse-12.5.7.5 | 164.9 MB | ##8 | 28%  2025-05-07T19:46:10.8115866Z 2025-05-07T19:46:10.8115870Z 2025-05-07T19:46:10.8115874Z 2025-05-07T19:46:10.8253755Z libcusolver-11.7.2.5 | 156.9 MB | ###8 | 39%  2025-05-07T19:46:10.8255086Z 2025-05-07T19:46:10.8313288Z nsight-compute-2025. 
| 320.6 MB | #6 | 16%  2025-05-07T19:46:10.8313591Z 2025-05-07T19:46:10.8313595Z 2025-05-07T19:46:10.8313599Z 2025-05-07T19:46:10.8313603Z 2025-05-07T19:46:10.9063800Z libcufft-11.3.3.41 | 147.4 MB | #### | 41%  2025-05-07T19:46:10.9064138Z 2025-05-07T19:46:10.9064142Z 2025-05-07T19:46:10.9066559Z libcusparse-12.5.7.5 | 164.9 MB | ###1 | 31%  2025-05-07T19:46:10.9116065Z libcublas-12.8.3.14 | 460.2 MB | #3 | 14% 2025-05-07T19:46:10.9116482Z 2025-05-07T19:46:10.9116658Z 2025-05-07T19:46:10.9116666Z 2025-05-07T19:46:10.9254990Z libcusolver-11.7.2.5 | 156.9 MB | ####2 | 43%  2025-05-07T19:46:10.9255359Z 2025-05-07T19:46:10.9315180Z nsight-compute-2025. | 320.6 MB | #8 | 18%  2025-05-07T19:46:10.9315483Z 2025-05-07T19:46:10.9315488Z 2025-05-07T19:46:10.9315491Z 2025-05-07T19:46:10.9315495Z 2025-05-07T19:46:11.0065876Z libcufft-11.3.3.41 | 147.4 MB | ####5 | 45%  2025-05-07T19:46:11.0066776Z 2025-05-07T19:46:11.0066790Z 2025-05-07T19:46:11.0069905Z libcusparse-12.5.7.5 | 164.9 MB | ###4 | 35%  2025-05-07T19:46:11.0116573Z libcublas-12.8.3.14 | 460.2 MB | #4 | 15% 2025-05-07T19:46:11.0116871Z 2025-05-07T19:46:11.0116892Z 2025-05-07T19:46:11.0116896Z 2025-05-07T19:46:11.0314807Z libcusolver-11.7.2.5 | 156.9 MB | ####6 | 47%  2025-05-07T19:46:11.0315124Z 2025-05-07T19:46:11.0315214Z 2025-05-07T19:46:11.0315222Z 2025-05-07T19:46:11.0315227Z 2025-05-07T19:46:11.0492179Z libcufft-11.3.3.41 | 147.4 MB | ####9 | 49%  2025-05-07T19:46:11.0492514Z 2025-05-07T19:46:11.1066341Z nsight-compute-2025. | 320.6 MB | ## | 20%  2025-05-07T19:46:11.1066792Z libcublas-12.8.3.14 | 460.2 MB | #6 | 17% 2025-05-07T19:46:11.1067237Z 2025-05-07T19:46:11.1067242Z 2025-05-07T19:46:11.1118986Z libcusparse-12.5.7.5 | 164.9 MB | ###8 | 38%  2025-05-07T19:46:11.1119322Z 2025-05-07T19:46:11.1119326Z 2025-05-07T19:46:11.1119330Z 2025-05-07T19:46:11.1493754Z libcusolver-11.7.2.5 | 156.9 MB | #####1 | 51%  2025-05-07T19:46:11.1494650Z 2025-05-07T19:46:11.2069537Z nsight-compute-2025. 
| 320.6 MB | ##2 | 22%  2025-05-07T19:46:11.2070058Z 2025-05-07T19:46:11.2070063Z 2025-05-07T19:46:11.2070368Z libcusparse-12.5.7.5 | 164.9 MB | ####2 | 42%  2025-05-07T19:46:11.2119060Z libcublas-12.8.3.14 | 460.2 MB | #7 | 18% 2025-05-07T19:46:11.2119863Z 2025-05-07T19:46:11.2119902Z 2025-05-07T19:46:11.2119913Z 2025-05-07T19:46:11.2494325Z libcusolver-11.7.2.5 | 156.9 MB | #####6 | 56%  2025-05-07T19:46:11.2495202Z 2025-05-07T19:46:11.3078885Z nsight-compute-2025. | 320.6 MB | ##4 | 25%  2025-05-07T19:46:11.3119893Z libcublas-12.8.3.14 | 460.2 MB | #9 | 19% 2025-05-07T19:46:11.3120196Z 2025-05-07T19:46:11.3120334Z 2025-05-07T19:46:11.3120342Z 2025-05-07T19:46:11.3120841Z libcusolver-11.7.2.5 | 156.9 MB | ###### | 61%  2025-05-07T19:46:11.3121143Z 2025-05-07T19:46:11.3121147Z 2025-05-07T19:46:11.3486286Z libcusparse-12.5.7.5 | 164.9 MB | ####5 | 46%  2025-05-07T19:46:11.3486580Z 2025-05-07T19:46:11.3486594Z 2025-05-07T19:46:11.3486598Z 2025-05-07T19:46:11.3486620Z 2025-05-07T19:46:11.3551848Z libcufft-11.3.3.41 | 147.4 MB | #####3 | 54%  2025-05-07T19:46:11.3552714Z 2025-05-07T19:46:11.4118100Z nsight-compute-2025. | 320.6 MB | ##6 | 27%  2025-05-07T19:46:11.4244688Z libcublas-12.8.3.14 | 460.2 MB | ## | 21% 2025-05-07T19:46:11.4244971Z 2025-05-07T19:46:11.4244975Z 2025-05-07T19:46:11.4259157Z libcusparse-12.5.7.5 | 164.9 MB | ####9 | 49%  2025-05-07T19:46:11.4259466Z 2025-05-07T19:46:11.4259471Z 2025-05-07T19:46:11.4259475Z 2025-05-07T19:46:11.4484749Z libcusolver-11.7.2.5 | 156.9 MB | ######4 | 65%  2025-05-07T19:46:11.4485082Z 2025-05-07T19:46:11.4485086Z 2025-05-07T19:46:11.4485305Z 2025-05-07T19:46:11.4485310Z 2025-05-07T19:46:11.4551953Z libcufft-11.3.3.41 | 147.4 MB | #####8 | 58%  2025-05-07T19:46:11.4552276Z 2025-05-07T19:46:11.5246500Z nsight-compute-2025. 
| 320.6 MB | ##8 | 29%  2025-05-07T19:46:11.5246798Z 2025-05-07T19:46:11.5246824Z 2025-05-07T19:46:11.5268997Z libcusparse-12.5.7.5 | 164.9 MB | #####2 | 53%  2025-05-07T19:46:11.5269291Z 2025-05-07T19:46:11.5352309Z 2025-05-07T19:46:11.5352316Z 2025-05-07T19:46:11.5352730Z libcusolver-11.7.2.5 | 156.9 MB | ######9 | 69%  2025-05-07T19:46:11.5484947Z libcublas-12.8.3.14 | 460.2 MB | ##2 | 22% 2025-05-07T19:46:11.5485230Z 2025-05-07T19:46:11.5485235Z 2025-05-07T19:46:11.5485238Z 2025-05-07T19:46:11.5485242Z 2025-05-07T19:46:11.6246759Z libcufft-11.3.3.41 | 147.4 MB | ######2 | 63%  2025-05-07T19:46:11.6247069Z 2025-05-07T19:46:11.6247073Z 2025-05-07T19:46:11.6270324Z libcusparse-12.5.7.5 | 164.9 MB | #####6 | 57%  2025-05-07T19:46:11.6271215Z 2025-05-07T19:46:11.6271228Z 2025-05-07T19:46:11.6271239Z 2025-05-07T19:46:11.6277221Z libcusolver-11.7.2.5 | 156.9 MB | #######4 | 74%  2025-05-07T19:46:11.6278059Z 2025-05-07T19:46:11.6487754Z nsight-compute-2025. | 320.6 MB | ### | 31%  2025-05-07T19:46:11.6488059Z 2025-05-07T19:46:11.6488063Z 2025-05-07T19:46:11.6488067Z 2025-05-07T19:46:11.6488080Z 2025-05-07T19:46:11.7196706Z libcufft-11.3.3.41 | 147.4 MB | ######7 | 68%  2025-05-07T19:46:11.7248655Z libcublas-12.8.3.14 | 460.2 MB | ##3 | 23% 2025-05-07T19:46:11.7249494Z 2025-05-07T19:46:11.7249525Z 2025-05-07T19:46:11.7271559Z libcusparse-12.5.7.5 | 164.9 MB | ###### | 60%  2025-05-07T19:46:11.7271898Z 2025-05-07T19:46:11.7271904Z 2025-05-07T19:46:11.7271915Z 2025-05-07T19:46:11.7444969Z libcusolver-11.7.2.5 | 156.9 MB | #######8 | 79%  2025-05-07T19:46:11.7445325Z 2025-05-07T19:46:11.7551600Z nsight-compute-2025. 
| 320.6 MB | ###2 | 33%  2025-05-07T19:46:11.7552236Z 2025-05-07T19:46:11.7552321Z 2025-05-07T19:46:11.7552350Z 2025-05-07T19:46:11.7552365Z 2025-05-07T19:46:11.8198525Z libcufft-11.3.3.41 | 147.4 MB | #######2 | 72%  2025-05-07T19:46:11.8304173Z libcublas-12.8.3.14 | 460.2 MB | ##4 | 25% 2025-05-07T19:46:11.8304706Z 2025-05-07T19:46:11.8304712Z 2025-05-07T19:46:11.8444995Z libcusparse-12.5.7.5 | 164.9 MB | ######3 | 64%  2025-05-07T19:46:11.8445305Z 2025-05-07T19:46:11.8477186Z nsight-compute-2025. | 320.6 MB | ###4 | 34%  2025-05-07T19:46:11.8477494Z 2025-05-07T19:46:11.8477498Z 2025-05-07T19:46:11.8477502Z 2025-05-07T19:46:11.8552668Z libcusolver-11.7.2.5 | 156.9 MB | ########3 | 83%  2025-05-07T19:46:11.8552981Z 2025-05-07T19:46:11.8552986Z 2025-05-07T19:46:11.8552992Z 2025-05-07T19:46:11.8552997Z 2025-05-07T19:46:11.9200527Z libcufft-11.3.3.41 | 147.4 MB | #######6 | 77%  2025-05-07T19:46:11.9435329Z libcublas-12.8.3.14 | 460.2 MB | ##6 | 26% 2025-05-07T19:46:11.9435807Z 2025-05-07T19:46:11.9435852Z 2025-05-07T19:46:11.9444585Z libcusparse-12.5.7.5 | 164.9 MB | ######7 | 67%  2025-05-07T19:46:11.9445739Z 2025-05-07T19:46:11.9572504Z nsight-compute-2025. | 320.6 MB | ###6 | 36%  2025-05-07T19:46:11.9572828Z 2025-05-07T19:46:11.9572833Z 2025-05-07T19:46:11.9572836Z 2025-05-07T19:46:11.9572840Z 2025-05-07T19:46:11.9596154Z libcufft-11.3.3.41 | 147.4 MB | ######## | 81%  2025-05-07T19:46:11.9596483Z 2025-05-07T19:46:11.9596488Z 2025-05-07T19:46:11.9596492Z 2025-05-07T19:46:12.0202223Z libcusolver-11.7.2.5 | 156.9 MB | ########7 | 88%  2025-05-07T19:46:12.0490541Z libcublas-12.8.3.14 | 460.2 MB | ##7 | 28% 2025-05-07T19:46:12.0490811Z 2025-05-07T19:46:12.0605554Z nsight-compute-2025. 
| 320.6 MB | ###8 | 38%  2025-05-07T19:46:12.0606306Z 2025-05-07T19:46:12.0606311Z 2025-05-07T19:46:12.0606315Z 2025-05-07T19:46:12.0621921Z libcusolver-11.7.2.5 | 156.9 MB | #########1 | 92%  2025-05-07T19:46:12.0622294Z 2025-05-07T19:46:12.0622299Z 2025-05-07T19:46:12.0645136Z libcusparse-12.5.7.5 | 164.9 MB | ####### | 71%  2025-05-07T19:46:12.0645696Z 2025-05-07T19:46:12.0645700Z 2025-05-07T19:46:12.0645719Z 2025-05-07T19:46:12.0645723Z 2025-05-07T19:46:12.1410639Z libcufft-11.3.3.41 | 147.4 MB | ########5 | 85%  2025-05-07T19:46:12.1604811Z libcublas-12.8.3.14 | 460.2 MB | ##8 | 29% 2025-05-07T19:46:12.1605595Z 2025-05-07T19:46:12.1605609Z 2025-05-07T19:46:12.1605620Z 2025-05-07T19:46:12.1622024Z libcusolver-11.7.2.5 | 156.9 MB | #########6 | 96%  2025-05-07T19:46:12.1622342Z 2025-05-07T19:46:12.1622347Z 2025-05-07T19:46:12.1644739Z libcusparse-12.5.7.5 | 164.9 MB | #######4 | 75%  2025-05-07T19:46:12.1645041Z 2025-05-07T19:46:12.1645046Z 2025-05-07T19:46:12.1645050Z 2025-05-07T19:46:12.1645054Z 2025-05-07T19:46:12.1886132Z libcufft-11.3.3.41 | 147.4 MB | ######### | 90%  2025-05-07T19:46:12.1886440Z 2025-05-07T19:46:12.2623478Z nsight-compute-2025. | 320.6 MB | ###9 | 40%  2025-05-07T19:46:12.2623789Z 2025-05-07T19:46:12.2623793Z 2025-05-07T19:46:12.2644630Z libcusparse-12.5.7.5 | 164.9 MB | #######8 | 78%  2025-05-07T19:46:12.2644945Z 2025-05-07T19:46:12.2644949Z 2025-05-07T19:46:12.2644954Z 2025-05-07T19:46:12.2644971Z 2025-05-07T19:46:12.2986983Z libcufft-11.3.3.41 | 147.4 MB | #########5 | 95%  2025-05-07T19:46:12.3067783Z libcublas-12.8.3.14 | 460.2 MB | ### | 30% 2025-05-07T19:46:12.3068085Z 2025-05-07T19:46:12.3623841Z nsight-compute-2025. 
| 320.6 MB | ####1 | 41%  2025-05-07T19:46:12.3624165Z 2025-05-07T19:46:12.3624169Z 2025-05-07T19:46:12.3765195Z libcusparse-12.5.7.5 | 164.9 MB | ########4 | 85%  2025-05-07T19:46:12.3765537Z 2025-05-07T19:46:12.3765544Z 2025-05-07T19:46:12.3765550Z 2025-05-07T19:46:12.3765555Z 2025-05-07T19:46:12.4124215Z libcufft-11.3.3.41 | 147.4 MB | #########9 | 100%  2025-05-07T19:46:12.4144009Z libcublas-12.8.3.14 | 460.2 MB | ###1 | 31% 2025-05-07T19:46:12.4144288Z 2025-05-07T19:46:12.4627203Z nsight-compute-2025. | 320.6 MB | ####3 | 43%  2025-05-07T19:46:12.4627518Z 2025-05-07T19:46:12.4627764Z 2025-05-07T19:46:12.5127328Z libcusparse-12.5.7.5 | 164.9 MB | #########1 | 92%  2025-05-07T19:46:12.5290318Z libcublas-12.8.3.14 | 460.2 MB | ###2 | 33% 2025-05-07T19:46:12.5290592Z 2025-05-07T19:46:12.5628229Z nsight-compute-2025. | 320.6 MB | ####5 | 45%  2025-05-07T19:46:12.5628532Z 2025-05-07T19:46:12.5628537Z 2025-05-07T19:46:12.6128248Z libcusparse-12.5.7.5 | 164.9 MB | #########9 | 99%  2025-05-07T19:46:12.6900504Z libcublas-12.8.3.14 | 460.2 MB | ###4 | 34% 2025-05-07T19:46:12.6901038Z 2025-05-07T19:46:12.7128473Z nsight-compute-2025. | 320.6 MB | ####7 | 48%  2025-05-07T19:46:12.7961825Z libcublas-12.8.3.14 | 460.2 MB | ###6 | 36% 2025-05-07T19:46:12.7962284Z 2025-05-07T19:46:12.8202824Z nsight-compute-2025. | 320.6 MB | ####9 | 50%  2025-05-07T19:46:12.8961647Z libcublas-12.8.3.14 | 460.2 MB | ###9 | 39% 2025-05-07T19:46:12.8962153Z 2025-05-07T19:46:12.9399959Z nsight-compute-2025. | 320.6 MB | #####1 | 52%  2025-05-07T19:46:12.9961947Z libcublas-12.8.3.14 | 460.2 MB | #### | 41% 2025-05-07T19:46:12.9962514Z 2025-05-07T19:46:13.0319187Z nsight-compute-2025. 
| 320.6 MB | #####4 | 54%  2025-05-07T19:46:13.0319491Z 2025-05-07T19:46:13.0319605Z 2025-05-07T19:46:13.0319609Z 2025-05-07T19:46:13.0621516Z libcusolver-11.7.2.5 | 156.9 MB | ########## | 100%  2025-05-07T19:46:13.0621959Z 2025-05-07T19:46:13.0622012Z 2025-05-07T19:46:13.0622017Z 2025-05-07T19:46:13.0622237Z 2025-05-07T19:46:13.0624272Z libcufft-11.3.3.41 | 147.4 MB | ########## | 100%  2025-05-07T19:46:13.0732675Z libcublas-12.8.3.14 | 460.2 MB | ####2 | 43% 2025-05-07T19:46:13.0733180Z 2025-05-07T19:46:13.0733187Z 2025-05-07T19:46:13.0733191Z 2025-05-07T19:46:13.0733208Z 2025-05-07T19:46:13.0733219Z 2025-05-07T19:46:13.1304302Z libnpp-12.3.3.65 | 130.6 MB | | 0%  2025-05-07T19:46:13.1304772Z 2025-05-07T19:46:13.1324890Z nsight-compute-2025. | 320.6 MB | #####6 | 56%  2025-05-07T19:46:13.1325206Z 2025-05-07T19:46:13.1325211Z 2025-05-07T19:46:13.1325215Z 2025-05-07T19:46:13.1325219Z 2025-05-07T19:46:13.1325223Z 2025-05-07T19:46:13.1325415Z 2025-05-07T19:46:13.1734735Z cuda-nsight-12.8.55 | 113.2 MB | | 0%  2025-05-07T19:46:13.1735684Z 2025-05-07T19:46:13.1735697Z 2025-05-07T19:46:13.1735708Z 2025-05-07T19:46:13.1735719Z 2025-05-07T19:46:13.1735747Z 2025-05-07T19:46:13.1884849Z libnpp-12.3.3.65 | 130.6 MB | 7 | 8%  2025-05-07T19:46:13.2305914Z libcublas-12.8.3.14 | 460.2 MB | ####4 | 44% 2025-05-07T19:46:13.2306407Z 2025-05-07T19:46:13.2326190Z nsight-compute-2025. 
| 320.6 MB | #####8 | 59%  2025-05-07T19:46:13.2326490Z 2025-05-07T19:46:13.2326495Z 2025-05-07T19:46:13.2326499Z 2025-05-07T19:46:13.2326502Z 2025-05-07T19:46:13.2326506Z 2025-05-07T19:46:13.2326509Z 2025-05-07T19:46:13.2735095Z cuda-nsight-12.8.55 | 113.2 MB | 6 | 7%  2025-05-07T19:46:13.2735447Z 2025-05-07T19:46:13.2735451Z 2025-05-07T19:46:13.2735455Z 2025-05-07T19:46:13.2735459Z 2025-05-07T19:46:13.2735462Z 2025-05-07T19:46:13.3151884Z libnpp-12.3.3.65 | 130.6 MB | #2 | 13%  2025-05-07T19:46:13.3306924Z libcublas-12.8.3.14 | 460.2 MB | ####5 | 46% 2025-05-07T19:46:13.3307554Z 2025-05-07T19:46:13.3325651Z nsight-compute-2025. | 320.6 MB | ###### | 61%  2025-05-07T19:46:13.3325955Z 2025-05-07T19:46:13.3325960Z 2025-05-07T19:46:13.3325964Z 2025-05-07T19:46:13.3325968Z 2025-05-07T19:46:13.3325972Z 2025-05-07T19:46:13.3325976Z 2025-05-07T19:46:13.3741190Z cuda-nsight-12.8.55 | 113.2 MB | #3 | 13%  2025-05-07T19:46:13.3742140Z 2025-05-07T19:46:13.3742154Z 2025-05-07T19:46:13.3742166Z 2025-05-07T19:46:13.3742176Z 2025-05-07T19:46:13.3742187Z 2025-05-07T19:46:13.4152602Z libnpp-12.3.3.65 | 130.6 MB | #7 | 18%  2025-05-07T19:46:13.4309087Z libcublas-12.8.3.14 | 460.2 MB | ####7 | 47% 2025-05-07T19:46:13.4309401Z 2025-05-07T19:46:13.4326268Z nsight-compute-2025. | 320.6 MB | ######2 | 63%  2025-05-07T19:46:13.4326549Z 2025-05-07T19:46:13.4326553Z 2025-05-07T19:46:13.4326560Z 2025-05-07T19:46:13.4326564Z 2025-05-07T19:46:13.4326567Z 2025-05-07T19:46:13.4326570Z 2025-05-07T19:46:13.4740141Z cuda-nsight-12.8.55 | 113.2 MB | ## | 20%  2025-05-07T19:46:13.4741077Z 2025-05-07T19:46:13.4741091Z 2025-05-07T19:46:13.4741103Z 2025-05-07T19:46:13.4741115Z 2025-05-07T19:46:13.4741125Z 2025-05-07T19:46:13.5152946Z libnpp-12.3.3.65 | 130.6 MB | ##3 | 23%  2025-05-07T19:46:13.5308337Z libcublas-12.8.3.14 | 460.2 MB | ####8 | 49% 2025-05-07T19:46:13.5308642Z 2025-05-07T19:46:13.5327509Z nsight-compute-2025. 
| 320.6 MB | ######5 | 65%  2025-05-07T19:46:13.5327781Z 2025-05-07T19:46:13.5327786Z 2025-05-07T19:46:13.5327790Z 2025-05-07T19:46:13.5327812Z 2025-05-07T19:46:13.5327816Z 2025-05-07T19:46:13.5328973Z 2025-05-07T19:46:13.5739704Z cuda-nsight-12.8.55 | 113.2 MB | ##6 | 26%  2025-05-07T19:46:13.5740033Z 2025-05-07T19:46:13.5740039Z 2025-05-07T19:46:13.5740043Z 2025-05-07T19:46:13.5740047Z 2025-05-07T19:46:13.5740050Z 2025-05-07T19:46:13.6153708Z libnpp-12.3.3.65 | 130.6 MB | ##8 | 29%  2025-05-07T19:46:13.6308566Z libcublas-12.8.3.14 | 460.2 MB | ##### | 50% 2025-05-07T19:46:13.6309086Z 2025-05-07T19:46:13.6395284Z nsight-compute-2025. | 320.6 MB | ######7 | 67%  2025-05-07T19:46:13.6395581Z 2025-05-07T19:46:13.6395603Z 2025-05-07T19:46:13.6395607Z 2025-05-07T19:46:13.6395835Z 2025-05-07T19:46:13.6395839Z 2025-05-07T19:46:13.6395843Z 2025-05-07T19:46:13.6740903Z cuda-nsight-12.8.55 | 113.2 MB | ###2 | 32%  2025-05-07T19:46:13.6741236Z 2025-05-07T19:46:13.6741241Z 2025-05-07T19:46:13.6741245Z 2025-05-07T19:46:13.6741249Z 2025-05-07T19:46:13.6741268Z 2025-05-07T19:46:13.7155755Z libnpp-12.3.3.65 | 130.6 MB | ###4 | 34%  2025-05-07T19:46:13.7312313Z libcublas-12.8.3.14 | 460.2 MB | #####1 | 52% 2025-05-07T19:46:13.7312846Z 2025-05-07T19:46:13.7397368Z nsight-compute-2025. 
| 320.6 MB | ######9 | 70%  2025-05-07T19:46:13.7397684Z 2025-05-07T19:46:13.7397689Z 2025-05-07T19:46:13.7397693Z 2025-05-07T19:46:13.7397697Z 2025-05-07T19:46:13.7397701Z 2025-05-07T19:46:13.7397704Z 2025-05-07T19:46:13.7505607Z cuda-nsight-12.8.55 | 113.2 MB | ###8 | 39%  2025-05-07T19:46:13.7506438Z 2025-05-07T19:46:13.7506443Z 2025-05-07T19:46:13.7803226Z libcusparse-12.5.7.5 | 164.9 MB | ########## | 100%  2025-05-07T19:46:13.7804081Z 2025-05-07T19:46:13.7804094Z 2025-05-07T19:46:13.7804106Z 2025-05-07T19:46:13.7804117Z 2025-05-07T19:46:13.7804127Z 2025-05-07T19:46:13.7909228Z libnpp-12.3.3.65 | 130.6 MB | ###9 | 40%  2025-05-07T19:46:13.7909540Z 2025-05-07T19:46:13.7909559Z 2025-05-07T19:46:13.7909563Z 2025-05-07T19:46:13.7909567Z 2025-05-07T19:46:13.7909570Z 2025-05-07T19:46:13.7909574Z 2025-05-07T19:46:13.7909577Z 2025-05-07T19:46:13.8328350Z cuda-nvvp-12.8.57 | 112.4 MB | | 0%  2025-05-07T19:46:13.8635421Z libcublas-12.8.3.14 | 460.2 MB | #####3 | 53% 2025-05-07T19:46:13.8635991Z 2025-05-07T19:46:13.8636140Z 2025-05-07T19:46:13.8636145Z 2025-05-07T19:46:13.8636155Z 2025-05-07T19:46:13.8636185Z 2025-05-07T19:46:13.8636189Z 2025-05-07T19:46:13.8815775Z cuda-nsight-12.8.55 | 113.2 MB | ####4 | 45%  2025-05-07T19:46:13.8818295Z 2025-05-07T19:46:13.8910973Z nsight-compute-2025. 
| 320.6 MB | #######1 | 72%  2025-05-07T19:46:13.8911434Z 2025-05-07T19:46:13.8911439Z 2025-05-07T19:46:13.8911443Z 2025-05-07T19:46:13.8911446Z 2025-05-07T19:46:13.8911450Z 2025-05-07T19:46:13.8911468Z 2025-05-07T19:46:13.8911473Z 2025-05-07T19:46:13.9228886Z cuda-nvvp-12.8.57 | 112.4 MB | 4 | 4%  2025-05-07T19:46:13.9229450Z 2025-05-07T19:46:13.9229455Z 2025-05-07T19:46:13.9229459Z 2025-05-07T19:46:13.9229463Z 2025-05-07T19:46:13.9229466Z 2025-05-07T19:46:13.9785875Z libnpp-12.3.3.65 | 130.6 MB | ####4 | 45%  2025-05-07T19:46:13.9912327Z libcublas-12.8.3.14 | 460.2 MB | #####4 | 55% 2025-05-07T19:46:13.9913122Z 2025-05-07T19:46:13.9913135Z 2025-05-07T19:46:13.9913147Z 2025-05-07T19:46:13.9913158Z 2025-05-07T19:46:13.9913169Z 2025-05-07T19:46:13.9913179Z 2025-05-07T19:46:13.9913189Z 2025-05-07T19:46:14.0050756Z cuda-nvvp-12.8.57 | 112.4 MB | 8 | 9%  2025-05-07T19:46:14.0051093Z 2025-05-07T19:46:14.0051098Z 2025-05-07T19:46:14.0051102Z 2025-05-07T19:46:14.0051106Z 2025-05-07T19:46:14.0051109Z 2025-05-07T19:46:14.0052245Z 2025-05-07T19:46:14.0361409Z cuda-nsight-12.8.55 | 113.2 MB | ##### | 50%  2025-05-07T19:46:14.0361755Z 2025-05-07T19:46:14.0604718Z nsight-compute-2025. 
| 320.6 MB | #######3 | 74%  2025-05-07T19:46:14.0605037Z 2025-05-07T19:46:14.0605042Z 2025-05-07T19:46:14.0605046Z 2025-05-07T19:46:14.0605049Z 2025-05-07T19:46:14.0605053Z 2025-05-07T19:46:14.0916526Z libnpp-12.3.3.65 | 130.6 MB | ####9 | 50%  2025-05-07T19:46:14.0917419Z 2025-05-07T19:46:14.0917441Z 2025-05-07T19:46:14.0917447Z 2025-05-07T19:46:14.0917452Z 2025-05-07T19:46:14.0917458Z 2025-05-07T19:46:14.0917464Z 2025-05-07T19:46:14.0917469Z 2025-05-07T19:46:14.1097813Z cuda-nvvp-12.8.57 | 112.4 MB | #2 | 13%  2025-05-07T19:46:14.1333323Z libcublas-12.8.3.14 | 460.2 MB | #####6 | 56% 2025-05-07T19:46:14.1333614Z 2025-05-07T19:46:14.1333618Z 2025-05-07T19:46:14.1333622Z 2025-05-07T19:46:14.1333639Z 2025-05-07T19:46:14.1333643Z 2025-05-07T19:46:14.1333647Z 2025-05-07T19:46:14.1851926Z cuda-nsight-12.8.55 | 113.2 MB | #####5 | 55%  2025-05-07T19:46:14.1853015Z 2025-05-07T19:46:14.1879005Z nsight-compute-2025. | 320.6 MB | #######5 | 75%  2025-05-07T19:46:14.1879498Z 2025-05-07T19:46:14.1879502Z 2025-05-07T19:46:14.1879506Z 2025-05-07T19:46:14.1879511Z 2025-05-07T19:46:14.1879515Z 2025-05-07T19:46:14.1920567Z libnpp-12.3.3.65 | 130.6 MB | #####3 | 54%  2025-05-07T19:46:14.1920983Z 2025-05-07T19:46:14.1920988Z 2025-05-07T19:46:14.1920992Z 2025-05-07T19:46:14.1920995Z 2025-05-07T19:46:14.1921000Z 2025-05-07T19:46:14.1921005Z 2025-05-07T19:46:14.1921008Z 2025-05-07T19:46:14.2384990Z cuda-nvvp-12.8.57 | 112.4 MB | #6 | 16%  2025-05-07T19:46:14.2507031Z libcublas-12.8.3.14 | 460.2 MB | #####7 | 58% 2025-05-07T19:46:14.2507839Z 2025-05-07T19:46:14.2507854Z 2025-05-07T19:46:14.2507865Z 2025-05-07T19:46:14.2507875Z 2025-05-07T19:46:14.2507906Z 2025-05-07T19:46:14.2507917Z 2025-05-07T19:46:14.2922917Z cuda-nsight-12.8.55 | 113.2 MB | ###### | 60%  2025-05-07T19:46:14.2923900Z 2025-05-07T19:46:14.2923913Z 2025-05-07T19:46:14.2923924Z 2025-05-07T19:46:14.2923935Z 2025-05-07T19:46:14.2923945Z 2025-05-07T19:46:14.2924679Z libnpp-12.3.3.65 | 130.6 MB | #####8 | 58%  
2025-05-07T19:46:14.2925484Z 2025-05-07T19:46:14.2925495Z 2025-05-07T19:46:14.2925505Z 2025-05-07T19:46:14.2925515Z 2025-05-07T19:46:14.2925525Z 2025-05-07T19:46:14.2925535Z 2025-05-07T19:46:14.2925545Z 2025-05-07T19:46:14.3384580Z cuda-nvvp-12.8.57 | 112.4 MB | ##1 | 21%  2025-05-07T19:46:14.3512103Z libcublas-12.8.3.14 | 460.2 MB | #####8 | 59% 2025-05-07T19:46:14.3513056Z 2025-05-07T19:46:14.3513076Z 2025-05-07T19:46:14.3513134Z 2025-05-07T19:46:14.3513157Z 2025-05-07T19:46:14.3513193Z 2025-05-07T19:46:14.3513212Z 2025-05-07T19:46:14.3928615Z cuda-nsight-12.8.55 | 113.2 MB | ######4 | 65%  2025-05-07T19:46:14.3928957Z 2025-05-07T19:46:14.3928962Z 2025-05-07T19:46:14.3928966Z 2025-05-07T19:46:14.3929180Z 2025-05-07T19:46:14.3929186Z 2025-05-07T19:46:14.3929192Z 2025-05-07T19:46:14.3929197Z 2025-05-07T19:46:14.4014993Z cuda-nvvp-12.8.57 | 112.4 MB | ##5 | 26%  2025-05-07T19:46:14.4015462Z 2025-05-07T19:46:14.4134641Z nsight-compute-2025. | 320.6 MB | #######7 | 77%  2025-05-07T19:46:14.4135597Z 2025-05-07T19:46:14.4135611Z 2025-05-07T19:46:14.4135623Z 2025-05-07T19:46:14.4135634Z 2025-05-07T19:46:14.4135644Z 2025-05-07T19:46:14.4692758Z libnpp-12.3.3.65 | 130.6 MB | ######2 | 62%  2025-05-07T19:46:14.4799477Z libcublas-12.8.3.14 | 460.2 MB | #####9 | 60% 2025-05-07T19:46:14.4799897Z 2025-05-07T19:46:14.4799929Z 2025-05-07T19:46:14.4799933Z 2025-05-07T19:46:14.4799938Z 2025-05-07T19:46:14.4799943Z 2025-05-07T19:46:14.4799947Z 2025-05-07T19:46:14.5035498Z cuda-nsight-12.8.55 | 113.2 MB | ######9 | 69%  2025-05-07T19:46:14.5035876Z 2025-05-07T19:46:14.5035926Z 2025-05-07T19:46:14.5035949Z 2025-05-07T19:46:14.5035953Z 2025-05-07T19:46:14.5035957Z 2025-05-07T19:46:14.5035989Z 2025-05-07T19:46:14.5035993Z 2025-05-07T19:46:14.5144852Z cuda-nvvp-12.8.57 | 112.4 MB | ##9 | 30%  2025-05-07T19:46:14.5145779Z 2025-05-07T19:46:14.5323512Z nsight-compute-2025. 
| 320.6 MB | #######8 | 78%  2025-05-07T19:46:14.5324539Z 2025-05-07T19:46:14.5324552Z 2025-05-07T19:46:14.5324563Z 2025-05-07T19:46:14.5324574Z 2025-05-07T19:46:14.5324584Z 2025-05-07T19:46:14.5899425Z libnpp-12.3.3.65 | 130.6 MB | ######5 | 66%  2025-05-07T19:46:14.6009683Z libcublas-12.8.3.14 | 460.2 MB | ###### | 61% 2025-05-07T19:46:14.6010118Z 2025-05-07T19:46:14.6010331Z 2025-05-07T19:46:14.6010337Z 2025-05-07T19:46:14.6010341Z 2025-05-07T19:46:14.6010345Z 2025-05-07T19:46:14.6010358Z 2025-05-07T19:46:14.6076467Z cuda-nsight-12.8.55 | 113.2 MB | #######3 | 74%  2025-05-07T19:46:14.6076829Z 2025-05-07T19:46:14.6076838Z 2025-05-07T19:46:14.6076861Z 2025-05-07T19:46:14.6076867Z 2025-05-07T19:46:14.6076873Z 2025-05-07T19:46:14.6076880Z 2025-05-07T19:46:14.6076886Z 2025-05-07T19:46:14.6260470Z cuda-nvvp-12.8.57 | 112.4 MB | ###3 | 34%  2025-05-07T19:46:14.6261382Z 2025-05-07T19:46:14.6509887Z nsight-compute-2025. | 320.6 MB | #######9 | 80%  2025-05-07T19:46:14.6510171Z 2025-05-07T19:46:14.6510177Z 2025-05-07T19:46:14.6510182Z 2025-05-07T19:46:14.6510187Z 2025-05-07T19:46:14.6510193Z 2025-05-07T19:46:14.7003739Z libnpp-12.3.3.65 | 130.6 MB | ######9 | 70%  2025-05-07T19:46:14.7045219Z libcublas-12.8.3.14 | 460.2 MB | ######2 | 62% 2025-05-07T19:46:14.7046282Z 2025-05-07T19:46:14.7046306Z 2025-05-07T19:46:14.7046325Z 2025-05-07T19:46:14.7046345Z 2025-05-07T19:46:14.7046363Z 2025-05-07T19:46:14.7046385Z 2025-05-07T19:46:14.7092271Z cuda-nsight-12.8.55 | 113.2 MB | #######7 | 78%  2025-05-07T19:46:14.7092636Z 2025-05-07T19:46:14.7092660Z 2025-05-07T19:46:14.7092668Z 2025-05-07T19:46:14.7092673Z 2025-05-07T19:46:14.7092677Z 2025-05-07T19:46:14.7092680Z 2025-05-07T19:46:14.7092684Z 2025-05-07T19:46:14.7380219Z cuda-nvvp-12.8.57 | 112.4 MB | ###7 | 38%  2025-05-07T19:46:14.7380613Z 2025-05-07T19:46:14.7518496Z nsight-compute-2025. 
| 320.6 MB | ######## | 81%  2025-05-07T19:46:14.7519268Z 2025-05-07T19:46:14.7519303Z 2025-05-07T19:46:14.7519310Z 2025-05-07T19:46:14.7519316Z 2025-05-07T19:46:14.7519321Z 2025-05-07T19:46:14.8085806Z libnpp-12.3.3.65 | 130.6 MB | #######2 | 73%  2025-05-07T19:46:14.8100556Z libcublas-12.8.3.14 | 460.2 MB | ######3 | 63% 2025-05-07T19:46:14.8101333Z 2025-05-07T19:46:14.8101349Z 2025-05-07T19:46:14.8101360Z 2025-05-07T19:46:14.8101372Z 2025-05-07T19:46:14.8101383Z 2025-05-07T19:46:14.8101393Z 2025-05-07T19:46:14.8101404Z 2025-05-07T19:46:14.8403281Z cuda-nvvp-12.8.57 | 112.4 MB | ####2 | 42%  2025-05-07T19:46:14.8409209Z 2025-05-07T19:46:14.8550626Z nsight-compute-2025. | 320.6 MB | ########2 | 82%  2025-05-07T19:46:14.8551595Z 2025-05-07T19:46:14.8551615Z 2025-05-07T19:46:14.8551632Z 2025-05-07T19:46:14.8551649Z 2025-05-07T19:46:14.8551667Z 2025-05-07T19:46:14.8551684Z 2025-05-07T19:46:14.8571312Z cuda-nsight-12.8.55 | 113.2 MB | ########1 | 82%  2025-05-07T19:46:14.8572748Z 2025-05-07T19:46:14.8572761Z 2025-05-07T19:46:14.8572794Z 2025-05-07T19:46:14.8572805Z 2025-05-07T19:46:14.8572815Z 2025-05-07T19:46:14.9130689Z libnpp-12.3.3.65 | 130.6 MB | #######6 | 76%  2025-05-07T19:46:14.9165130Z libcublas-12.8.3.14 | 460.2 MB | ######4 | 64% 2025-05-07T19:46:14.9165958Z 2025-05-07T19:46:14.9165980Z 2025-05-07T19:46:14.9165999Z 2025-05-07T19:46:14.9166022Z 2025-05-07T19:46:14.9166041Z 2025-05-07T19:46:14.9166052Z 2025-05-07T19:46:14.9166062Z 2025-05-07T19:46:14.9507133Z cuda-nvvp-12.8.57 | 112.4 MB | ####6 | 46%  2025-05-07T19:46:14.9548174Z 2025-05-07T19:46:14.9549262Z nsight-compute-2025. 
| 320.6 MB | ########3 | 83%  2025-05-07T19:46:14.9550234Z 2025-05-07T19:46:14.9550309Z 2025-05-07T19:46:14.9550327Z 2025-05-07T19:46:14.9550346Z 2025-05-07T19:46:14.9550364Z 2025-05-07T19:46:14.9550381Z 2025-05-07T19:46:14.9577729Z cuda-nsight-12.8.55 | 113.2 MB | ########5 | 85%  2025-05-07T19:46:14.9578209Z 2025-05-07T19:46:14.9578215Z 2025-05-07T19:46:14.9578220Z 2025-05-07T19:46:14.9578244Z 2025-05-07T19:46:14.9578249Z 2025-05-07T19:46:15.0159903Z libnpp-12.3.3.65 | 130.6 MB | #######9 | 80%  2025-05-07T19:46:15.0236021Z libcublas-12.8.3.14 | 460.2 MB | ######4 | 65% 2025-05-07T19:46:15.0236380Z 2025-05-07T19:46:15.0236385Z 2025-05-07T19:46:15.0236389Z 2025-05-07T19:46:15.0236393Z 2025-05-07T19:46:15.0236396Z 2025-05-07T19:46:15.0236400Z 2025-05-07T19:46:15.0236592Z 2025-05-07T19:46:15.0510921Z cuda-nvvp-12.8.57 | 112.4 MB | ####9 | 50%  2025-05-07T19:46:15.0511340Z 2025-05-07T19:46:15.0571407Z nsight-compute-2025. | 320.6 MB | ########4 | 84%  2025-05-07T19:46:15.0571735Z 2025-05-07T19:46:15.0571739Z 2025-05-07T19:46:15.0571754Z 2025-05-07T19:46:15.0571758Z 2025-05-07T19:46:15.0571763Z 2025-05-07T19:46:15.0571767Z 2025-05-07T19:46:15.0601776Z cuda-nsight-12.8.55 | 113.2 MB | ########8 | 89%  2025-05-07T19:46:15.0602129Z 2025-05-07T19:46:15.0602134Z 2025-05-07T19:46:15.0602164Z 2025-05-07T19:46:15.0602169Z 2025-05-07T19:46:15.0602175Z 2025-05-07T19:46:15.1226266Z libnpp-12.3.3.65 | 130.6 MB | ########3 | 83%  2025-05-07T19:46:15.1279756Z libcublas-12.8.3.14 | 460.2 MB | ######5 | 66% 2025-05-07T19:46:15.1280699Z 2025-05-07T19:46:15.1280718Z 2025-05-07T19:46:15.1280736Z 2025-05-07T19:46:15.1280776Z 2025-05-07T19:46:15.1280794Z 2025-05-07T19:46:15.1280812Z 2025-05-07T19:46:15.1280829Z 2025-05-07T19:46:15.1512846Z cuda-nvvp-12.8.57 | 112.4 MB | #####3 | 54%  2025-05-07T19:46:15.1514388Z 2025-05-07T19:46:15.1572719Z nsight-compute-2025. 
| 320.6 MB | ########5 | 86%  2025-05-07T19:46:15.1573034Z 2025-05-07T19:46:15.1573041Z 2025-05-07T19:46:15.1573062Z 2025-05-07T19:46:15.1573067Z 2025-05-07T19:46:15.1573074Z 2025-05-07T19:46:15.1573082Z 2025-05-07T19:46:15.1658174Z cuda-nsight-12.8.55 | 113.2 MB | #########2 | 93%  2025-05-07T19:46:15.1658507Z 2025-05-07T19:46:15.1658513Z 2025-05-07T19:46:15.1658517Z 2025-05-07T19:46:15.1658533Z 2025-05-07T19:46:15.1658537Z 2025-05-07T19:46:15.2278807Z libnpp-12.3.3.65 | 130.6 MB | ########6 | 87%  2025-05-07T19:46:15.2305980Z libcublas-12.8.3.14 | 460.2 MB | ######6 | 67% 2025-05-07T19:46:15.2306308Z 2025-05-07T19:46:15.2306314Z 2025-05-07T19:46:15.2306319Z 2025-05-07T19:46:15.2306485Z 2025-05-07T19:46:15.2306489Z 2025-05-07T19:46:15.2306493Z 2025-05-07T19:46:15.2306498Z 2025-05-07T19:46:15.2574103Z cuda-nvvp-12.8.57 | 112.4 MB | #####7 | 58%  2025-05-07T19:46:15.2574666Z 2025-05-07T19:46:15.2574686Z 2025-05-07T19:46:15.2574690Z 2025-05-07T19:46:15.2574693Z 2025-05-07T19:46:15.2574697Z 2025-05-07T19:46:15.2574700Z 2025-05-07T19:46:15.2632386Z cuda-nsight-12.8.55 | 113.2 MB | #########6 | 96%  2025-05-07T19:46:15.2632716Z 2025-05-07T19:46:15.2664159Z nsight-compute-2025. 
| 320.6 MB | ########6 | 87%  2025-05-07T19:46:15.2664460Z 2025-05-07T19:46:15.2664464Z 2025-05-07T19:46:15.2664469Z 2025-05-07T19:46:15.2664472Z 2025-05-07T19:46:15.2664477Z 2025-05-07T19:46:15.3282899Z libnpp-12.3.3.65 | 130.6 MB | ########9 | 90%  2025-05-07T19:46:15.3362204Z libcublas-12.8.3.14 | 460.2 MB | ######7 | 68% 2025-05-07T19:46:15.3363277Z 2025-05-07T19:46:15.3363291Z 2025-05-07T19:46:15.3363302Z 2025-05-07T19:46:15.3363312Z 2025-05-07T19:46:15.3363323Z 2025-05-07T19:46:15.3363334Z 2025-05-07T19:46:15.3363345Z 2025-05-07T19:46:15.3666790Z cuda-nvvp-12.8.57 | 112.4 MB | ######1 | 61%  2025-05-07T19:46:15.3668092Z 2025-05-07T19:46:15.3668106Z 2025-05-07T19:46:15.3668121Z 2025-05-07T19:46:15.3668131Z 2025-05-07T19:46:15.3668141Z 2025-05-07T19:46:15.3687101Z libnpp-12.3.3.65 | 130.6 MB | #########3 | 93%  2025-05-07T19:46:15.3687397Z 2025-05-07T19:46:15.4282039Z nsight-compute-2025. | 320.6 MB | ########8 | 88%  2025-05-07T19:46:15.4363948Z libcublas-12.8.3.14 | 460.2 MB | ######8 | 69% 2025-05-07T19:46:15.4364238Z 2025-05-07T19:46:15.4364243Z 2025-05-07T19:46:15.4364247Z 2025-05-07T19:46:15.4364254Z 2025-05-07T19:46:15.4364257Z 2025-05-07T19:46:15.4364463Z 2025-05-07T19:46:15.4364469Z 2025-05-07T19:46:15.4668175Z cuda-nvvp-12.8.57 | 112.4 MB | ######5 | 66%  2025-05-07T19:46:15.4668849Z 2025-05-07T19:46:15.4668853Z 2025-05-07T19:46:15.4668857Z 2025-05-07T19:46:15.4668860Z 2025-05-07T19:46:15.4668864Z 2025-05-07T19:46:15.4689715Z libnpp-12.3.3.65 | 130.6 MB | #########7 | 97%  2025-05-07T19:46:15.4690007Z 2025-05-07T19:46:15.5365588Z nsight-compute-2025. 
| 320.6 MB | ########9 | 90%  2025-05-07T19:46:15.5365904Z 2025-05-07T19:46:15.5365908Z 2025-05-07T19:46:15.5365912Z 2025-05-07T19:46:15.5365915Z 2025-05-07T19:46:15.5365919Z 2025-05-07T19:46:15.5365923Z 2025-05-07T19:46:15.5365929Z 2025-05-07T19:46:15.5504875Z cuda-nvvp-12.8.57 | 112.4 MB | ####### | 71%  2025-05-07T19:46:15.6108083Z libcublas-12.8.3.14 | 460.2 MB | ######9 | 70% 2025-05-07T19:46:15.6108367Z 2025-05-07T19:46:15.6365615Z nsight-compute-2025. | 320.6 MB | ######### | 91%  2025-05-07T19:46:15.6365931Z 2025-05-07T19:46:15.6365936Z 2025-05-07T19:46:15.6365940Z 2025-05-07T19:46:15.6365944Z 2025-05-07T19:46:15.6365949Z 2025-05-07T19:46:15.6365952Z 2025-05-07T19:46:15.6365956Z 2025-05-07T19:46:15.6505299Z cuda-nvvp-12.8.57 | 112.4 MB | #######8 | 79%  2025-05-07T19:46:15.7109234Z libcublas-12.8.3.14 | 460.2 MB | #######1 | 72% 2025-05-07T19:46:15.7109531Z 2025-05-07T19:46:15.7366839Z nsight-compute-2025. | 320.6 MB | #########2 | 92%  2025-05-07T19:46:15.7367271Z 2025-05-07T19:46:15.7367276Z 2025-05-07T19:46:15.7367289Z 2025-05-07T19:46:15.7367293Z 2025-05-07T19:46:15.7367297Z 2025-05-07T19:46:15.7367300Z 2025-05-07T19:46:15.7368369Z 2025-05-07T19:46:15.7506128Z cuda-nvvp-12.8.57 | 112.4 MB | ########4 | 84%  2025-05-07T19:46:15.8109223Z libcublas-12.8.3.14 | 460.2 MB | #######2 | 73% 2025-05-07T19:46:15.8110167Z 2025-05-07T19:46:15.8386884Z nsight-compute-2025. | 320.6 MB | #########4 | 94%  2025-05-07T19:46:15.8387924Z 2025-05-07T19:46:15.8387932Z 2025-05-07T19:46:15.8387955Z 2025-05-07T19:46:15.8387962Z 2025-05-07T19:46:15.8387967Z 2025-05-07T19:46:15.8387974Z 2025-05-07T19:46:15.8387980Z 2025-05-07T19:46:15.8506302Z cuda-nvvp-12.8.57 | 112.4 MB | ########9 | 90%  2025-05-07T19:46:15.9112127Z libcublas-12.8.3.14 | 460.2 MB | #######4 | 74% 2025-05-07T19:46:15.9112414Z 2025-05-07T19:46:15.9390531Z nsight-compute-2025. 
| 320.6 MB | #########5 | 96%  2025-05-07T19:46:15.9390832Z 2025-05-07T19:46:15.9390967Z 2025-05-07T19:46:15.9390973Z 2025-05-07T19:46:15.9391005Z 2025-05-07T19:46:15.9391013Z 2025-05-07T19:46:15.9391043Z 2025-05-07T19:46:15.9391048Z 2025-05-07T19:46:15.9506544Z cuda-nvvp-12.8.57 | 112.4 MB | #########4 | 95%  2025-05-07T19:46:16.0112515Z libcublas-12.8.3.14 | 460.2 MB | #######5 | 76% 2025-05-07T19:46:16.0112876Z 2025-05-07T19:46:16.0508990Z nsight-compute-2025. | 320.6 MB | #########7 | 98%  2025-05-07T19:46:16.1115384Z libcublas-12.8.3.14 | 460.2 MB | #######7 | 77% 2025-05-07T19:46:16.1116423Z 2025-05-07T19:46:16.1510469Z nsight-compute-2025. | 320.6 MB | #########9 | 100%  2025-05-07T19:46:16.2510561Z libcublas-12.8.3.14 | 460.2 MB | #######8 | 79% 2025-05-07T19:46:16.3511140Z libcublas-12.8.3.14 | 460.2 MB | ########1 | 81% 2025-05-07T19:46:16.4512688Z libcublas-12.8.3.14 | 460.2 MB | ########3 | 83% 2025-05-07T19:46:16.5517234Z libcublas-12.8.3.14 | 460.2 MB | ########5 | 85% 2025-05-07T19:46:16.6520486Z libcublas-12.8.3.14 | 460.2 MB | ########7 | 88% 2025-05-07T19:46:16.7521472Z libcublas-12.8.3.14 | 460.2 MB | ########9 | 90% 2025-05-07T19:46:16.8522033Z libcublas-12.8.3.14 | 460.2 MB | #########1 | 92% 2025-05-07T19:46:16.9523166Z libcublas-12.8.3.14 | 460.2 MB | #########4 | 94% 2025-05-07T19:46:17.0523522Z libcublas-12.8.3.14 | 460.2 MB | #########6 | 96% 2025-05-07T19:46:17.0660443Z libcublas-12.8.3.14 | 460.2 MB | #########8 | 99% 2025-05-07T19:46:17.0660750Z 2025-05-07T19:46:17.0660755Z 2025-05-07T19:46:17.0660759Z 2025-05-07T19:46:17.0660763Z 2025-05-07T19:46:17.0878976Z libcufft-11.3.3.41 | 147.4 MB | ########## | 100%  2025-05-07T19:46:17.0879872Z 2025-05-07T19:46:17.0879888Z 2025-05-07T19:46:17.0879899Z 2025-05-07T19:46:17.0879942Z 2025-05-07T19:46:17.0879953Z 2025-05-07T19:46:17.0879964Z 2025-05-07T19:46:17.0880760Z cuda-nsight-12.8.55 | 113.2 MB | ########## | 100%  2025-05-07T19:46:17.0881627Z 2025-05-07T19:46:17.0881637Z 
2025-05-07T19:46:17.0881648Z 2025-05-07T19:46:17.0881658Z 2025-05-07T19:46:17.0881668Z 2025-05-07T19:46:17.0881679Z 2025-05-07T19:46:17.1404546Z cuda-nsight-12.8.55 | 113.2 MB | ########## | 100%  2025-05-07T19:46:17.1404862Z 2025-05-07T19:46:17.1404867Z 2025-05-07T19:46:17.1404871Z 2025-05-07T19:46:17.1404876Z 2025-05-07T19:46:17.1404879Z 2025-05-07T19:46:17.1404882Z 2025-05-07T19:46:17.1404897Z 2025-05-07T19:46:17.1404900Z 2025-05-07T19:46:17.2439563Z cuda-nvrtc-12.8.61 | 63.1 MB | | 0%  2025-05-07T19:46:17.2439894Z 2025-05-07T19:46:17.2439898Z 2025-05-07T19:46:17.2439902Z 2025-05-07T19:46:17.2439906Z 2025-05-07T19:46:17.2439910Z 2025-05-07T19:46:17.2439913Z 2025-05-07T19:46:17.2439917Z 2025-05-07T19:46:17.2439935Z 2025-05-07T19:46:17.3457578Z cuda-nvrtc-12.8.61 | 63.1 MB | #2 | 12%  2025-05-07T19:46:17.3458156Z 2025-05-07T19:46:17.3458164Z 2025-05-07T19:46:17.3458170Z 2025-05-07T19:46:17.3458177Z 2025-05-07T19:46:17.3458183Z 2025-05-07T19:46:17.3458189Z 2025-05-07T19:46:17.3458195Z 2025-05-07T19:46:17.3458202Z 2025-05-07T19:46:17.4051059Z cuda-nvrtc-12.8.61 | 63.1 MB | #9 | 20%  2025-05-07T19:46:17.4051435Z 2025-05-07T19:46:17.4051440Z 2025-05-07T19:46:17.4051445Z 2025-05-07T19:46:17.4051448Z 2025-05-07T19:46:17.4051551Z 2025-05-07T19:46:17.4150015Z libnpp-12.3.3.65 | 130.6 MB | ########## | 100%  2025-05-07T19:46:17.4150383Z 2025-05-07T19:46:17.4150389Z 2025-05-07T19:46:17.4150393Z 2025-05-07T19:46:17.4150396Z 2025-05-07T19:46:17.4150400Z 2025-05-07T19:46:17.4150403Z 2025-05-07T19:46:17.4150407Z 2025-05-07T19:46:17.4388379Z cuda-nvvp-12.8.57 | 112.4 MB | ########## | 100%  2025-05-07T19:46:17.4389028Z 2025-05-07T19:46:17.4389033Z 2025-05-07T19:46:17.4389036Z 2025-05-07T19:46:17.4389040Z 2025-05-07T19:46:17.4389044Z 2025-05-07T19:46:17.4389047Z 2025-05-07T19:46:17.4389051Z 2025-05-07T19:46:17.4389054Z 2025-05-07T19:46:17.4389057Z 2025-05-07T19:46:17.4458737Z libcurand-10.3.9.55 | 43.6 MB | | 0%  2025-05-07T19:46:17.4459082Z 2025-05-07T19:46:17.4459088Z 
2025-05-07T19:46:17.4459091Z 2025-05-07T19:46:17.4459095Z 2025-05-07T19:46:17.4459098Z 2025-05-07T19:46:17.4459101Z 2025-05-07T19:46:17.4459113Z 2025-05-07T19:46:17.4459117Z 2025-05-07T19:46:17.4662274Z cuda-nvrtc-12.8.61 | 63.1 MB | ### | 31%  2025-05-07T19:46:17.4663406Z 2025-05-07T19:46:17.4663420Z 2025-05-07T19:46:17.4663432Z 2025-05-07T19:46:17.4663469Z 2025-05-07T19:46:17.4663482Z 2025-05-07T19:46:17.4663494Z 2025-05-07T19:46:17.4663504Z 2025-05-07T19:46:17.4663516Z 2025-05-07T19:46:17.4663526Z 2025-05-07T19:46:17.4663595Z 2025-05-07T19:46:17.5390615Z gds-tools-1.13.0.11 | 37.9 MB | | 0%  2025-05-07T19:46:17.5390974Z 2025-05-07T19:46:17.5390978Z 2025-05-07T19:46:17.5390982Z 2025-05-07T19:46:17.5390988Z 2025-05-07T19:46:17.5390995Z 2025-05-07T19:46:17.5391002Z 2025-05-07T19:46:17.5391008Z 2025-05-07T19:46:17.5391014Z 2025-05-07T19:46:17.5391031Z 2025-05-07T19:46:17.5461898Z libcurand-10.3.9.55 | 43.6 MB | #4 | 14%  2025-05-07T19:46:17.5463029Z 2025-05-07T19:46:17.5463050Z 2025-05-07T19:46:17.5463070Z 2025-05-07T19:46:17.5463089Z 2025-05-07T19:46:17.5463107Z 2025-05-07T19:46:17.5463125Z 2025-05-07T19:46:17.5463605Z 2025-05-07T19:46:17.5463647Z 2025-05-07T19:46:17.5664907Z cuda-nvrtc-12.8.61 | 63.1 MB | ####1 | 41%  2025-05-07T19:46:17.5665230Z 2025-05-07T19:46:17.5665234Z 2025-05-07T19:46:17.5665237Z 2025-05-07T19:46:17.5665242Z 2025-05-07T19:46:17.5665247Z 2025-05-07T19:46:17.5665268Z 2025-05-07T19:46:17.5665285Z 2025-05-07T19:46:17.5665289Z 2025-05-07T19:46:17.5665292Z 2025-05-07T19:46:17.5665296Z 2025-05-07T19:46:17.6393364Z gds-tools-1.13.0.11 | 37.9 MB | #6 | 16%  2025-05-07T19:46:17.6394573Z 2025-05-07T19:46:17.6394586Z 2025-05-07T19:46:17.6394596Z 2025-05-07T19:46:17.6394607Z 2025-05-07T19:46:17.6394641Z 2025-05-07T19:46:17.6394652Z 2025-05-07T19:46:17.6394662Z 2025-05-07T19:46:17.6394672Z 2025-05-07T19:46:17.6394683Z 2025-05-07T19:46:17.6462970Z libcurand-10.3.9.55 | 43.6 MB | ##8 | 29%  2025-05-07T19:46:17.6463313Z 2025-05-07T19:46:17.6463318Z 
2025-05-07T19:46:17.6463342Z 2025-05-07T19:46:17.6463346Z 2025-05-07T19:46:17.6463364Z 2025-05-07T19:46:17.6463368Z 2025-05-07T19:46:17.6463371Z 2025-05-07T19:46:17.6463626Z 2025-05-07T19:46:17.6665946Z cuda-nvrtc-12.8.61 | 63.1 MB | #####1 | 51%  2025-05-07T19:46:17.6666253Z 2025-05-07T19:46:17.6666278Z 2025-05-07T19:46:17.6666282Z 2025-05-07T19:46:17.6666300Z 2025-05-07T19:46:17.6666306Z 2025-05-07T19:46:17.6666309Z 2025-05-07T19:46:17.6666313Z 2025-05-07T19:46:17.6666317Z 2025-05-07T19:46:17.6666320Z 2025-05-07T19:46:17.6666382Z 2025-05-07T19:46:17.7394506Z gds-tools-1.13.0.11 | 37.9 MB | ###3 | 33%  2025-05-07T19:46:17.7394838Z 2025-05-07T19:46:17.7394845Z 2025-05-07T19:46:17.7394852Z 2025-05-07T19:46:17.7394858Z 2025-05-07T19:46:17.7394864Z 2025-05-07T19:46:17.7394887Z 2025-05-07T19:46:17.7394896Z 2025-05-07T19:46:17.7394901Z 2025-05-07T19:46:17.7394909Z 2025-05-07T19:46:17.7466217Z libcurand-10.3.9.55 | 43.6 MB | ####3 | 43%  2025-05-07T19:46:17.7467685Z 2025-05-07T19:46:17.7467705Z 2025-05-07T19:46:17.7467724Z 2025-05-07T19:46:17.7467744Z 2025-05-07T19:46:17.7467789Z 2025-05-07T19:46:17.7467809Z 2025-05-07T19:46:17.7467827Z 2025-05-07T19:46:17.7467845Z 2025-05-07T19:46:17.7667421Z cuda-nvrtc-12.8.61 | 63.1 MB | ######1 | 61%  2025-05-07T19:46:17.7668023Z 2025-05-07T19:46:17.7668029Z 2025-05-07T19:46:17.7668035Z 2025-05-07T19:46:17.7668041Z 2025-05-07T19:46:17.7668046Z 2025-05-07T19:46:17.7668069Z 2025-05-07T19:46:17.7668093Z 2025-05-07T19:46:17.7668098Z 2025-05-07T19:46:17.7668103Z 2025-05-07T19:46:17.7668109Z 2025-05-07T19:46:17.8395368Z gds-tools-1.13.0.11 | 37.9 MB | ####9 | 50%  2025-05-07T19:46:17.8396595Z 2025-05-07T19:46:17.8396617Z 2025-05-07T19:46:17.8396639Z 2025-05-07T19:46:17.8396661Z 2025-05-07T19:46:17.8396677Z 2025-05-07T19:46:17.8396699Z 2025-05-07T19:46:17.8396717Z 2025-05-07T19:46:17.8396783Z 2025-05-07T19:46:17.8396800Z 2025-05-07T19:46:17.8463795Z libcurand-10.3.9.55 | 43.6 MB | #####8 | 58%  2025-05-07T19:46:17.8464122Z 
2025-05-07T19:46:17.8464128Z 2025-05-07T19:46:17.8465416Z 2025-05-07T19:46:17.8668685Z libcusolver-11.7.2.5 | 156.9 MB | ########## | 100%  2025-05-07T19:46:17.8669077Z 2025-05-07T19:46:17.8669288Z 2025-05-07T19:46:17.8669296Z 2025-05-07T19:46:17.8669303Z 2025-05-07T19:46:17.8669307Z 2025-05-07T19:46:17.8669312Z 2025-05-07T19:46:17.8669317Z 2025-05-07T19:46:17.8669322Z 2025-05-07T19:46:17.8669328Z 2025-05-07T19:46:17.8669334Z 2025-05-07T19:46:17.9394931Z gds-tools-1.13.0.11 | 37.9 MB | ######6 | 67%  2025-05-07T19:46:17.9395353Z 2025-05-07T19:46:17.9395358Z 2025-05-07T19:46:17.9395363Z 2025-05-07T19:46:17.9395368Z 2025-05-07T19:46:17.9395373Z 2025-05-07T19:46:17.9395378Z 2025-05-07T19:46:17.9395382Z 2025-05-07T19:46:17.9395387Z 2025-05-07T19:46:17.9395393Z 2025-05-07T19:46:17.9672814Z libcurand-10.3.9.55 | 43.6 MB | #######5 | 76%  2025-05-07T19:46:17.9673239Z 2025-05-07T19:46:17.9673245Z 2025-05-07T19:46:17.9673249Z 2025-05-07T19:46:17.9673253Z 2025-05-07T19:46:17.9673258Z 2025-05-07T19:46:17.9673261Z 2025-05-07T19:46:17.9673266Z 2025-05-07T19:46:17.9673284Z 2025-05-07T19:46:17.9673288Z 2025-05-07T19:46:17.9673292Z 2025-05-07T19:46:18.0395980Z gds-tools-1.13.0.11 | 37.9 MB | ########7 | 87%  2025-05-07T19:46:18.0396336Z 2025-05-07T19:46:18.0396341Z 2025-05-07T19:46:18.0396345Z 2025-05-07T19:46:18.0396348Z 2025-05-07T19:46:18.0396352Z 2025-05-07T19:46:18.0396355Z 2025-05-07T19:46:18.0396359Z 2025-05-07T19:46:18.0396362Z 2025-05-07T19:46:18.0396366Z 2025-05-07T19:46:18.0870048Z libcurand-10.3.9.55 | 43.6 MB | #########4 | 94%  2025-05-07T19:46:18.0870429Z 2025-05-07T19:46:18.0870438Z 2025-05-07T19:46:18.0870444Z 2025-05-07T19:46:18.0870450Z 2025-05-07T19:46:18.0870493Z 2025-05-07T19:46:18.0870498Z 2025-05-07T19:46:18.0870503Z 2025-05-07T19:46:18.0870508Z 2025-05-07T19:46:18.1908943Z cuda-nvrtc-12.8.61 | 63.1 MB | ####### | 71%  2025-05-07T19:46:18.1909320Z 2025-05-07T19:46:18.1909325Z 2025-05-07T19:46:18.1909331Z 2025-05-07T19:46:18.1909359Z 
2025-05-07T19:46:18.1909363Z 2025-05-07T19:46:18.1909368Z 2025-05-07T19:46:18.1909373Z 2025-05-07T19:46:18.1909377Z 2025-05-07T19:46:18.2940739Z cuda-nvrtc-12.8.61 | 63.1 MB | ########6 | 86%  2025-05-07T19:46:18.2941095Z 2025-05-07T19:46:18.2941100Z 2025-05-07T19:46:18.2941104Z 2025-05-07T19:46:18.2941108Z 2025-05-07T19:46:18.2941112Z 2025-05-07T19:46:18.2941116Z 2025-05-07T19:46:18.2941120Z 2025-05-07T19:46:18.2941124Z 2025-05-07T19:46:18.3211239Z cuda-nvrtc-12.8.61 | 63.1 MB | #########6 | 96%  2025-05-07T19:46:18.3211543Z 2025-05-07T19:46:18.3211564Z 2025-05-07T19:46:18.3211569Z 2025-05-07T19:46:18.3211597Z 2025-05-07T19:46:18.3211601Z 2025-05-07T19:46:18.3211615Z 2025-05-07T19:46:18.5456579Z cuda-nsight-12.8.55 | 113.2 MB | ########## | 100%  2025-05-07T19:46:18.5457148Z 2025-05-07T19:46:18.5457156Z 2025-05-07T19:46:18.5457180Z 2025-05-07T19:46:18.5457187Z 2025-05-07T19:46:18.5457457Z 2025-05-07T19:46:18.5457461Z 2025-05-07T19:46:18.5457465Z 2025-05-07T19:46:18.5457469Z 2025-05-07T19:46:18.5457473Z 2025-05-07T19:46:18.5457476Z 2025-05-07T19:46:18.5916314Z gds-tools-1.13.0.11 | 37.9 MB | ########## | 100%  2025-05-07T19:46:18.5916643Z 2025-05-07T19:46:18.5916900Z 2025-05-07T19:46:18.5916914Z 2025-05-07T19:46:18.5916921Z 2025-05-07T19:46:18.5916926Z 2025-05-07T19:46:18.5916933Z 2025-05-07T19:46:18.5916939Z 2025-05-07T19:46:18.5916947Z 2025-05-07T19:46:18.5916952Z 2025-05-07T19:46:18.6027683Z libcurand-10.3.9.55 | 43.6 MB | ########## | 100%  2025-05-07T19:46:18.6028109Z 2025-05-07T19:46:18.6028154Z 2025-05-07T19:46:18.6028159Z 2025-05-07T19:46:18.6028163Z 2025-05-07T19:46:18.6028168Z 2025-05-07T19:46:18.6028173Z 2025-05-07T19:46:18.6028177Z 2025-05-07T19:46:18.6028181Z 2025-05-07T19:46:18.6028184Z 2025-05-07T19:46:18.6028188Z 2025-05-07T19:46:18.6028193Z 2025-05-07T19:46:18.6359208Z libnvjitlink-12.8.61 | 28.7 MB | | 0%  2025-05-07T19:46:18.6359617Z 2025-05-07T19:46:18.6359622Z 2025-05-07T19:46:18.6359626Z 2025-05-07T19:46:18.6359629Z 
2025-05-07T19:46:18.6359632Z 2025-05-07T19:46:18.6359637Z 2025-05-07T19:46:18.6359640Z 2025-05-07T19:46:18.6359644Z 2025-05-07T19:46:18.6359647Z 2025-05-07T19:46:18.6359651Z 2025-05-07T19:46:18.6359654Z 2025-05-07T19:46:18.6359657Z 2025-05-07T19:46:18.6476112Z cuda-nvcc-tools-12.8 | 24.5 MB | | 0%  2025-05-07T19:46:18.6476460Z 2025-05-07T19:46:18.6476467Z 2025-05-07T19:46:18.7027673Z libcusparse-12.5.7.5 | 164.9 MB | ########## | 100%  2025-05-07T19:46:18.7028177Z 2025-05-07T19:46:18.7028183Z 2025-05-07T19:46:18.7028188Z 2025-05-07T19:46:18.7028192Z 2025-05-07T19:46:18.7028197Z 2025-05-07T19:46:18.7028202Z 2025-05-07T19:46:18.7028207Z 2025-05-07T19:46:18.7028212Z 2025-05-07T19:46:18.7028215Z 2025-05-07T19:46:18.7028218Z 2025-05-07T19:46:18.7028244Z 2025-05-07T19:46:18.7360053Z libnvjitlink-12.8.61 | 28.7 MB | ##1 | 22%  2025-05-07T19:46:18.7360434Z 2025-05-07T19:46:18.7360438Z 2025-05-07T19:46:18.7360441Z 2025-05-07T19:46:18.7360446Z 2025-05-07T19:46:18.7360450Z 2025-05-07T19:46:18.7360453Z 2025-05-07T19:46:18.7360456Z 2025-05-07T19:46:18.7360461Z 2025-05-07T19:46:18.7360464Z 2025-05-07T19:46:18.7360469Z 2025-05-07T19:46:18.7360472Z 2025-05-07T19:46:18.7360491Z 2025-05-07T19:46:18.8031508Z cuda-nvcc-tools-12.8 | 24.5 MB | ##5 | 26%  2025-05-07T19:46:18.8031895Z 2025-05-07T19:46:18.8031900Z 2025-05-07T19:46:18.8031905Z 2025-05-07T19:46:18.8031925Z 2025-05-07T19:46:18.8031929Z 2025-05-07T19:46:18.8031933Z 2025-05-07T19:46:18.8031951Z 2025-05-07T19:46:18.8031955Z 2025-05-07T19:46:18.8031959Z 2025-05-07T19:46:18.8031962Z 2025-05-07T19:46:18.8031966Z 2025-05-07T19:46:18.8363233Z libnvjitlink-12.8.61 | 28.7 MB | ####6 | 47%  2025-05-07T19:46:18.8363594Z 2025-05-07T19:46:18.8363602Z 2025-05-07T19:46:18.8363608Z 2025-05-07T19:46:18.8363630Z 2025-05-07T19:46:18.8363636Z 2025-05-07T19:46:18.8363642Z 2025-05-07T19:46:18.8363649Z 2025-05-07T19:46:18.8363654Z 2025-05-07T19:46:18.8363658Z 2025-05-07T19:46:18.8363662Z 2025-05-07T19:46:18.8363665Z 2025-05-07T19:46:18.9032732Z 
2025-05-07T19:46:18.9033308Z cuda-nvcc-tools-12.8 | 24.5 MB | #####7 | 57%  2025-05-07T19:46:18.9033670Z 2025-05-07T19:46:18.9033675Z 2025-05-07T19:46:18.9033679Z 2025-05-07T19:46:18.9033684Z 2025-05-07T19:46:18.9033689Z 2025-05-07T19:46:18.9033694Z 2025-05-07T19:46:18.9033720Z 2025-05-07T19:46:18.9033725Z 2025-05-07T19:46:18.9033729Z 2025-05-07T19:46:18.9033736Z 2025-05-07T19:46:18.9033741Z 2025-05-07T19:46:18.9363419Z libnvjitlink-12.8.61 | 28.7 MB | #######1 | 72%  2025-05-07T19:46:18.9363855Z 2025-05-07T19:46:18.9363863Z 2025-05-07T19:46:18.9364087Z 2025-05-07T19:46:18.9364094Z 2025-05-07T19:46:18.9364101Z 2025-05-07T19:46:18.9364107Z 2025-05-07T19:46:18.9364114Z 2025-05-07T19:46:18.9364120Z 2025-05-07T19:46:18.9364126Z 2025-05-07T19:46:18.9364133Z 2025-05-07T19:46:18.9364139Z 2025-05-07T19:46:18.9364145Z 2025-05-07T19:46:19.0033180Z cuda-nvcc-tools-12.8 | 24.5 MB | ########8 | 89%  2025-05-07T19:46:19.0033538Z 2025-05-07T19:46:19.0033553Z 2025-05-07T19:46:19.0033609Z 2025-05-07T19:46:19.0033619Z 2025-05-07T19:46:19.0033717Z 2025-05-07T19:46:19.0033726Z 2025-05-07T19:46:19.0033731Z 2025-05-07T19:46:19.0033737Z 2025-05-07T19:46:19.0033742Z 2025-05-07T19:46:19.0033746Z 2025-05-07T19:46:19.0033771Z 2025-05-07T19:46:19.1831304Z libnvjitlink-12.8.61 | 28.7 MB | #########9 | 99%  2025-05-07T19:46:19.1831807Z 2025-05-07T19:46:19.1831814Z 2025-05-07T19:46:19.1831838Z 2025-05-07T19:46:19.1831843Z 2025-05-07T19:46:19.1831848Z 2025-05-07T19:46:19.1831887Z 2025-05-07T19:46:19.1831890Z 2025-05-07T19:46:19.1831894Z 2025-05-07T19:46:19.2446216Z cuda-nvrtc-12.8.61 | 63.1 MB | ########## | 100%  2025-05-07T19:46:19.2446609Z 2025-05-07T19:46:19.2446615Z 2025-05-07T19:46:19.2446640Z 2025-05-07T19:46:19.2446644Z 2025-05-07T19:46:19.2446647Z 2025-05-07T19:46:19.2446652Z 2025-05-07T19:46:19.2446658Z 2025-05-07T19:46:19.2446664Z 2025-05-07T19:46:19.2446670Z 2025-05-07T19:46:19.2446676Z 2025-05-07T19:46:19.2446682Z 2025-05-07T19:46:19.2446687Z 2025-05-07T19:46:19.2446692Z 
2025-05-07T19:46:19.2622974Z cuda-nvvm-tools-12.8 | 23.5 MB | | 0%  2025-05-07T19:46:19.2623608Z 2025-05-07T19:46:19.2623615Z 2025-05-07T19:46:19.2623620Z 2025-05-07T19:46:19.2623624Z 2025-05-07T19:46:19.2623629Z 2025-05-07T19:46:19.2623633Z 2025-05-07T19:46:19.2623638Z 2025-05-07T19:46:19.2623643Z 2025-05-07T19:46:19.2623646Z 2025-05-07T19:46:19.2623651Z 2025-05-07T19:46:19.2623678Z 2025-05-07T19:46:19.2623700Z 2025-05-07T19:46:19.2914482Z cuda-nvcc-tools-12.8 | 24.5 MB | ########## | 100%  2025-05-07T19:46:19.2914861Z 2025-05-07T19:46:19.2914867Z 2025-05-07T19:46:19.2914875Z 2025-05-07T19:46:19.2914880Z 2025-05-07T19:46:19.2914884Z 2025-05-07T19:46:19.2914909Z 2025-05-07T19:46:19.2914913Z 2025-05-07T19:46:19.2914916Z 2025-05-07T19:46:19.2914921Z 2025-05-07T19:46:19.2914925Z 2025-05-07T19:46:19.2994039Z gds-tools-1.13.0.11 | 37.9 MB | ########## | 100%  2025-05-07T19:46:19.2994389Z 2025-05-07T19:46:19.2994394Z 2025-05-07T19:46:19.2994422Z 2025-05-07T19:46:19.2994426Z 2025-05-07T19:46:19.2994429Z 2025-05-07T19:46:19.2994455Z 2025-05-07T19:46:19.2994460Z 2025-05-07T19:46:19.2994463Z 2025-05-07T19:46:19.2994468Z 2025-05-07T19:46:19.2994472Z 2025-05-07T19:46:19.2994476Z 2025-05-07T19:46:19.2994479Z 2025-05-07T19:46:19.2994483Z 2025-05-07T19:46:19.2994486Z 2025-05-07T19:46:19.3446804Z cuda-nvvm-impl-12.8. 
| 20.8 MB | | 0%  2025-05-07T19:46:19.3447232Z 2025-05-07T19:46:19.3447237Z 2025-05-07T19:46:19.3447241Z 2025-05-07T19:46:19.3447244Z 2025-05-07T19:46:19.3447248Z 2025-05-07T19:46:19.3447251Z 2025-05-07T19:46:19.3447256Z 2025-05-07T19:46:19.3447259Z 2025-05-07T19:46:19.3447263Z 2025-05-07T19:46:19.3447266Z 2025-05-07T19:46:19.3447270Z 2025-05-07T19:46:19.3447273Z 2025-05-07T19:46:19.3447277Z 2025-05-07T19:46:19.3640566Z cuda-nvvm-tools-12.8 | 23.5 MB | ### | 30%  2025-05-07T19:46:19.3641764Z 2025-05-07T19:46:19.3641807Z 2025-05-07T19:46:19.3641818Z 2025-05-07T19:46:19.3641829Z 2025-05-07T19:46:19.3641879Z 2025-05-07T19:46:19.3641891Z 2025-05-07T19:46:19.3641903Z 2025-05-07T19:46:19.3641914Z 2025-05-07T19:46:19.3641925Z 2025-05-07T19:46:19.3641937Z 2025-05-07T19:46:19.3641948Z 2025-05-07T19:46:19.3997170Z libnvjitlink-12.8.61 | 28.7 MB | ########## | 100%  2025-05-07T19:46:19.3997787Z 2025-05-07T19:46:19.3997792Z 2025-05-07T19:46:19.3997796Z 2025-05-07T19:46:19.3997799Z 2025-05-07T19:46:19.3997802Z 2025-05-07T19:46:19.3997806Z 2025-05-07T19:46:19.3997809Z 2025-05-07T19:46:19.3997812Z 2025-05-07T19:46:19.3997816Z 2025-05-07T19:46:19.3997819Z 2025-05-07T19:46:19.3997823Z 2025-05-07T19:46:19.3997826Z 2025-05-07T19:46:19.3997830Z 2025-05-07T19:46:19.3997833Z 2025-05-07T19:46:19.4045438Z cuda-nvvm-impl-12.8. 
| 20.8 MB | ###5 | 36%  2025-05-07T19:46:19.4045815Z 2025-05-07T19:46:19.4045821Z 2025-05-07T19:46:19.4045825Z 2025-05-07T19:46:19.4045830Z 2025-05-07T19:46:19.4045835Z 2025-05-07T19:46:19.4045853Z 2025-05-07T19:46:19.4045856Z 2025-05-07T19:46:19.4045860Z 2025-05-07T19:46:19.4045863Z 2025-05-07T19:46:19.4045880Z 2025-05-07T19:46:19.4045883Z 2025-05-07T19:46:19.4045887Z 2025-05-07T19:46:19.4045890Z 2025-05-07T19:46:19.4045893Z 2025-05-07T19:46:19.4047263Z 2025-05-07T19:46:19.4447083Z cuda-nvcc-dev_linux- | 12.7 MB | | 0%  2025-05-07T19:46:19.4447516Z 2025-05-07T19:46:19.4447538Z 2025-05-07T19:46:19.4447566Z 2025-05-07T19:46:19.4447572Z 2025-05-07T19:46:19.4447579Z 2025-05-07T19:46:19.4447586Z 2025-05-07T19:46:19.4447595Z 2025-05-07T19:46:19.4447602Z 2025-05-07T19:46:19.4447608Z 2025-05-07T19:46:19.4447615Z 2025-05-07T19:46:19.4447621Z 2025-05-07T19:46:19.4447628Z 2025-05-07T19:46:19.4447634Z 2025-05-07T19:46:19.4998089Z cuda-nvvm-tools-12.8 | 23.5 MB | #####8 | 58%  2025-05-07T19:46:19.4998522Z 2025-05-07T19:46:19.4998526Z 2025-05-07T19:46:19.4998529Z 2025-05-07T19:46:19.4998738Z 2025-05-07T19:46:19.4998742Z 2025-05-07T19:46:19.4998746Z 2025-05-07T19:46:19.4998750Z 2025-05-07T19:46:19.4998753Z 2025-05-07T19:46:19.4998757Z 2025-05-07T19:46:19.4998761Z 2025-05-07T19:46:19.4998764Z 2025-05-07T19:46:19.4998767Z 2025-05-07T19:46:19.4998771Z 2025-05-07T19:46:19.4998785Z 2025-05-07T19:46:19.5046821Z cuda-nvvm-impl-12.8. 
| 20.8 MB | ######7 | 68%  2025-05-07T19:46:19.5048300Z 2025-05-07T19:46:19.5048321Z 2025-05-07T19:46:19.5048340Z 2025-05-07T19:46:19.5048359Z 2025-05-07T19:46:19.5048378Z 2025-05-07T19:46:19.5048397Z 2025-05-07T19:46:19.5048416Z 2025-05-07T19:46:19.5048435Z 2025-05-07T19:46:19.5048455Z 2025-05-07T19:46:19.5048473Z 2025-05-07T19:46:19.5048491Z 2025-05-07T19:46:19.5048530Z 2025-05-07T19:46:19.5048549Z 2025-05-07T19:46:19.5048568Z 2025-05-07T19:46:19.5048586Z 2025-05-07T19:46:19.5450321Z cuda-nvcc-dev_linux- | 12.7 MB | ####6 | 46%  2025-05-07T19:46:19.5451554Z 2025-05-07T19:46:19.5451566Z 2025-05-07T19:46:19.5451577Z 2025-05-07T19:46:19.5451608Z 2025-05-07T19:46:19.5451618Z 2025-05-07T19:46:19.5451628Z 2025-05-07T19:46:19.5451639Z 2025-05-07T19:46:19.5451649Z 2025-05-07T19:46:19.5451659Z 2025-05-07T19:46:19.5451669Z 2025-05-07T19:46:19.5451712Z 2025-05-07T19:46:19.5451731Z 2025-05-07T19:46:19.5451748Z 2025-05-07T19:46:19.6047398Z cuda-nvvm-tools-12.8 | 23.5 MB | ########5 | 85%  2025-05-07T19:46:19.6047855Z 2025-05-07T19:46:19.6047859Z 2025-05-07T19:46:19.6047863Z 2025-05-07T19:46:19.6047866Z 2025-05-07T19:46:19.6047870Z 2025-05-07T19:46:19.6047873Z 2025-05-07T19:46:19.6047877Z 2025-05-07T19:46:19.6047880Z 2025-05-07T19:46:19.6047884Z 2025-05-07T19:46:19.6047887Z 2025-05-07T19:46:19.6047892Z 2025-05-07T19:46:19.6047895Z 2025-05-07T19:46:19.6047898Z 2025-05-07T19:46:19.6047902Z 2025-05-07T19:46:19.6047905Z 2025-05-07T19:46:19.7608626Z cuda-nvcc-dev_linux- | 12.7 MB | #########9 | 100%  2025-05-07T19:46:19.7609691Z 2025-05-07T19:46:19.7609705Z 2025-05-07T19:46:19.7609717Z 2025-05-07T19:46:19.7609728Z 2025-05-07T19:46:19.7609738Z 2025-05-07T19:46:19.7609749Z 2025-05-07T19:46:19.7609759Z 2025-05-07T19:46:19.7609770Z 2025-05-07T19:46:19.7610176Z 2025-05-07T19:46:19.7610187Z 2025-05-07T19:46:19.7610197Z 2025-05-07T19:46:19.7610207Z 2025-05-07T19:46:19.7610218Z 2025-05-07T19:46:19.7610250Z 2025-05-07T19:46:19.7610261Z 2025-05-07T19:46:19.8463006Z cuda-nvcc-dev_linux- | 
12.7 MB | ########## | 100%  2025-05-07T19:46:19.8463384Z 2025-05-07T19:46:19.8463389Z 2025-05-07T19:46:19.8463393Z 2025-05-07T19:46:19.8463411Z 2025-05-07T19:46:19.8463415Z 2025-05-07T19:46:19.8463419Z 2025-05-07T19:46:19.8463422Z 2025-05-07T19:46:19.8463426Z 2025-05-07T19:46:19.8463430Z 2025-05-07T19:46:19.8463433Z 2025-05-07T19:46:19.8463437Z 2025-05-07T19:46:19.8463441Z 2025-05-07T19:46:19.8463459Z 2025-05-07T19:46:19.8463463Z 2025-05-07T19:46:19.8463466Z 2025-05-07T19:46:19.8463470Z 2025-05-07T19:46:19.8463793Z cuda-sanitizer-api-1 | 8.8 MB | | 0%  2025-05-07T19:46:19.8464144Z 2025-05-07T19:46:19.8464148Z 2025-05-07T19:46:19.8464159Z 2025-05-07T19:46:19.8464162Z 2025-05-07T19:46:19.8464166Z 2025-05-07T19:46:19.8464169Z 2025-05-07T19:46:19.8464172Z 2025-05-07T19:46:19.8464176Z 2025-05-07T19:46:19.8464179Z 2025-05-07T19:46:19.8464183Z 2025-05-07T19:46:19.8464187Z 2025-05-07T19:46:19.8464190Z 2025-05-07T19:46:19.8464194Z 2025-05-07T19:46:19.8464197Z 2025-05-07T19:46:19.8464509Z cuda-nvvm-impl-12.8. | 20.8 MB | ########## | 100%  2025-05-07T19:46:19.8464829Z 2025-05-07T19:46:19.8464833Z 2025-05-07T19:46:19.8464836Z 2025-05-07T19:46:19.8464840Z 2025-05-07T19:46:19.8464843Z 2025-05-07T19:46:19.8464847Z 2025-05-07T19:46:19.8464850Z 2025-05-07T19:46:19.8464854Z 2025-05-07T19:46:19.8465052Z 2025-05-07T19:46:19.8465056Z 2025-05-07T19:46:19.8465060Z 2025-05-07T19:46:19.8465063Z 2025-05-07T19:46:19.8465067Z 2025-05-07T19:46:19.8465070Z 2025-05-07T19:46:19.8795119Z cuda-nvvm-impl-12.8. 
| 20.8 MB | ########## | 100%  2025-05-07T19:46:19.8795481Z 2025-05-07T19:46:19.8795486Z 2025-05-07T19:46:19.8795489Z 2025-05-07T19:46:19.8795493Z 2025-05-07T19:46:19.8795496Z 2025-05-07T19:46:19.8795500Z 2025-05-07T19:46:19.8795503Z 2025-05-07T19:46:19.8795506Z 2025-05-07T19:46:19.8795510Z 2025-05-07T19:46:19.8795513Z 2025-05-07T19:46:19.8795530Z 2025-05-07T19:46:19.8795534Z 2025-05-07T19:46:19.8795537Z 2025-05-07T19:46:19.8820029Z cuda-nvvm-tools-12.8 | 23.5 MB | ########## | 100%  2025-05-07T19:46:19.8820394Z 2025-05-07T19:46:19.8820398Z 2025-05-07T19:46:19.8820402Z 2025-05-07T19:46:19.8820419Z 2025-05-07T19:46:19.8820423Z 2025-05-07T19:46:19.8820426Z 2025-05-07T19:46:19.8820443Z 2025-05-07T19:46:19.8820447Z 2025-05-07T19:46:19.8820450Z 2025-05-07T19:46:19.8820454Z 2025-05-07T19:46:19.8820457Z 2025-05-07T19:46:19.8820461Z 2025-05-07T19:46:19.8820464Z 2025-05-07T19:46:19.8820467Z 2025-05-07T19:46:19.8820471Z 2025-05-07T19:46:19.8820475Z 2025-05-07T19:46:19.8820478Z 2025-05-07T19:46:19.9334401Z cuda-nvdisasm-12.8.5 | 4.9 MB | | 0%  2025-05-07T19:46:19.9336000Z 2025-05-07T19:46:19.9336008Z 2025-05-07T19:46:19.9336014Z 2025-05-07T19:46:19.9336020Z 2025-05-07T19:46:19.9336027Z 2025-05-07T19:46:19.9336034Z 2025-05-07T19:46:19.9336040Z 2025-05-07T19:46:19.9336046Z 2025-05-07T19:46:19.9336052Z 2025-05-07T19:46:19.9336059Z 2025-05-07T19:46:19.9336065Z 2025-05-07T19:46:19.9336071Z 2025-05-07T19:46:19.9336097Z 2025-05-07T19:46:19.9336103Z 2025-05-07T19:46:19.9336109Z 2025-05-07T19:46:19.9336114Z 2025-05-07T19:46:19.9336119Z 2025-05-07T19:46:19.9336127Z 2025-05-07T19:46:19.9464455Z cuda-cupti-dev-12.8. 
| 4.0 MB | | 0%  2025-05-07T19:46:19.9464866Z 2025-05-07T19:46:19.9464883Z 2025-05-07T19:46:19.9464887Z 2025-05-07T19:46:19.9464891Z 2025-05-07T19:46:19.9464895Z 2025-05-07T19:46:19.9464898Z 2025-05-07T19:46:19.9464902Z 2025-05-07T19:46:19.9465083Z 2025-05-07T19:46:19.9465089Z 2025-05-07T19:46:19.9465096Z 2025-05-07T19:46:19.9465101Z 2025-05-07T19:46:19.9465109Z 2025-05-07T19:46:19.9465115Z 2025-05-07T19:46:19.9465120Z 2025-05-07T19:46:19.9465124Z 2025-05-07T19:46:19.9465129Z 2025-05-07T19:46:19.9989630Z cuda-sanitizer-api-1 | 8.8 MB | ##2 | 22%  2025-05-07T19:46:19.9991343Z 2025-05-07T19:46:19.9991365Z 2025-05-07T19:46:19.9991386Z 2025-05-07T19:46:19.9991405Z 2025-05-07T19:46:19.9991423Z 2025-05-07T19:46:19.9991441Z 2025-05-07T19:46:19.9991459Z 2025-05-07T19:46:19.9991478Z 2025-05-07T19:46:19.9991497Z 2025-05-07T19:46:19.9991516Z 2025-05-07T19:46:19.9991565Z 2025-05-07T19:46:19.9991584Z 2025-05-07T19:46:19.9991602Z 2025-05-07T19:46:19.9991621Z 2025-05-07T19:46:19.9991661Z 2025-05-07T19:46:19.9991680Z 2025-05-07T19:46:19.9991698Z 2025-05-07T19:46:19.9993403Z cuda-nvdisasm-12.8.5 | 4.9 MB | ########## | 100%  2025-05-07T19:46:19.9994962Z 2025-05-07T19:46:19.9994973Z 2025-05-07T19:46:19.9994984Z 2025-05-07T19:46:19.9994994Z 2025-05-07T19:46:19.9995004Z 2025-05-07T19:46:19.9995014Z 2025-05-07T19:46:19.9995042Z 2025-05-07T19:46:19.9995051Z 2025-05-07T19:46:19.9995062Z 2025-05-07T19:46:19.9995072Z 2025-05-07T19:46:19.9995083Z 2025-05-07T19:46:19.9995093Z 2025-05-07T19:46:19.9995103Z 2025-05-07T19:46:19.9995113Z 2025-05-07T19:46:19.9995123Z 2025-05-07T19:46:19.9995133Z 2025-05-07T19:46:19.9995143Z 2025-05-07T19:46:20.0358419Z cuda-nvdisasm-12.8.5 | 4.9 MB | ########## | 100%  2025-05-07T19:46:20.0358781Z 2025-05-07T19:46:20.0358978Z 2025-05-07T19:46:20.0358983Z 2025-05-07T19:46:20.0358987Z 2025-05-07T19:46:20.0359013Z 2025-05-07T19:46:20.0359017Z 2025-05-07T19:46:20.0359021Z 2025-05-07T19:46:20.0359025Z 2025-05-07T19:46:20.0359029Z 2025-05-07T19:46:20.0359032Z 
2025-05-07T19:46:20.0359036Z 2025-05-07T19:46:20.0359039Z 2025-05-07T19:46:20.0359052Z 2025-05-07T19:46:20.0359056Z 2025-05-07T19:46:20.0359059Z 2025-05-07T19:46:20.0359062Z 2025-05-07T19:46:20.0359066Z 2025-05-07T19:46:20.0359069Z 2025-05-07T19:46:20.0359503Z cuda-cupti-dev-12.8. | 4.0 MB | ########## | 100%  2025-05-07T19:46:20.0359931Z 2025-05-07T19:46:20.0359936Z 2025-05-07T19:46:20.0359942Z 2025-05-07T19:46:20.0359947Z 2025-05-07T19:46:20.0359953Z 2025-05-07T19:46:20.0359961Z 2025-05-07T19:46:20.0359967Z 2025-05-07T19:46:20.0359973Z 2025-05-07T19:46:20.0359978Z 2025-05-07T19:46:20.0359997Z 2025-05-07T19:46:20.0360001Z 2025-05-07T19:46:20.0360007Z 2025-05-07T19:46:20.0360013Z 2025-05-07T19:46:20.0360024Z 2025-05-07T19:46:20.0360031Z 2025-05-07T19:46:20.0360039Z 2025-05-07T19:46:20.0360045Z 2025-05-07T19:46:20.0360049Z 2025-05-07T19:46:20.0383463Z cuda-cupti-dev-12.8. | 4.0 MB | ########## | 100%  2025-05-07T19:46:20.0383885Z 2025-05-07T19:46:20.0383897Z 2025-05-07T19:46:20.0383900Z 2025-05-07T19:46:20.0383922Z 2025-05-07T19:46:20.0383925Z 2025-05-07T19:46:20.0383929Z 2025-05-07T19:46:20.0383932Z 2025-05-07T19:46:20.0383935Z 2025-05-07T19:46:20.0383939Z 2025-05-07T19:46:20.0383942Z 2025-05-07T19:46:20.0383945Z 2025-05-07T19:46:20.0383949Z 2025-05-07T19:46:20.0383952Z 2025-05-07T19:46:20.0383956Z 2025-05-07T19:46:20.0383959Z 2025-05-07T19:46:20.0383962Z 2025-05-07T19:46:20.0383966Z 2025-05-07T19:46:20.0383970Z 2025-05-07T19:46:20.0383973Z 2025-05-07T19:46:20.0545255Z ... (more hidden) ... 
2025-05-07T19:46:20.0545579Z 2025-05-07T19:46:20.0545584Z 2025-05-07T19:46:20.0545602Z 2025-05-07T19:46:20.0545610Z 2025-05-07T19:46:20.0545617Z 2025-05-07T19:46:20.0545623Z 2025-05-07T19:46:20.0545629Z 2025-05-07T19:46:20.0545635Z 2025-05-07T19:46:20.0545642Z 2025-05-07T19:46:20.0545648Z 2025-05-07T19:46:20.0545670Z 2025-05-07T19:46:20.0545676Z 2025-05-07T19:46:20.0545911Z 2025-05-07T19:46:20.0545919Z 2025-05-07T19:46:20.0545925Z 2025-05-07T19:46:20.0545931Z 2025-05-07T19:46:20.1027526Z cuda-sanitizer-api-1 | 8.8 MB | #####9 | 59%  2025-05-07T19:46:20.1027927Z 2025-05-07T19:46:20.1027931Z 2025-05-07T19:46:20.1027956Z 2025-05-07T19:46:20.1027960Z 2025-05-07T19:46:20.1027964Z 2025-05-07T19:46:20.1028140Z 2025-05-07T19:46:20.1028149Z 2025-05-07T19:46:20.1028157Z 2025-05-07T19:46:20.1028165Z 2025-05-07T19:46:20.1028170Z 2025-05-07T19:46:20.1028174Z 2025-05-07T19:46:20.1028179Z 2025-05-07T19:46:20.1028183Z 2025-05-07T19:46:20.1028217Z 2025-05-07T19:46:20.1028222Z 2025-05-07T19:46:20.1028227Z 2025-05-07T19:46:20.1028259Z 2025-05-07T19:46:20.1028263Z 2025-05-07T19:46:20.1028268Z 2025-05-07T19:46:20.2308800Z ... (more hidden) ... 
2025-05-07T19:46:20.2309863Z cuda-sanitizer-api-1 | 8.8 MB | ########## | 100%
2025-05-07T19:46:20.4314970Z cuda-sanitizer-api-1 | 8.8 MB | ########## | 100%
2025-05-07T19:46:20.4644491Z libcurand-10.3.9.55 | 43.6 MB | ########## | 100%
2025-05-07T19:46:20.4834928Z cuda-nvvp-12.8.57 | 112.4 MB | ########## | 100%
2025-05-07T19:46:20.7939549Z nsight-compute-2025. | 320.6 MB | ########## | 100%
2025-05-07T19:46:20.9805341Z libnpp-12.3.3.65 | 130.6 MB | ########## | 100%
2025-05-07T19:46:21.3009751Z cuda-nvcc-tools-12.8 | 24.5 MB | ########## | 100%
2025-05-07T19:46:21.3333849Z libnvjitlink-12.8.61 | 28.7 MB | ########## | 100%
2025-05-07T19:46:21.5411698Z cuda-nvcc-dev_linux- | 12.7 MB | ########## | 100%
2025-05-07T19:46:21.5682694Z cuda-nvrtc-12.8.61 | 63.1 MB | ########## | 100%
2025-05-07T19:46:21.5998667Z cuda-nvdisasm-12.8.5 | 4.9 MB | ########## | 100%
2025-05-07T19:46:21.6406989Z cuda-nvvm-impl-12.8. | 20.8 MB | ########## | 100%
2025-05-07T19:46:21.6452593Z libcublas-12.8.3.14 | 460.2 MB | ########## | 100%
2025-05-07T19:46:21.6453311Z ... (more hidden) ...
2025-05-07T19:46:21.7085338Z ... (more hidden) ...
2025-05-07T19:46:21.7164696Z cuda-nvvm-tools-12.8 | 23.5 MB | ########## | 100%
2025-05-07T19:46:22.0605760Z cuda-cupti-dev-12.8. | 4.0 MB | ########## | 100%
2025-05-07T19:46:25.2635408Z cuda-sanitizer-api-1 | 8.8 MB | ########## | 100%
2025-05-07T19:46:26.0518185Z libcublas-12.8.3.14 | 460.2 MB | ########## | 100%
2025-05-07T19:46:26.0524704Z nsight-compute-2025. | 320.6 MB | ########## | 100%
2025-05-07T19:46:26.0568171Z done 2025-05-07T19:46:26.2618640Z Preparing transaction: - \ done 2025-05-07T19:46:26.8641180Z Verifying transaction: / - \ | / - done 2025-05-07T19:46:27.1684780Z Executing transaction: | / - done 2025-05-07T19:46:29.1548490Z [INSTALL] Fixing file placements for CUDA 12.8.0+ ... 2025-05-07T19:46:29.1549137Z [INSTALL] Creating symlinks: libnvToolsExt.so 2025-05-07T19:46:29.1549973Z + ln -sf /github/home/miniconda/envs/build_binary/lib/libnvToolsExt.so.1 /github/home/miniconda/envs/build_binary/lib/libnvToolsExt.so 2025-05-07T19:46:29.1550581Z 2025-05-07T19:46:29.1564186Z 2025-05-07T19:46:29.1566508Z + ln -sf /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvToolsExt.so.1 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvToolsExt.so 2025-05-07T19:46:29.1569902Z 2025-05-07T19:46:29.1577072Z 2025-05-07T19:46:29.1577590Z [INSTALL] Copying nvtx3 headers ...
2025-05-07T19:46:29.1586635Z + cp -r /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCuda.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCudaRt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtOpenCL.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtSync.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtx3.hpp /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtxDetail /github/home/miniconda/envs/build_binary/include/ 2025-05-07T19:46:29.1590733Z 2025-05-07T19:46:29.1697948Z 2025-05-07T19:46:29.1708289Z + cp -r /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCuda.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCudaRt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtOpenCL.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtSync.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtx3.hpp /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtxDetail /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include/ 2025-05-07T19:46:29.1712535Z 
2025-05-07T19:46:29.1722069Z 2025-05-07T19:46:29.1723033Z [INSTALL] Appending libcuda.so path to LD_LIBRARY_PATH ... 2025-05-07T19:46:29.2139034Z [ENV] Appending to LD_LIBRARY_PATH: /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs ... 2025-05-07T19:46:31.1002343Z + conda env config vars set -n build_binary LD_LIBRARY_PATH=/github/home/miniconda/envs/build_binary/lib:/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs 2025-05-07T19:46:31.1004955Z 2025-05-07T19:46:31.5112286Z 2025-05-07T19:46:31.5122196Z [INSTALL] Setting environment variable NVML_LIB_PATH ... 2025-05-07T19:46:31.5500456Z + conda env config vars set -n build_binary NVML_LIB_PATH=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:46:31.5501059Z 2025-05-07T19:46:31.9801206Z 2025-05-07T19:46:31.9801999Z [INSTALL] Setting environment variable CUDA_INCLUDE_DIRS ... 2025-05-07T19:46:31.9803191Z + conda env config vars set -n build_binary CUDA_INCLUDE_DIRS="/github/home/miniconda/envs/build_binary/include/:/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include/" 2025-05-07T19:46:31.9804058Z 2025-05-07T19:46:32.3964304Z 2025-05-07T19:46:34.4124441Z [CHECK] cuda_runtime.h found in CONDA_PREFIX PATH (file): /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include/cuda_runtime.h 2025-05-07T19:46:36.3979539Z [CHECK] libcuda.so found in CONDA_PREFIX PATH (file): /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:46:38.3633976Z [CHECK] libnvToolsExt.so found in CONDA_PREFIX PATH (symbolic link): /github/home/miniconda/envs/build_binary/lib/libnvToolsExt.so 2025-05-07T19:46:38.3634958Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvToolsExt.so 2025-05-07T19:46:40.3432409Z [CHECK] libnvidia-ml.so found in CONDA_PREFIX PATH (file): /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libnvidia-ml.so 2025-05-07T19:46:42.1821243Z 
/github/home/miniconda/envs/build_binary/bin/nvcc 2025-05-07T19:46:42.1825661Z 2025-05-07T19:46:42.2559597Z [CHECK] Binary nvcc found in PATH 2025-05-07T19:46:46.0069257Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:46:46.0071114Z Target: x86_64-conda-linux-gnu 2025-05-07T19:46:46.0071938Z Thread model: posix 2025-05-07T19:46:46.0072852Z InstalledDir: /github/home/miniconda/envs/build_binary/bin 2025-05-07T19:46:46.0074073Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang.cfg 2025-05-07T19:46:46.0074542Z 2025-05-07T19:46:46.0629174Z [INSTALL] Resetting compiler symlinks to clang ... 2025-05-07T19:46:49.8495820Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:46:49.8497363Z 2025-05-07T19:46:49.8509311Z 2025-05-07T19:46:49.8525554Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:46:49.8526124Z 2025-05-07T19:46:49.8540292Z 2025-05-07T19:46:49.8557899Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang++ /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:46:49.8558488Z 2025-05-07T19:46:49.8568434Z 2025-05-07T19:46:49.8589158Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang++ /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:46:49.8589749Z 2025-05-07T19:46:49.8600182Z 2025-05-07T19:46:49.8600529Z + ls -la /github/home/miniconda/envs/build_binary/etc/conda/activate.d 2025-05-07T19:46:49.8600866Z 2025-05-07T19:46:49.8620076Z total 56 2025-05-07T19:46:49.8621353Z drwxr-xr-x. 2 root root 16384 May 7 19:46 . 2025-05-07T19:46:49.8622412Z drwxr-xr-x. 5 root root 62 May 7 19:44 .. 2025-05-07T19:46:49.8623644Z -rw-r--r--. 2 root root 3778 Jun 10 2024 activate-binutils_linux-64.sh 2025-05-07T19:46:49.8625125Z -rw-r--r--. 
2 root root 11630 Jun 10 2024 activate-gcc_linux-64.sh 2025-05-07T19:46:49.8626456Z -rw-r--r--. 2 root root 5190 Jun 10 2024 activate-gxx_linux-64.sh 2025-05-07T19:46:49.8627166Z -rw-r--r--. 2 root root 136 Mar 27 01:27 libglib_activate.sh 2025-05-07T19:46:49.8627606Z -rw-r--r--. 2 root root 873 Jun 5 2024 libxml2_activate.sh 2025-05-07T19:46:49.8628024Z -rw-r--r--. 2 root root 499 Nov 30 04:26 openjdk_activate.sh 2025-05-07T19:46:49.8628473Z -rw-r--r--. 2 root root 2932 Jan 24 22:22 ~cuda-nvcc_activate.sh 2025-05-07T19:46:49.8628749Z 2025-05-07T19:46:49.8628971Z [INSTALL] Removing the -ccbin=CXX hook from NVCC activation scripts ... 2025-05-07T19:46:49.8629698Z + sed -i /-ccbin=/d /github/home/miniconda/envs/build_binary/etc/conda/activate.d/*cuda-nvcc_activate.sh 2025-05-07T19:46:49.8630158Z 2025-05-07T19:46:49.8639176Z 2025-05-07T19:46:49.8639771Z + conda run -n build_binary c++ --version | grep -i clang 2025-05-07T19:46:49.8640559Z 2025-05-07T19:46:51.8365131Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:46:51.8366448Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang++.cfg 2025-05-07T19:46:51.8366946Z 2025-05-07T19:46:51.8367332Z [BUILD] Setting Clang as the NVCC host compiler: 2025-05-07T19:46:53.7345147Z [BUILD] Setting prepend flags for NVCC ... 
2025-05-07T19:46:53.7346179Z + conda env config vars set -n build_binary NVCC_PREPEND_FLAGS="-allow-unsupported-compiler -Xcompiler -stdlib=libstdc++ -ccbin /github/home/miniconda/envs/build_binary/bin/c++" 2025-05-07T19:46:53.7346938Z 2025-05-07T19:46:54.1474949Z 2025-05-07T19:46:54.1475877Z + conda run -n build_binary printenv NVCC_PREPEND_FLAGS 2025-05-07T19:46:54.1476688Z 2025-05-07T19:46:55.9891858Z -allow-unsupported-compiler -Xcompiler -stdlib=libstdc++ -ccbin /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:46:55.9892454Z 2025-05-07T19:46:56.0652939Z 2025-05-07T19:46:56.0654106Z [INFO] Printing out all preprocessor defines in nvcc ... 2025-05-07T19:46:56.0655664Z + conda run -n build_binary nvcc --compiler-options -dM -E -x cu - < /dev/null 2025-05-07T19:46:56.0657282Z 2025-05-07T19:46:57.9740149Z #define ADJ_ESTERROR 0x0008 2025-05-07T19:46:57.9741217Z #define ADJ_FREQUENCY 0x0002 2025-05-07T19:46:57.9742174Z #define ADJ_MAXERROR 0x0004 2025-05-07T19:46:57.9744906Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:46:57.9747187Z 2025-05-07T19:46:57.9747455Z #define ADJ_MICRO 0x1000 2025-05-07T19:46:57.9748078Z #define ADJ_NANO 0x2000 2025-05-07T19:46:57.9748348Z #define ADJ_OFFSET 0x0001 2025-05-07T19:46:57.9748659Z #define ADJ_OFFSET_SINGLESHOT 0x8001 2025-05-07T19:46:57.9748986Z #define ADJ_OFFSET_SS_READ 0xa001 2025-05-07T19:46:57.9749301Z #define ADJ_STATUS 0x0010 2025-05-07T19:46:57.9749554Z #define ADJ_TAI 0x0080 2025-05-07T19:46:57.9749816Z #define ADJ_TICK 0x4000 2025-05-07T19:46:57.9750200Z #define ADJ_TIMECONST 0x0020 2025-05-07T19:46:57.9750505Z #define AIO_PRIO_DELTA_MAX 20 2025-05-07T19:46:57.9750900Z #define BC_BASE_MAX _POSIX2_BC_BASE_MAX 2025-05-07T19:46:57.9751216Z #define BC_DIM_MAX _POSIX2_BC_DIM_MAX 2025-05-07T19:46:57.9751514Z #define BC_SCALE_MAX _POSIX2_BC_SCALE_MAX 2025-05-07T19:46:57.9751863Z #define BC_STRING_MAX _POSIX2_BC_STRING_MAX 2025-05-07T19:46:57.9752200Z #define BIG_ENDIAN __BIG_ENDIAN 2025-05-07T19:46:57.9752480Z #define BUFSIZ _IO_BUFSIZ 2025-05-07T19:46:57.9752775Z #define BYTE_ORDER __BYTE_ORDER 2025-05-07T19:46:57.9753046Z #define CHARCLASS_NAME_MAX 2048 2025-05-07T19:46:57.9753332Z #define CHAR_BIT __CHAR_BIT__ 2025-05-07T19:46:57.9773695Z #define CHAR_MAX __SCHAR_MAX__ 2025-05-07T19:46:57.9774584Z #define CHAR_MIN SCHAR_MIN 2025-05-07T19:46:57.9774903Z #define CLOCKS_PER_SEC 1000000l 2025-05-07T19:46:57.9775253Z #define CLOCK_BOOTTIME 7 2025-05-07T19:46:57.9775579Z #define CLOCK_BOOTTIME_ALARM 9 2025-05-07T19:46:57.9775885Z #define CLOCK_MONOTONIC 1 2025-05-07T19:46:57.9776224Z #define CLOCK_MONOTONIC_COARSE 6 2025-05-07T19:46:57.9776544Z #define CLOCK_MONOTONIC_RAW 4 2025-05-07T19:46:57.9776882Z #define CLOCK_PROCESS_CPUTIME_ID 2 2025-05-07T19:46:57.9777191Z #define CLOCK_REALTIME 0 2025-05-07T19:46:57.9777508Z #define CLOCK_REALTIME_ALARM 8 2025-05-07T19:46:57.9777807Z #define CLOCK_REALTIME_COARSE 5 2025-05-07T19:46:57.9778130Z #define CLOCK_TAI 11 2025-05-07T19:46:57.9778412Z #define CLOCK_THREAD_CPUTIME_ID 3 
2025-05-07T19:46:57.9778757Z #define COLL_WEIGHTS_MAX 255 2025-05-07T19:46:57.9779091Z #define CUDARTAPI 2025-05-07T19:46:57.9779362Z #define CUDARTAPI_CDECL 2025-05-07T19:46:57.9779627Z #define CUDART_CB 2025-05-07T19:46:57.9779919Z #define CUDART_DEVICE __device__ 2025-05-07T19:46:57.9780210Z #define CUDART_VERSION 12080 2025-05-07T19:46:57.9780536Z #define CUDA_DOUBLE_MATH_FUNCTIONS 1 2025-05-07T19:46:57.9780877Z #define CUDA_IPC_HANDLE_SIZE 64 2025-05-07T19:46:57.9781180Z #define CU_UUID_HAS_BEEN_DEFINED 2025-05-07T19:46:57.9781509Z #define DELAYTIMER_MAX 2147483647 2025-05-07T19:46:57.9781789Z #define DOMAIN 1 2025-05-07T19:46:57.9782055Z #define EOF (-1) 2025-05-07T19:46:57.9782298Z #define EXIT_FAILURE 1 2025-05-07T19:46:57.9782586Z #define EXIT_SUCCESS 0 2025-05-07T19:46:57.9782871Z #define EXPR_NEST_MAX _POSIX2_EXPR_NEST_MAX 2025-05-07T19:46:57.9783267Z #define FD_CLR(fd,fdsetp) __FD_CLR (fd, fdsetp) 2025-05-07T19:46:57.9783684Z #define FD_ISSET(fd,fdsetp) __FD_ISSET (fd, fdsetp) 2025-05-07T19:46:57.9784106Z #define FD_SET(fd,fdsetp) __FD_SET (fd, fdsetp) 2025-05-07T19:46:57.9784480Z #define FD_SETSIZE __FD_SETSIZE 2025-05-07T19:46:57.9784800Z #define FD_ZERO(fdsetp) __FD_ZERO (fdsetp) 2025-05-07T19:46:57.9785152Z #define FILENAME_MAX 4096 2025-05-07T19:46:57.9785427Z #define FOPEN_MAX 16 2025-05-07T19:46:57.9785718Z #define FP_ILOGB0 (-2147483647 - 1) 2025-05-07T19:46:57.9786030Z #define FP_ILOGBNAN (-2147483647 - 1) 2025-05-07T19:46:57.9786357Z #define FP_INFINITE 1 2025-05-07T19:46:57.9786609Z #define FP_NAN 0 2025-05-07T19:46:57.9786896Z #define FP_NORMAL 4 2025-05-07T19:46:57.9787294Z #define FP_SUBNORMAL 3 2025-05-07T19:46:57.9787584Z #define FP_ZERO 2 2025-05-07T19:46:57.9787867Z #define HOST_NAME_MAX 64 2025-05-07T19:46:57.9788134Z #define HUGE 3.40282347e+38F 2025-05-07T19:46:57.9788454Z #define HUGE_VAL (__builtin_huge_val()) 2025-05-07T19:46:57.9788795Z #define HUGE_VALF (__builtin_huge_valf()) 2025-05-07T19:46:57.9789171Z #define HUGE_VALL 
(__builtin_huge_vall()) 2025-05-07T19:46:57.9789523Z #define INFINITY (__builtin_inff()) 2025-05-07T19:46:57.9789860Z #define INT_MAX __INT_MAX__ 2025-05-07T19:46:57.9790201Z #define INT_MIN (-__INT_MAX__ -1) 2025-05-07T19:46:57.9790528Z #define IOV_MAX 1024 2025-05-07T19:46:57.9790793Z #define LINE_MAX _POSIX2_LINE_MAX 2025-05-07T19:46:57.9791144Z #define LITTLE_ENDIAN __LITTLE_ENDIAN 2025-05-07T19:46:57.9791505Z #define LLONG_MAX __LONG_LONG_MAX__ 2025-05-07T19:46:57.9791836Z #define LLONG_MIN (-__LONG_LONG_MAX__-1LL) 2025-05-07T19:46:57.9792207Z #define LOGIN_NAME_MAX 256 2025-05-07T19:46:57.9792486Z #define LONG_BIT 64 2025-05-07T19:46:57.9792812Z #define LONG_LONG_MAX __LONG_LONG_MAX__ 2025-05-07T19:46:57.9793172Z #define LONG_LONG_MIN (-__LONG_LONG_MAX__-1LL) 2025-05-07T19:46:57.9793555Z #define LONG_MAX __LONG_MAX__ 2025-05-07T19:46:57.9793856Z #define LONG_MIN (-__LONG_MAX__ -1L) 2025-05-07T19:46:57.9794190Z #define L_ctermid 9 2025-05-07T19:46:57.9794462Z #define L_cuserid 9 2025-05-07T19:46:57.9794707Z #define L_tmpnam 20 2025-05-07T19:46:57.9794987Z #define MATH_ERREXCEPT 2 2025-05-07T19:46:57.9795256Z #define MATH_ERRNO 1 2025-05-07T19:46:57.9795539Z #define MAX_CANON 255 2025-05-07T19:46:57.9795802Z #define MAX_INPUT 255 2025-05-07T19:46:57.9796127Z #define MB_CUR_MAX (__ctype_get_mb_cur_max ()) 2025-05-07T19:46:57.9796555Z #define MB_LEN_MAX 16 2025-05-07T19:46:57.9796867Z #define MOD_CLKA ADJ_OFFSET_SINGLESHOT 2025-05-07T19:46:57.9797189Z #define MOD_CLKB ADJ_TICK 2025-05-07T19:46:57.9797495Z #define MOD_ESTERROR ADJ_ESTERROR 2025-05-07T19:46:57.9797801Z #define MOD_FREQUENCY ADJ_FREQUENCY 2025-05-07T19:46:57.9798141Z #define MOD_MAXERROR ADJ_MAXERROR 2025-05-07T19:46:57.9798479Z #define MOD_MICRO ADJ_MICRO 2025-05-07T19:46:57.9798756Z #define MOD_NANO ADJ_NANO 2025-05-07T19:46:57.9799058Z #define MOD_OFFSET ADJ_OFFSET 2025-05-07T19:46:57.9799351Z #define MOD_STATUS ADJ_STATUS 2025-05-07T19:46:57.9799661Z #define MOD_TAI ADJ_TAI 
2025-05-07T19:46:57.9799933Z #define MOD_TIMECONST ADJ_TIMECONST 2025-05-07T19:46:57.9800385Z #define MQ_PRIO_MAX 32768 2025-05-07T19:46:57.9800658Z #define M_1_PI 0.31830988618379067154 2025-05-07T19:46:57.9801021Z #define M_1_PIl 0.318309886183790671537767526745028724L 2025-05-07T19:46:57.9801367Z #define M_2_PI 0.63661977236758134308 2025-05-07T19:46:57.9801906Z #define M_2_PIl 0.636619772367581343075535053490057448L 2025-05-07T19:46:57.9802302Z #define M_2_SQRTPI 1.12837916709551257390 2025-05-07T19:46:57.9802822Z #define M_2_SQRTPIl 1.128379167095512573896158903121545172L 2025-05-07T19:46:57.9803235Z #define M_E 2.7182818284590452354 2025-05-07T19:46:57.9803573Z #define M_El 2.718281828459045235360287471352662498L 2025-05-07T19:46:57.9803953Z #define M_LN10 2.30258509299404568402 2025-05-07T19:46:57.9804293Z #define M_LN10l 2.302585092994045684017991454684364208L 2025-05-07T19:46:57.9804678Z #define M_LN2 0.69314718055994530942 2025-05-07T19:46:57.9805038Z #define M_LN2l 0.693147180559945309417232121458176568L 2025-05-07T19:46:57.9805393Z #define M_LOG10E 0.43429448190325182765 2025-05-07T19:46:57.9805771Z #define M_LOG10El 0.434294481903251827651128918916605082L 2025-05-07T19:46:57.9806138Z #define M_LOG2E 1.4426950408889634074 2025-05-07T19:46:57.9806514Z #define M_LOG2El 1.442695040888963407359924681001892137L 2025-05-07T19:46:57.9806878Z #define M_PI 3.14159265358979323846 2025-05-07T19:46:57.9807209Z #define M_PI_2 1.57079632679489661923 2025-05-07T19:46:57.9807552Z #define M_PI_2l 1.570796326794896619231321691639751442L 2025-05-07T19:46:57.9807933Z #define M_PI_4 0.78539816339744830962 2025-05-07T19:46:57.9808304Z #define M_PI_4l 0.785398163397448309615660845819875721L 2025-05-07T19:46:57.9808821Z #define M_PIl 3.141592653589793238462643383279502884L 2025-05-07T19:46:57.9809324Z #define M_SQRT1_2 0.70710678118654752440 2025-05-07T19:46:57.9809684Z #define M_SQRT1_2l 0.707106781186547524400844362104849039L 2025-05-07T19:46:57.9810076Z #define M_SQRT2 
1.41421356237309504880 2025-05-07T19:46:57.9810423Z #define M_SQRT2l 1.414213562373095048801688724209698079L 2025-05-07T19:46:57.9810799Z #define NAME_MAX 255 2025-05-07T19:46:57.9811059Z #define NAN (__builtin_nanf ("")) 2025-05-07T19:46:57.9811377Z #define NFDBITS __NFDBITS 2025-05-07T19:46:57.9811644Z #define NGROUPS_MAX 65536 2025-05-07T19:46:57.9811944Z #define NL_ARGMAX _POSIX_ARG_MAX 2025-05-07T19:46:57.9812275Z #define NL_LANGMAX _POSIX2_LINE_MAX 2025-05-07T19:46:57.9812572Z #define NL_MSGMAX INT_MAX 2025-05-07T19:46:57.9812861Z #define NL_NMAX INT_MAX 2025-05-07T19:46:57.9813120Z #define NL_SETMAX INT_MAX 2025-05-07T19:46:57.9813411Z #define NL_TEXTMAX INT_MAX 2025-05-07T19:46:57.9813671Z #define NULL __null 2025-05-07T19:46:57.9813934Z #define NZERO 20 2025-05-07T19:46:57.9814168Z #define OVERFLOW 3 2025-05-07T19:46:57.9814430Z #define PATH_MAX 4096 2025-05-07T19:46:57.9814693Z #define PDP_ENDIAN __PDP_ENDIAN 2025-05-07T19:46:57.9814999Z #define PIPE_BUF 4096 2025-05-07T19:46:57.9815270Z #define PLOSS 6 2025-05-07T19:46:57.9815641Z #define PTHREAD_DESTRUCTOR_ITERATIONS _POSIX_THREAD_DESTRUCTOR_ITERATIONS 2025-05-07T19:46:57.9816121Z #define PTHREAD_KEYS_MAX 1024 2025-05-07T19:46:57.9816408Z #define PTHREAD_STACK_MIN 16384 2025-05-07T19:46:57.9816717Z #define P_tmpdir "/tmp" 2025-05-07T19:46:57.9816980Z #define RAND_MAX 2147483647 2025-05-07T19:46:57.9817275Z #define RE_DUP_MAX (0x7fff) 2025-05-07T19:46:57.9817542Z #define RTSIG_MAX 32 2025-05-07T19:46:57.9817898Z #define SCHAR_MAX __SCHAR_MAX__ 2025-05-07T19:46:57.9818194Z #define SCHAR_MIN (-__SCHAR_MAX__-1) 2025-05-07T19:46:57.9818515Z #define SEEK_CUR 1 2025-05-07T19:46:57.9818751Z #define SEEK_DATA 3 2025-05-07T19:46:57.9819018Z #define SEEK_END 2 2025-05-07T19:46:57.9819288Z #define SEEK_HOLE 4 2025-05-07T19:46:57.9819533Z #define SEEK_SET 0 2025-05-07T19:46:57.9819809Z #define SEM_VALUE_MAX (2147483647) 2025-05-07T19:46:57.9820108Z #define SHRT_MAX __SHRT_MAX__ 2025-05-07T19:46:57.9820425Z #define 
SHRT_MIN (-__SHRT_MAX__ -1) 2025-05-07T19:46:57.9820716Z #define SING 2 2025-05-07T19:46:57.9820980Z #define SSIZE_MAX LONG_MAX 2025-05-07T19:46:57.9821243Z #define STA_CLK 0x8000 2025-05-07T19:46:57.9821522Z #define STA_CLOCKERR 0x1000 2025-05-07T19:46:57.9821790Z #define STA_DEL 0x0020 2025-05-07T19:46:57.9822070Z #define STA_FLL 0x0008 2025-05-07T19:46:57.9822357Z #define STA_FREQHOLD 0x0080 2025-05-07T19:46:57.9822626Z #define STA_INS 0x0010 2025-05-07T19:46:57.9822913Z #define STA_MODE 0x4000 2025-05-07T19:46:57.9823169Z #define STA_NANO 0x2000 2025-05-07T19:46:57.9823452Z #define STA_PLL 0x0001 2025-05-07T19:46:57.9823712Z #define STA_PPSERROR 0x0800 2025-05-07T19:46:57.9824014Z #define STA_PPSFREQ 0x0002 2025-05-07T19:46:57.9824287Z #define STA_PPSJITTER 0x0200 2025-05-07T19:46:57.9824594Z #define STA_PPSSIGNAL 0x0100 2025-05-07T19:46:57.9824879Z #define STA_PPSTIME 0x0004 2025-05-07T19:46:57.9825182Z #define STA_PPSWANDER 0x0400 2025-05-07T19:46:57.9825784Z #define STA_RONLY (STA_PPSSIGNAL | STA_PPSJITTER | STA_PPSWANDER | STA_PPSERROR | STA_CLOCKERR | STA_NANO | STA_MODE | STA_CLK) 2025-05-07T19:46:57.9826392Z #define STA_UNSYNC 0x0040 2025-05-07T19:46:57.9826683Z #define TIMER_ABSTIME 1 2025-05-07T19:46:57.9826933Z #define TIME_UTC 1 2025-05-07T19:46:57.9827179Z #define TLOSS 5 2025-05-07T19:46:57.9827445Z #define TMP_MAX 238328 2025-05-07T19:46:57.9827880Z #define TTY_NAME_MAX 32 2025-05-07T19:46:57.9828183Z #define UCHAR_MAX (__SCHAR_MAX__*2 +1) 2025-05-07T19:46:57.9828542Z #define UINT_MAX (__INT_MAX__ *2U +1U) 2025-05-07T19:46:57.9828907Z #define ULLONG_MAX (__LONG_LONG_MAX__*2ULL+1ULL) 2025-05-07T19:46:57.9829322Z #define ULONG_LONG_MAX (__LONG_LONG_MAX__*2ULL+1ULL) 2025-05-07T19:46:57.9829696Z #define ULONG_MAX (__LONG_MAX__ *2UL+1UL) 2025-05-07T19:46:57.9830040Z #define UNDERFLOW 4 2025-05-07T19:46:57.9830395Z #define USHRT_MAX (__SHRT_MAX__ *2 +1) 2025-05-07T19:46:57.9830738Z #define WCONTINUED 8 2025-05-07T19:46:57.9830993Z #define WEXITED 4 
2025-05-07T19:46:57.9831367Z #define WEXITSTATUS(status) __WEXITSTATUS (__WAIT_INT (status)) 2025-05-07T19:46:57.9831904Z #define WIFCONTINUED(status) __WIFCONTINUED (__WAIT_INT (status)) 2025-05-07T19:46:57.9832394Z #define WIFEXITED(status) __WIFEXITED (__WAIT_INT (status)) 2025-05-07T19:46:57.9832904Z #define WIFSIGNALED(status) __WIFSIGNALED (__WAIT_INT (status)) 2025-05-07T19:46:57.9833391Z #define WIFSTOPPED(status) __WIFSTOPPED (__WAIT_INT (status)) 2025-05-07T19:46:57.9833810Z #define WNOHANG 1 2025-05-07T19:46:57.9834068Z #define WNOWAIT 0x01000000 2025-05-07T19:46:57.9834364Z #define WORD_BIT 32 2025-05-07T19:46:57.9834604Z #define WSTOPPED 2 2025-05-07T19:46:57.9834943Z #define WSTOPSIG(status) __WSTOPSIG (__WAIT_INT (status)) 2025-05-07T19:46:57.9835409Z #define WTERMSIG(status) __WTERMSIG (__WAIT_INT (status)) 2025-05-07T19:46:57.9835780Z #define WUNTRACED 2 2025-05-07T19:46:57.9836059Z #define XATTR_LIST_MAX 65536 2025-05-07T19:46:57.9836345Z #define XATTR_NAME_MAX 255 2025-05-07T19:46:57.9836648Z #define XATTR_SIZE_MAX 65536 2025-05-07T19:46:57.9836940Z #define X_TLOSS 1.41484755040568800000e+16 2025-05-07T19:46:57.9837280Z #define _ACRTIMP 2025-05-07T19:46:57.9837520Z #define _ALLOCA_H 1 2025-05-07T19:46:57.9837783Z #define _ASSERT_H 1 2025-05-07T19:46:57.9838028Z #define _ATFILE_SOURCE 1 2025-05-07T19:46:57.9838317Z #define _BITS_BYTESWAP_H 1 2025-05-07T19:46:57.9838598Z #define _BITS_POSIX1_LIM_H 1 2025-05-07T19:46:57.9838907Z #define _BITS_POSIX2_LIM_H 1 2025-05-07T19:46:57.9839220Z #define _BITS_PTHREADTYPES_H 1 2025-05-07T19:46:57.9839584Z #define _BITS_TIMEX_H 1 2025-05-07T19:46:57.9839872Z #define _BITS_TIME_H 1 2025-05-07T19:46:57.9840132Z #define _BITS_TYPESIZES_H 1 2025-05-07T19:46:57.9840436Z #define _BITS_TYPES_H 1 2025-05-07T19:46:57.9840700Z #define _BSD_SOURCE 1 2025-05-07T19:46:57.9840990Z #define _CONCEPT_CHECK_H 1 2025-05-07T19:46:57.9841282Z #define _CPP_TYPE_TRAITS_H 1 2025-05-07T19:46:57.9841588Z #define _CRTIMP 
2025-05-07T19:46:57.9841827Z #define _CTYPE_H 1 2025-05-07T19:46:57.9842094Z #define _ENDIAN_H 1 2025-05-07T19:46:57.9842351Z #define _EXCEPTION_DEFINES_H 1 2025-05-07T19:46:57.9842779Z #define _EXT_NUMERIC_TRAITS 1 2025-05-07T19:46:57.9843103Z #define _EXT_TYPE_TRAITS 1 2025-05-07T19:46:57.9843375Z #define _FEATURES_H 1 2025-05-07T19:46:57.9843672Z #define _FUNCTEXCEPT_H 1 2025-05-07T19:46:57.9843945Z #define _GCC_LIMITS_H_ 2025-05-07T19:46:57.9844282Z #define _GLIBCXX11_DEPRECATED _GLIBCXX_DEPRECATED 2025-05-07T19:46:57.9844783Z #define _GLIBCXX11_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT) 2025-05-07T19:46:57.9845293Z #define _GLIBCXX11_USE_C99_COMPLEX 1 2025-05-07T19:46:57.9845613Z #define _GLIBCXX11_USE_C99_MATH 1 2025-05-07T19:46:57.9845950Z #define _GLIBCXX11_USE_C99_STDIO 1 2025-05-07T19:46:57.9846286Z #define _GLIBCXX11_USE_C99_STDLIB 1 2025-05-07T19:46:57.9846596Z #define _GLIBCXX11_USE_C99_WCHAR 1 2025-05-07T19:46:57.9846942Z #define _GLIBCXX14_CONSTEXPR constexpr 2025-05-07T19:46:57.9847276Z #define _GLIBCXX17_CONSTEXPR constexpr 2025-05-07T19:46:57.9847656Z #define _GLIBCXX17_DEPRECATED [[__deprecated__]] 2025-05-07T19:46:57.9848145Z #define _GLIBCXX17_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT) 2025-05-07T19:46:57.9848635Z #define _GLIBCXX17_INLINE inline 2025-05-07T19:46:57.9848938Z #define _GLIBCXX20_CONSTEXPR 2025-05-07T19:46:57.9849262Z #define _GLIBCXX20_DEPRECATED(MSG) 2025-05-07T19:46:57.9849595Z #define _GLIBCXX20_DEPRECATED_SUGGEST(ALT) 2025-05-07T19:46:57.9849959Z #define _GLIBCXX98_USE_C99_COMPLEX 1 2025-05-07T19:46:57.9850301Z #define _GLIBCXX98_USE_C99_MATH 1 2025-05-07T19:46:57.9850603Z #define _GLIBCXX98_USE_C99_STDIO 1 2025-05-07T19:46:57.9850937Z #define _GLIBCXX98_USE_C99_STDLIB 1 2025-05-07T19:46:57.9851246Z #define _GLIBCXX98_USE_C99_WCHAR 1 2025-05-07T19:46:57.9851666Z #define _GLIBCXX_ABI_TAG_CXX11 __attribute ((__abi_tag__ ("cxx11"))) 2025-05-07T19:46:57.9854580Z #define _GLIBCXX_ATOMIC_BUILTINS 1 
2025-05-07T19:46:57.9854948Z #define _GLIBCXX_BEGIN_EXTERN_C extern "C" { 2025-05-07T19:46:57.9855408Z #define _GLIBCXX_BEGIN_NAMESPACE_ALGO 2025-05-07T19:46:57.9855757Z #define _GLIBCXX_BEGIN_NAMESPACE_CONTAINER 2025-05-07T19:46:57.9856163Z #define _GLIBCXX_BEGIN_NAMESPACE_CXX11 namespace __cxx11 { 2025-05-07T19:46:57.9856535Z #define _GLIBCXX_BEGIN_NAMESPACE_LDBL 2025-05-07T19:46:57.9856987Z #define _GLIBCXX_BEGIN_NAMESPACE_LDBL_OR_CXX11 _GLIBCXX_BEGIN_NAMESPACE_CXX11 2025-05-07T19:46:57.9857436Z #define _GLIBCXX_BEGIN_NAMESPACE_VERSION 2025-05-07T19:46:57.9857775Z #define _GLIBCXX_BITS_SPECFUN_H 1 2025-05-07T19:46:57.9858074Z #define _GLIBCXX_BITS_STD_ABS_H 2025-05-07T19:46:57.9858370Z #define _GLIBCXX_CMATH 1 2025-05-07T19:46:57.9858657Z #define _GLIBCXX_CONST __attribute__ ((__const__)) 2025-05-07T19:46:57.9859027Z #define _GLIBCXX_CONSTEXPR constexpr 2025-05-07T19:46:57.9859324Z #define _GLIBCXX_CPU_DEFINES 1 2025-05-07T19:46:57.9859581Z #define _GLIBCXX_CSTDLIB 1 2025-05-07T19:46:57.9859847Z #define _GLIBCXX_CXX_CONFIG_H 1 2025-05-07T19:46:57.9860121Z #define _GLIBCXX_DARWIN_USE_64_BIT_INODE 1 2025-05-07T19:46:57.9860446Z #define _GLIBCXX_DEBUG_ASSERT(_Condition) 2025-05-07T19:46:57.9860925Z #define _GLIBCXX_DEBUG_ASSERTIONS_H 1 2025-05-07T19:46:57.9861259Z #define _GLIBCXX_DEBUG_MACRO_SWITCH_H 1 2025-05-07T19:46:57.9861576Z #define _GLIBCXX_DEBUG_ONLY(_Statement) 2025-05-07T19:46:57.9861927Z #define _GLIBCXX_DEBUG_PEDASSERT(_Condition) 2025-05-07T19:46:57.9862381Z #define _GLIBCXX_DEFAULT_ABI_TAG _GLIBCXX_ABI_TAG_CXX11 2025-05-07T19:46:57.9862802Z #define _GLIBCXX_DEPRECATED __attribute__ ((__deprecated__)) 2025-05-07T19:46:57.9863480Z #define _GLIBCXX_DEPRECATED_SUGGEST(ALT) __attribute__ ((__deprecated__ ("use '" ALT "' instead"))) 2025-05-07T19:46:57.9864016Z #define _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 1 2025-05-07T19:46:57.9864350Z #define _GLIBCXX_END_EXTERN_C } 2025-05-07T19:46:57.9864634Z #define _GLIBCXX_END_NAMESPACE_ALGO 2025-05-07T19:46:57.9864969Z 
#define _GLIBCXX_END_NAMESPACE_CONTAINER 2025-05-07T19:46:57.9865294Z #define _GLIBCXX_END_NAMESPACE_CXX11 } 2025-05-07T19:46:57.9865623Z #define _GLIBCXX_END_NAMESPACE_LDBL 2025-05-07T19:46:57.9866044Z #define _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11 _GLIBCXX_END_NAMESPACE_CXX11 2025-05-07T19:46:57.9866482Z #define _GLIBCXX_END_NAMESPACE_VERSION 2025-05-07T19:46:57.9866809Z #define _GLIBCXX_EXTERN_TEMPLATE 1 2025-05-07T19:46:57.9867300Z #define _GLIBCXX_FAST_MATH 0 2025-05-07T19:46:57.9867794Z #define _GLIBCXX_FLOAT_IS_IEEE_BINARY32 1 2025-05-07T19:46:57.9868227Z #define _GLIBCXX_FORWARD(_Tp,__val) std::forward<_Tp>(__val) 2025-05-07T19:46:57.9868638Z #define _GLIBCXX_FULLY_DYNAMIC_STRING 0 2025-05-07T19:46:57.9868964Z #define _GLIBCXX_FWDREF(_Tp) _Tp&& 2025-05-07T19:46:57.9869278Z #define _GLIBCXX_HAS_GTHREADS 1 2025-05-07T19:46:57.9870223Z #define _GLIBCXX_HAS_NESTED_TYPE(_NTYPE) template<typename _Tp, typename = __void_t<>> struct __has_##_NTYPE : false_type { }; template<typename _Tp> struct __has_##_NTYPE<_Tp, __void_t<typename _Tp::_NTYPE>> : true_type { }; 2025-05-07T19:46:57.9871337Z #define _GLIBCXX_HAVE_ACOSF 1 2025-05-07T19:46:57.9871652Z #define _GLIBCXX_HAVE_ACOSL 1 2025-05-07T19:46:57.9871980Z #define _GLIBCXX_HAVE_ALIGNED_ALLOC 1 2025-05-07T19:46:57.9872306Z #define _GLIBCXX_HAVE_ARPA_INET_H 1 2025-05-07T19:46:57.9872653Z #define _GLIBCXX_HAVE_ASINF 1 2025-05-07T19:46:57.9872943Z #define _GLIBCXX_HAVE_ASINL 1 2025-05-07T19:46:57.9873279Z #define _GLIBCXX_HAVE_AS_SYMVER_DIRECTIVE 1 2025-05-07T19:46:57.9873625Z #define _GLIBCXX_HAVE_ATAN2F 1 2025-05-07T19:46:57.9873949Z #define _GLIBCXX_HAVE_ATAN2L 1 2025-05-07T19:46:57.9874239Z #define _GLIBCXX_HAVE_ATANF 1 2025-05-07T19:46:57.9874559Z #define _GLIBCXX_HAVE_ATANL 1 2025-05-07T19:46:57.9874865Z #define _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY 1 2025-05-07T19:46:57.9875239Z #define _GLIBCXX_HAVE_ATTRIBUTE_VISIBILITY 1 2025-05-07T19:46:57.9875606Z #define _GLIBCXX_HAVE_AT_QUICK_EXIT 1 2025-05-07T19:46:57.9875951Z #define _GLIBCXX_HAVE_BUILTIN_HAS_UNIQ_OBJ_REP 1
2025-05-07T19:46:57.9876491Z #define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1 2025-05-07T19:46:57.9876872Z #define _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED 1 2025-05-07T19:46:57.9877447Z #define _GLIBCXX_HAVE_BUILTIN_IS_SAME 1 2025-05-07T19:46:57.9877776Z #define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1 2025-05-07T19:46:57.9878118Z #define _GLIBCXX_HAVE_CEILF 1 2025-05-07T19:46:57.9878403Z #define _GLIBCXX_HAVE_CEILL 1 2025-05-07T19:46:57.9878721Z #define _GLIBCXX_HAVE_COMPLEX_H 1 2025-05-07T19:46:57.9879053Z #define _GLIBCXX_HAVE_COSF 1 2025-05-07T19:46:57.9879339Z #define _GLIBCXX_HAVE_COSHF 1 2025-05-07T19:46:57.9879641Z #define _GLIBCXX_HAVE_COSHL 1 2025-05-07T19:46:57.9880036Z #define _GLIBCXX_HAVE_COSL 1 2025-05-07T19:46:57.9880335Z #define _GLIBCXX_HAVE_DIRENT_H 1 2025-05-07T19:46:57.9880625Z #define _GLIBCXX_HAVE_DLFCN_H 1 2025-05-07T19:46:57.9880929Z #define _GLIBCXX_HAVE_ENDIAN_H 1 2025-05-07T19:46:57.9881252Z #define _GLIBCXX_HAVE_EXCEPTION_PTR_SINCE_GCC46 1 2025-05-07T19:46:57.9881620Z #define _GLIBCXX_HAVE_EXECINFO_H 1 2025-05-07T19:46:57.9881914Z #define _GLIBCXX_HAVE_EXPF 1 2025-05-07T19:46:57.9882207Z #define _GLIBCXX_HAVE_EXPL 1 2025-05-07T19:46:57.9882590Z #define _GLIBCXX_HAVE_FABSF 1 2025-05-07T19:46:57.9883034Z #define _GLIBCXX_HAVE_FABSL 1 2025-05-07T19:46:57.9883353Z #define _GLIBCXX_HAVE_FCNTL_H 1 2025-05-07T19:46:57.9883710Z #define _GLIBCXX_HAVE_FENV_H 1 2025-05-07T19:46:57.9884032Z #define _GLIBCXX_HAVE_FINITE 1 2025-05-07T19:46:57.9884329Z #define _GLIBCXX_HAVE_FINITEF 1 2025-05-07T19:46:57.9884654Z #define _GLIBCXX_HAVE_FINITEL 1 2025-05-07T19:46:57.9884952Z #define _GLIBCXX_HAVE_FLOAT_H 1 2025-05-07T19:46:57.9885268Z #define _GLIBCXX_HAVE_FLOORF 1 2025-05-07T19:46:57.9885677Z #define _GLIBCXX_HAVE_FLOORL 1 2025-05-07T19:46:57.9886000Z #define _GLIBCXX_HAVE_FMODF 1 2025-05-07T19:46:57.9886322Z #define _GLIBCXX_HAVE_FMODL 1 2025-05-07T19:46:57.9886615Z #define _GLIBCXX_HAVE_FREXPF 1 2025-05-07T19:46:57.9886942Z #define _GLIBCXX_HAVE_FREXPL 1 
2025-05-07T19:46:57.9887239Z #define _GLIBCXX_HAVE_GETIPINFO 1 2025-05-07T19:46:57.9887577Z #define _GLIBCXX_HAVE_GETS 1 2025-05-07T19:46:57.9887865Z #define _GLIBCXX_HAVE_HYPOT 1 2025-05-07T19:46:57.9888182Z #define _GLIBCXX_HAVE_HYPOTF 1 2025-05-07T19:46:57.9888469Z #define _GLIBCXX_HAVE_HYPOTL 1 2025-05-07T19:46:57.9888777Z #define _GLIBCXX_HAVE_ICONV 1 2025-05-07T19:46:57.9889061Z #define _GLIBCXX_HAVE_INT64_T 1 2025-05-07T19:46:57.9889384Z #define _GLIBCXX_HAVE_INT64_T_LONG 1 2025-05-07T19:46:57.9889729Z #define _GLIBCXX_HAVE_INTTYPES_H 1 2025-05-07T19:46:57.9890033Z #define _GLIBCXX_HAVE_ISINF 1 2025-05-07T19:46:57.9890345Z #define _GLIBCXX_HAVE_ISINFF 1 2025-05-07T19:46:57.9890637Z #define _GLIBCXX_HAVE_ISINFL 1 2025-05-07T19:46:57.9890954Z #define _GLIBCXX_HAVE_ISNAN 1 2025-05-07T19:46:57.9891242Z #define _GLIBCXX_HAVE_ISNANF 1 2025-05-07T19:46:57.9891560Z #define _GLIBCXX_HAVE_ISNANL 1 2025-05-07T19:46:57.9891845Z #define _GLIBCXX_HAVE_ISWBLANK 1 2025-05-07T19:46:57.9892173Z #define _GLIBCXX_HAVE_LC_MESSAGES 1 2025-05-07T19:46:57.9892499Z #define _GLIBCXX_HAVE_LDEXPF 1 2025-05-07T19:46:57.9892823Z #define _GLIBCXX_HAVE_LDEXPL 1 2025-05-07T19:46:57.9893117Z #define _GLIBCXX_HAVE_LIMIT_AS 1 2025-05-07T19:46:57.9893460Z #define _GLIBCXX_HAVE_LIMIT_DATA 1 2025-05-07T19:46:57.9893772Z #define _GLIBCXX_HAVE_LIMIT_FSIZE 1 2025-05-07T19:46:57.9894115Z #define _GLIBCXX_HAVE_LIMIT_RSS 1 2025-05-07T19:46:57.9894457Z #define _GLIBCXX_HAVE_LIMIT_VMEM 0 2025-05-07T19:46:57.9894770Z #define _GLIBCXX_HAVE_LINK 1 2025-05-07T19:46:57.9895096Z #define _GLIBCXX_HAVE_LINUX_FUTEX 1 2025-05-07T19:46:57.9895423Z #define _GLIBCXX_HAVE_LINUX_RANDOM_H 1 2025-05-07T19:46:57.9895786Z #define _GLIBCXX_HAVE_LINUX_TYPES_H 1 2025-05-07T19:46:57.9896116Z #define _GLIBCXX_HAVE_LOCALE_H 1 2025-05-07T19:46:57.9896453Z #define _GLIBCXX_HAVE_LOG10F 1 2025-05-07T19:46:57.9897214Z #define _GLIBCXX_HAVE_LOG10L 1 2025-05-07T19:46:57.9897599Z #define _GLIBCXX_HAVE_LOGF 1 2025-05-07T19:46:57.9897894Z 
#define _GLIBCXX_HAVE_LOGL 1 2025-05-07T19:46:57.9898217Z #define _GLIBCXX_HAVE_MBSTATE_T 1 2025-05-07T19:46:57.9898642Z #define _GLIBCXX_HAVE_MEMALIGN 1 2025-05-07T19:46:57.9898948Z #define _GLIBCXX_HAVE_MEMORY_H 1 2025-05-07T19:46:57.9899280Z #define _GLIBCXX_HAVE_MODF 1 2025-05-07T19:46:57.9899574Z #define _GLIBCXX_HAVE_MODFF 1 2025-05-07T19:46:57.9899897Z #define _GLIBCXX_HAVE_MODFL 1 2025-05-07T19:46:57.9900186Z #define _GLIBCXX_HAVE_NETDB_H 1 2025-05-07T19:46:57.9900509Z #define _GLIBCXX_HAVE_NETINET_IN_H 1 2025-05-07T19:46:57.9900825Z #define _GLIBCXX_HAVE_NETINET_TCP_H 1 2025-05-07T19:46:57.9901169Z #define _GLIBCXX_HAVE_OBSOLETE_ISINF 1 2025-05-07T19:46:57.9901485Z #define _GLIBCXX_HAVE_OBSOLETE_ISNAN 1 2025-05-07T19:46:57.9901798Z #define _GLIBCXX_HAVE_POLL 1 2025-05-07T19:46:57.9902090Z #define _GLIBCXX_HAVE_POLL_H 1 2025-05-07T19:46:57.9902380Z #define _GLIBCXX_HAVE_POSIX_MEMALIGN 1 2025-05-07T19:46:57.9902732Z #define _GLIBCXX_HAVE_POSIX_SEMAPHORE 1 2025-05-07T19:46:57.9903176Z #define _GLIBCXX_HAVE_POWF 1 2025-05-07T19:46:57.9903588Z #define _GLIBCXX_HAVE_POWL 1 2025-05-07T19:46:57.9903866Z #define _GLIBCXX_HAVE_QUICK_EXIT 1 2025-05-07T19:46:57.9904172Z #define _GLIBCXX_HAVE_READLINK 1 2025-05-07T19:46:57.9904451Z #define _GLIBCXX_HAVE_SETENV 1 2025-05-07T19:46:57.9904743Z #define _GLIBCXX_HAVE_SINCOS 1 2025-05-07T19:46:57.9905046Z #define _GLIBCXX_HAVE_SINCOSF 1 2025-05-07T19:46:57.9905324Z #define _GLIBCXX_HAVE_SINCOSL 1 2025-05-07T19:46:57.9905627Z #define _GLIBCXX_HAVE_SINF 1 2025-05-07T19:46:57.9905901Z #define _GLIBCXX_HAVE_SINHF 1 2025-05-07T19:46:57.9906199Z #define _GLIBCXX_HAVE_SINHL 1 2025-05-07T19:46:57.9906472Z #define _GLIBCXX_HAVE_SINL 1 2025-05-07T19:46:57.9906752Z #define _GLIBCXX_HAVE_SOCKATMARK 1 2025-05-07T19:46:57.9907014Z #define _GLIBCXX_HAVE_SQRTF 1 2025-05-07T19:46:57.9907365Z #define _GLIBCXX_HAVE_SQRTL 1 2025-05-07T19:46:57.9907643Z #define _GLIBCXX_HAVE_STDALIGN_H 1 2025-05-07T19:46:57.9907954Z #define 
_GLIBCXX_HAVE_STDBOOL_H 1 2025-05-07T19:46:57.9908264Z #define _GLIBCXX_HAVE_STDINT_H 1 2025-05-07T19:46:57.9908544Z #define _GLIBCXX_HAVE_STDLIB_H 1 2025-05-07T19:46:57.9908859Z #define _GLIBCXX_HAVE_STRERROR_L 1 2025-05-07T19:46:57.9909153Z #define _GLIBCXX_HAVE_STRERROR_R 1 2025-05-07T19:46:57.9909459Z #define _GLIBCXX_HAVE_STRINGS_H 1 2025-05-07T19:46:57.9909745Z #define _GLIBCXX_HAVE_STRING_H 1 2025-05-07T19:46:57.9910049Z #define _GLIBCXX_HAVE_STRTOF 1 2025-05-07T19:46:57.9910324Z #define _GLIBCXX_HAVE_STRTOLD 1 2025-05-07T19:46:57.9910639Z #define _GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE 1 2025-05-07T19:46:57.9910960Z #define _GLIBCXX_HAVE_STRXFRM_L 1 2025-05-07T19:46:57.9911262Z #define _GLIBCXX_HAVE_SYMLINK 1 2025-05-07T19:46:57.9911632Z #define _GLIBCXX_HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT 1 2025-05-07T19:46:57.9912018Z #define _GLIBCXX_HAVE_SYS_IOCTL_H 1 2025-05-07T19:46:57.9912329Z #define _GLIBCXX_HAVE_SYS_IPC_H 1 2025-05-07T19:46:57.9912616Z #define _GLIBCXX_HAVE_SYS_PARAM_H 1 2025-05-07T19:46:57.9912937Z #define _GLIBCXX_HAVE_SYS_RESOURCE_H 1 2025-05-07T19:46:57.9913235Z #define _GLIBCXX_HAVE_SYS_SEM_H 1 2025-05-07T19:46:57.9913543Z #define _GLIBCXX_HAVE_SYS_SOCKET_H 1 2025-05-07T19:46:57.9913846Z #define _GLIBCXX_HAVE_SYS_STATVFS_H 1 2025-05-07T19:46:57.9914169Z #define _GLIBCXX_HAVE_SYS_STAT_H 1 2025-05-07T19:46:57.9914461Z #define _GLIBCXX_HAVE_SYS_SYSINFO_H 1 2025-05-07T19:46:57.9914782Z #define _GLIBCXX_HAVE_SYS_TIME_H 1 2025-05-07T19:46:57.9915092Z #define _GLIBCXX_HAVE_SYS_TYPES_H 1 2025-05-07T19:46:57.9915381Z #define _GLIBCXX_HAVE_SYS_UIO_H 1 2025-05-07T19:46:57.9915686Z #define _GLIBCXX_HAVE_S_ISREG 1 2025-05-07T19:46:57.9915954Z #define _GLIBCXX_HAVE_TANF 1 2025-05-07T19:46:57.9916239Z #define _GLIBCXX_HAVE_TANHF 1 2025-05-07T19:46:57.9916505Z #define _GLIBCXX_HAVE_TANHL 1 2025-05-07T19:46:57.9916790Z #define _GLIBCXX_HAVE_TANL 1 2025-05-07T19:46:57.9917061Z #define _GLIBCXX_HAVE_TGMATH_H 1 2025-05-07T19:46:57.9917357Z #define 
_GLIBCXX_HAVE_TLS 1 2025-05-07T19:46:57.9917623Z #define _GLIBCXX_HAVE_TRUNCATE 1 2025-05-07T19:46:57.9917926Z #define _GLIBCXX_HAVE_UNISTD_H 1 2025-05-07T19:46:57.9918228Z #define _GLIBCXX_HAVE_USELOCALE 1 2025-05-07T19:46:57.9918584Z #define _GLIBCXX_HAVE_UTIME_H 1 2025-05-07T19:46:57.9918883Z #define _GLIBCXX_HAVE_VFWSCANF 1 2025-05-07T19:46:57.9919156Z #define _GLIBCXX_HAVE_VSWSCANF 1 2025-05-07T19:46:57.9919456Z #define _GLIBCXX_HAVE_VWSCANF 1 2025-05-07T19:46:57.9919729Z #define _GLIBCXX_HAVE_WCHAR_H 1 2025-05-07T19:46:57.9920022Z #define _GLIBCXX_HAVE_WCSTOF 1 2025-05-07T19:46:57.9920300Z #define _GLIBCXX_HAVE_WCTYPE_H 1 2025-05-07T19:46:57.9920605Z #define _GLIBCXX_HAVE_WRITEV 1 2025-05-07T19:46:57.9920892Z #define _GLIBCXX_HAVE_XLOCALE_H 1 2025-05-07T19:46:57.9921199Z #define _GLIBCXX_HOSTED 1 2025-05-07T19:46:57.9921476Z #define _GLIBCXX_ICONV_CONST 2025-05-07T19:46:57.9921734Z #define _GLIBCXX_INLINE_VERSION 0 2025-05-07T19:46:57.9922020Z #define _GLIBCXX_LT_OBJDIR ".libs/" 2025-05-07T19:46:57.9922606Z #define _GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR(_Iter) std::__make_move_if_noexcept_iterator(_Iter) 2025-05-07T19:46:57.9923436Z #define _GLIBCXX_MAKE_MOVE_ITERATOR(_Iter) std::make_move_iterator(_Iter) 2025-05-07T19:46:57.9923961Z #define _GLIBCXX_MANGLE_SIZE_T m 2025-05-07T19:46:57.9924267Z #define _GLIBCXX_MATH_H 1 2025-05-07T19:46:57.9924557Z #define _GLIBCXX_MOVE(__val) std::move(__val) 2025-05-07T19:46:57.9924972Z #define _GLIBCXX_MOVE3(_Tp,_Up,_Vp) std::move(_Tp, _Up, _Vp) 2025-05-07T19:46:57.9925502Z #define _GLIBCXX_MOVE_BACKWARD3(_Tp,_Up,_Vp) std::move_backward(_Tp, _Up, _Vp) 2025-05-07T19:46:57.9925974Z #define _GLIBCXX_NAMESPACE_CXX11 __cxx11:: 2025-05-07T19:46:57.9926309Z #define _GLIBCXX_NAMESPACE_LDBL 2025-05-07T19:46:57.9926692Z #define _GLIBCXX_NAMESPACE_LDBL_OR_CXX11 _GLIBCXX_NAMESPACE_CXX11 2025-05-07T19:46:57.9927370Z #define _GLIBCXX_NATIVE_THREAD_ID (__gthread_active_p() ? 
__gthread_self() : (__gthread_t)1) 2025-05-07T19:46:57.9927906Z #define _GLIBCXX_NODISCARD [[__nodiscard__]] 2025-05-07T19:46:57.9928256Z #define _GLIBCXX_NOEXCEPT noexcept 2025-05-07T19:46:57.9928623Z #define _GLIBCXX_NOEXCEPT_IF(...) noexcept(__VA_ARGS__) 2025-05-07T19:46:57.9928995Z #define _GLIBCXX_NOEXCEPT_PARM , bool _NE 2025-05-07T19:46:57.9929362Z #define _GLIBCXX_NOEXCEPT_QUAL noexcept (_NE) 2025-05-07T19:46:57.9929747Z #define _GLIBCXX_NORETURN __attribute__ ((__noreturn__)) 2025-05-07T19:46:57.9930161Z #define _GLIBCXX_NOTHROW _GLIBCXX_USE_NOEXCEPT 2025-05-07T19:46:57.9930592Z #define _GLIBCXX_NO_OBSOLETE_ISINF_ISNAN_DYNAMIC __GLIBC_PREREQ(2,23) 2025-05-07T19:46:57.9931023Z #define _GLIBCXX_NUMERIC_LIMITS 1 2025-05-07T19:46:57.9931300Z #define _GLIBCXX_OS_DEFINES 1 2025-05-07T19:46:57.9931590Z #define _GLIBCXX_PACKAGE_BUGREPORT "" 2025-05-07T19:46:57.9931917Z #define _GLIBCXX_PACKAGE_NAME "package-unused" 2025-05-07T19:46:57.9932319Z #define _GLIBCXX_PACKAGE_STRING "package-unused version-unused" 2025-05-07T19:46:57.9932728Z #define _GLIBCXX_PACKAGE_TARNAME "libstdc++" 2025-05-07T19:46:57.9933030Z #define _GLIBCXX_PACKAGE_URL "" 2025-05-07T19:46:57.9933386Z #define _GLIBCXX_PACKAGE__GLIBCXX_VERSION "version-unused" 2025-05-07T19:46:57.9933750Z #define _GLIBCXX_PREDEFINED_OPS_H 1 2025-05-07T19:46:57.9934045Z #define _GLIBCXX_PSEUDO_VISIBILITY(V) 2025-05-07T19:46:57.9934375Z #define _GLIBCXX_PURE __attribute__ ((__pure__)) 2025-05-07T19:46:57.9934689Z #define _GLIBCXX_RELEASE 11 2025-05-07T19:46:57.9934946Z #define _GLIBCXX_RES_LIMITS 1 2025-05-07T19:46:57.9935313Z #define _GLIBCXX_STDC_HEADERS 1 2025-05-07T19:46:57.9935685Z #define _GLIBCXX_STDIO_EOF -1 2025-05-07T19:46:57.9935921Z #define _GLIBCXX_STDIO_SEEK_CUR 1 2025-05-07T19:46:57.9936178Z #define _GLIBCXX_STDIO_SEEK_END 2 2025-05-07T19:46:57.9936420Z #define _GLIBCXX_STDLIB_H 1 2025-05-07T19:46:57.9936655Z #define _GLIBCXX_STD_A std 2025-05-07T19:46:57.9936876Z #define _GLIBCXX_STD_C std 
2025-05-07T19:46:57.9937100Z #define _GLIBCXX_SYMVER 1 2025-05-07T19:46:57.9937323Z #define _GLIBCXX_SYMVER_GNU 1 2025-05-07T19:46:57.9937608Z #define _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(A) 2025-05-07T19:46:57.9937963Z #define _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(A) 2025-05-07T19:46:57.9938274Z #define _GLIBCXX_THROW(_EXC) 2025-05-07T19:46:57.9938566Z #define _GLIBCXX_THROW_OR_ABORT(_EXC) (throw (_EXC)) 2025-05-07T19:46:57.9938946Z #define _GLIBCXX_TR1_BESSEL_FUNCTION_TCC 1 2025-05-07T19:46:57.9939247Z #define _GLIBCXX_TR1_BETA_FUNCTION_TCC 1 2025-05-07T19:46:57.9939532Z #define _GLIBCXX_TR1_ELL_INTEGRAL_TCC 1 2025-05-07T19:46:57.9939826Z #define _GLIBCXX_TR1_EXP_INTEGRAL_TCC 1 2025-05-07T19:46:57.9940096Z #define _GLIBCXX_TR1_GAMMA_TCC 1 2025-05-07T19:46:57.9940403Z #define _GLIBCXX_TR1_HYPERGEOMETRIC_TCC 1 2025-05-07T19:46:57.9940765Z #define _GLIBCXX_TR1_LEGENDRE_FUNCTION_TCC 1 2025-05-07T19:46:57.9941107Z #define _GLIBCXX_TR1_MODIFIED_BESSEL_FUNC_TCC 1 2025-05-07T19:46:57.9941465Z #define _GLIBCXX_TR1_POLY_HERMITE_TCC 1 2025-05-07T19:46:57.9941779Z #define _GLIBCXX_TR1_POLY_LAGUERRE_TCC 1 2025-05-07T19:46:57.9942115Z #define _GLIBCXX_TR1_RIEMANN_ZETA_TCC 1 2025-05-07T19:46:57.9942439Z #define _GLIBCXX_TR1_SPECIAL_FUNCTION_UTIL_H 1 2025-05-07T19:46:57.9942782Z #define _GLIBCXX_TXN_SAFE 2025-05-07T19:46:57.9943046Z #define _GLIBCXX_TXN_SAFE_DYN 2025-05-07T19:46:57.9943345Z #define _GLIBCXX_TYPE_TRAITS 1 2025-05-07T19:46:57.9943627Z #define _GLIBCXX_USE_ALLOCATOR_NEW 1 2025-05-07T19:46:57.9943938Z #define _GLIBCXX_USE_C99 1 2025-05-07T19:46:57.9944280Z #define _GLIBCXX_USE_C99_COMPLEX _GLIBCXX11_USE_C99_COMPLEX 2025-05-07T19:46:57.9944650Z #define _GLIBCXX_USE_C99_COMPLEX_TR1 1 2025-05-07T19:46:57.9944976Z #define _GLIBCXX_USE_C99_CTYPE_TR1 1 2025-05-07T19:46:57.9945268Z #define _GLIBCXX_USE_C99_FENV_TR1 1 2025-05-07T19:46:57.9945582Z #define _GLIBCXX_USE_C99_INTTYPES_TR1 1 2025-05-07T19:46:57.9945907Z #define _GLIBCXX_USE_C99_INTTYPES_WCHAR_T_TR1 1 
2025-05-07T19:46:57.9946303Z #define _GLIBCXX_USE_C99_MATH _GLIBCXX11_USE_C99_MATH 2025-05-07T19:46:57.9946706Z #define _GLIBCXX_USE_C99_MATH_TR1 1 2025-05-07T19:46:57.9947022Z #define _GLIBCXX_USE_C99_STDINT_TR1 1 2025-05-07T19:46:57.9947388Z #define _GLIBCXX_USE_C99_STDIO _GLIBCXX11_USE_C99_STDIO 2025-05-07T19:46:57.9947788Z #define _GLIBCXX_USE_C99_STDLIB _GLIBCXX11_USE_C99_STDLIB 2025-05-07T19:46:57.9948224Z #define _GLIBCXX_USE_C99_WCHAR _GLIBCXX11_USE_C99_WCHAR 2025-05-07T19:46:57.9948583Z #define _GLIBCXX_USE_CLOCK_MONOTONIC 1 2025-05-07T19:46:57.9948917Z #define _GLIBCXX_USE_CLOCK_REALTIME 1 2025-05-07T19:46:57.9949225Z #define _GLIBCXX_USE_CONSTEXPR constexpr 2025-05-07T19:46:57.9949559Z #define _GLIBCXX_USE_CXX11_ABI 1 2025-05-07T19:46:57.9950026Z #define _GLIBCXX_USE_DECIMAL_FLOAT 1 2025-05-07T19:46:57.9950351Z #define _GLIBCXX_USE_DEPRECATED 1 2025-05-07T19:46:57.9950671Z #define _GLIBCXX_USE_DEV_RANDOM 1 2025-05-07T19:46:57.9951030Z #define _GLIBCXX_USE_DUAL_ABI 1 2025-05-07T19:46:57.9951334Z #define _GLIBCXX_USE_FCHMOD 1 2025-05-07T19:46:57.9951611Z #define _GLIBCXX_USE_FCHMODAT 1 2025-05-07T19:46:57.9951923Z #define _GLIBCXX_USE_FLOAT128 1 2025-05-07T19:46:57.9952208Z #define _GLIBCXX_USE_GETTIMEOFDAY 1 2025-05-07T19:46:57.9952536Z #define _GLIBCXX_USE_GET_NPROCS 1 2025-05-07T19:46:57.9952828Z #define _GLIBCXX_USE_INT128 1 2025-05-07T19:46:57.9953125Z #define _GLIBCXX_USE_LFS 1 2025-05-07T19:46:57.9953403Z #define _GLIBCXX_USE_LONG_LONG 1 2025-05-07T19:46:57.9953716Z #define _GLIBCXX_USE_LSTAT 1 2025-05-07T19:46:57.9954025Z #define _GLIBCXX_USE_NANOSLEEP 1 2025-05-07T19:46:57.9954327Z #define _GLIBCXX_USE_NOEXCEPT noexcept 2025-05-07T19:46:57.9954684Z #define _GLIBCXX_USE_PTHREAD_RWLOCK_T 1 2025-05-07T19:46:57.9955008Z #define _GLIBCXX_USE_RANDOM_TR1 1 2025-05-07T19:46:57.9955333Z #define _GLIBCXX_USE_REALPATH 1 2025-05-07T19:46:57.9955622Z #define _GLIBCXX_USE_SCHED_YIELD 1 2025-05-07T19:46:57.9955963Z #define _GLIBCXX_USE_SC_NPROCESSORS_ONLN 1 
2025-05-07T19:46:57.9956287Z #define _GLIBCXX_USE_SENDFILE 1 2025-05-07T19:46:57.9956601Z #define _GLIBCXX_USE_STD_SPEC_FUNCS 1 2025-05-07T19:46:57.9956913Z #define _GLIBCXX_USE_ST_MTIM 1 2025-05-07T19:46:57.9957308Z #define _GLIBCXX_USE_TBB_PAR_BACKEND __has_include(<tbb/tbb.h>) 2025-05-07T19:46:57.9957730Z #define _GLIBCXX_USE_TMPNAM 1 2025-05-07T19:46:57.9958011Z #define _GLIBCXX_USE_UTIME 1 2025-05-07T19:46:57.9958334Z #define _GLIBCXX_USE_UTIMENSAT 1 2025-05-07T19:46:57.9958628Z #define _GLIBCXX_USE_WCHAR_T 1 2025-05-07T19:46:57.9959039Z #define _GLIBCXX_USE_WEAK_REF __GXX_WEAK__ 2025-05-07T19:46:57.9959365Z #define _GLIBCXX_UTILITY 1 2025-05-07T19:46:57.9959668Z #define _GLIBCXX_VERBOSE 1 2025-05-07T19:46:57.9960047Z #define _GLIBCXX_VISIBILITY(V) __attribute__ ((__visibility__ (#V))) 2025-05-07T19:46:57.9960508Z #define _GLIBCXX_WEAK_DEFINITION 2025-05-07T19:46:57.9960836Z #define _GLIBCXX_X86_RDRAND 1 2025-05-07T19:46:57.9961116Z #define _GLIBCXX_X86_RDSEED 1 2025-05-07T19:46:57.9961426Z #define _GNU_SOURCE 1 2025-05-07T19:46:57.9961702Z #define _GTHREAD_USE_MUTEX_TIMEDLOCK 1 2025-05-07T19:46:57.9962045Z #define _G_BUFSIZ 8192 2025-05-07T19:46:57.9962308Z #define _G_HAVE_MMAP 1 2025-05-07T19:46:57.9962701Z #define _G_HAVE_MREMAP 1 2025-05-07T19:46:57.9963207Z #define _G_HAVE_ST_BLKSIZE defined (_STATBUF_ST_BLKSIZE) 2025-05-07T19:46:57.9963644Z #define _G_IO_IO_FILE_VERSION 0x20001 2025-05-07T19:46:57.9963955Z #define _G_config_h 1 2025-05-07T19:46:57.9964257Z #define _G_va_list __gnuc_va_list 2025-05-07T19:46:57.9964562Z #define _INITIALIZER_LIST 2025-05-07T19:46:57.9964867Z #define _IOFBF 0 2025-05-07T19:46:57.9965134Z #define _IOLBF 1 2025-05-07T19:46:57.9965376Z #define _IONBF 2 2025-05-07T19:46:57.9965650Z #define _IOS_APPEND 8 2025-05-07T19:46:57.9965903Z #define _IOS_ATEND 4 2025-05-07T19:46:57.9966170Z #define _IOS_BIN 128 2025-05-07T19:46:57.9966411Z #define _IOS_INPUT 1 2025-05-07T19:46:57.9966680Z #define _IOS_NOCREATE 32 2025-05-07T19:46:57.9966947Z #define 
_IOS_NOREPLACE 64 2025-05-07T19:46:57.9967398Z #define _IOS_OUTPUT 2 2025-05-07T19:46:57.9967643Z #define _IOS_TRUNC 16 2025-05-07T19:46:57.9967926Z #define _IO_BAD_SEEN 0x4000 2025-05-07T19:46:57.9968441Z #define _IO_BE(expr,res) __builtin_expect ((expr), res) 2025-05-07T19:46:57.9968843Z #define _IO_BOOLALPHA 0200000 2025-05-07T19:46:57.9969162Z #define _IO_BUFSIZ _G_BUFSIZ 2025-05-07T19:46:57.9969451Z #define _IO_CURRENTLY_PUTTING 0x800 2025-05-07T19:46:57.9969778Z #define _IO_DEC 020 2025-05-07T19:46:57.9970033Z #define _IO_DELETE_DONT_CLOSE 0x40 2025-05-07T19:46:57.9970366Z #define _IO_DONT_CLOSE 0100000 2025-05-07T19:46:57.9970652Z #define _IO_EOF_SEEN 0x10 2025-05-07T19:46:57.9970950Z #define _IO_ERR_SEEN 0x20 2025-05-07T19:46:57.9971227Z #define _IO_FIXED 010000 2025-05-07T19:46:57.9971518Z #define _IO_FLAGS2_MMAP 1 2025-05-07T19:46:57.9971798Z #define _IO_FLAGS2_NOTCANCEL 2 2025-05-07T19:46:57.9972110Z #define _IO_FLAGS2_USER_WBUF 8 2025-05-07T19:46:57.9972452Z #define _IO_HAVE_ST_BLKSIZE _G_HAVE_ST_BLKSIZE 2025-05-07T19:46:57.9972782Z #define _IO_HEX 0100 2025-05-07T19:46:57.9973057Z #define _IO_INTERNAL 010 2025-05-07T19:46:57.9973327Z #define _IO_IN_BACKUP 0x100 2025-05-07T19:46:57.9973642Z #define _IO_IS_APPENDING 0x1000 2025-05-07T19:46:57.9973938Z #define _IO_IS_FILEBUF 0x2000 2025-05-07T19:46:57.9974244Z #define _IO_LEFT 02 2025-05-07T19:46:57.9974489Z #define _IO_LINE_BUF 0x200 2025-05-07T19:46:57.9974784Z #define _IO_LINKED 0x80 2025-05-07T19:46:57.9975054Z #define _IO_MAGIC 0xFBAD0000 2025-05-07T19:46:57.9975530Z #define _IO_MAGIC_MASK 0xFFFF0000 2025-05-07T19:46:57.9975851Z #define _IO_NO_READS 4 2025-05-07T19:46:57.9976107Z #define _IO_NO_WRITES 8 2025-05-07T19:46:57.9976380Z #define _IO_OCT 040 2025-05-07T19:46:57.9976776Z #define _IO_PENDING_OUTPUT_COUNT(_fp) ((_fp)->_IO_write_ptr - (_fp)->_IO_write_base) 2025-05-07T19:46:57.9977264Z #define _IO_RIGHT 04 2025-05-07T19:46:57.9977521Z #define _IO_SCIENTIFIC 04000 2025-05-07T19:46:57.9977822Z 
#define _IO_SHOWBASE 0200 2025-05-07T19:46:57.9978093Z #define _IO_SHOWPOINT 0400 2025-05-07T19:46:57.9978392Z #define _IO_SHOWPOS 02000 2025-05-07T19:46:57.9978771Z #define _IO_SKIPWS 01 2025-05-07T19:46:57.9979160Z #define _IO_STDIO 040000 2025-05-07T19:46:57.9979432Z #define _IO_STDIO_H 2025-05-07T19:46:57.9979671Z #define _IO_TIED_PUT_GET 0x400 2025-05-07T19:46:57.9979966Z #define _IO_UNBUFFERED 2 2025-05-07T19:46:57.9980226Z #define _IO_UNIFIED_JUMPTABLES 1 2025-05-07T19:46:57.9980528Z #define _IO_UNITBUF 020000 2025-05-07T19:46:57.9980785Z #define _IO_UPPERCASE 01000 2025-05-07T19:46:57.9981164Z #define _IO_USER_BUF 1 2025-05-07T19:46:57.9981408Z #define _IO_USER_LOCK 0x8000 2025-05-07T19:46:57.9981707Z #define _IO_cleanup_region_end(_Doit) 2025-05-07T19:46:57.9982021Z #define _IO_cleanup_region_start(_fct,_fp) 2025-05-07T19:46:57.9982639Z #define _IO_feof_unlocked(__fp) (((__fp)->_flags & _IO_EOF_SEEN) != 0) 2025-05-07T19:46:57.9983171Z #define _IO_ferror_unlocked(__fp) (((__fp)->_flags & _IO_ERR_SEEN) != 0) 2025-05-07T19:46:57.9983585Z #define _IO_file_flags _flags 2025-05-07T19:46:57.9983895Z #define _IO_flockfile(_fp) 2025-05-07T19:46:57.9984173Z #define _IO_fpos64_t _G_fpos64_t 2025-05-07T19:46:57.9984481Z #define _IO_fpos_t _G_fpos_t 2025-05-07T19:46:57.9984767Z #define _IO_ftrylockfile(_fp) 2025-05-07T19:46:57.9985081Z #define _IO_funlockfile(_fp) 2025-05-07T19:46:57.9985640Z #define _IO_getc_unlocked(_fp) (_IO_BE ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end, 0) ? __uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++) 2025-05-07T19:46:57.9986244Z #define _IO_iconv_t _G_iconv_t 2025-05-07T19:46:57.9986549Z #define _IO_off64_t __off64_t 2025-05-07T19:46:57.9986827Z #define _IO_off_t __off_t 2025-05-07T19:46:57.9987147Z #define _IO_peekc(_fp) _IO_peekc_unlocked (_fp) 2025-05-07T19:46:57.9987805Z #define _IO_peekc_unlocked(_fp) (_IO_BE ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end, 0) && __underflow (_fp) == EOF ? 
EOF : *(unsigned char *) (_fp)->_IO_read_ptr) 2025-05-07T19:46:57.9988464Z #define _IO_pid_t __pid_t 2025-05-07T19:46:57.9989117Z #define _IO_putc_unlocked(_ch,_fp) (_IO_BE ((_fp)->_IO_write_ptr >= (_fp)->_IO_write_end, 0) ? __overflow (_fp, (unsigned char) (_ch)) : (unsigned char) (*(_fp)->_IO_write_ptr++ = (_ch))) 2025-05-07T19:46:57.9989898Z #define _IO_size_t size_t 2025-05-07T19:46:57.9990188Z #define _IO_ssize_t __ssize_t 2025-05-07T19:46:57.9990507Z #define _IO_stderr ((_IO_FILE*)(&_IO_2_1_stderr_)) 2025-05-07T19:46:57.9990896Z #define _IO_stdin ((_IO_FILE*)(&_IO_2_1_stdin_)) 2025-05-07T19:46:57.9991261Z #define _IO_stdout ((_IO_FILE*)(&_IO_2_1_stdout_)) 2025-05-07T19:46:57.9991628Z #define _IO_uid_t __uid_t 2025-05-07T19:46:57.9991898Z #define _IO_va_list __gnuc_va_list 2025-05-07T19:46:57.9992211Z #define _IO_wint_t wint_t 2025-05-07T19:46:57.9992469Z #define _ISOC11_SOURCE 1 2025-05-07T19:46:57.9992759Z #define _ISOC95_SOURCE 1 2025-05-07T19:46:57.9993044Z #define _ISOC99_SOURCE 1 2025-05-07T19:46:57.9993393Z #define _ISbit(bit) ((bit) < 8 ? 
((1 << (bit)) << 8) : ((1 << (bit)) >> 8)) 2025-05-07T19:46:57.9993828Z #define _LARGEFILE64_SOURCE 1 2025-05-07T19:46:57.9994229Z #define _LARGEFILE_SOURCE 1 2025-05-07T19:46:57.9994522Z #define _LIBC_LIMITS_H_ 1 2025-05-07T19:46:57.9994778Z #define _LINUX_LIMITS_H 2025-05-07T19:46:57.9995063Z #define _LP64 1 2025-05-07T19:46:57.9995288Z #define _MATH_H 1 2025-05-07T19:46:57.9995552Z #define _MATH_H_MATHDEF 1 2025-05-07T19:46:57.9995801Z #define _MOVE_H 1 2025-05-07T19:46:57.9996062Z #define _Mfloat_ float 2025-05-07T19:46:57.9996326Z #define _Mlong_double_ long double 2025-05-07T19:46:57.9996638Z #define _NEW 2025-05-07T19:46:57.9996899Z #define _OLD_STDIO_MAGIC 0xFABC0000 2025-05-07T19:46:57.9997200Z #define _POSIX2_BC_BASE_MAX 99 2025-05-07T19:46:57.9997506Z #define _POSIX2_BC_DIM_MAX 2048 2025-05-07T19:46:57.9997779Z #define _POSIX2_BC_SCALE_MAX 99 2025-05-07T19:46:57.9998082Z #define _POSIX2_BC_STRING_MAX 1000 2025-05-07T19:46:57.9998368Z #define _POSIX2_CHARCLASS_NAME_MAX 14 2025-05-07T19:46:57.9998690Z #define _POSIX2_COLL_WEIGHTS_MAX 2 2025-05-07T19:46:57.9998975Z #define _POSIX2_EXPR_NEST_MAX 32 2025-05-07T19:46:57.9999277Z #define _POSIX2_LINE_MAX 2048 2025-05-07T19:46:57.9999547Z #define _POSIX2_RE_DUP_MAX 255 2025-05-07T19:46:57.9999840Z #define _POSIX_AIO_LISTIO_MAX 2 2025-05-07T19:46:58.0000130Z #define _POSIX_AIO_MAX 1 2025-05-07T19:46:58.0000380Z #define _POSIX_ARG_MAX 4096 2025-05-07T19:46:58.0000666Z #define _POSIX_CHILD_MAX 25 2025-05-07T19:46:58.0000930Z #define _POSIX_CLOCKRES_MIN 20000000 2025-05-07T19:46:58.0001242Z #define _POSIX_C_SOURCE 200809L 2025-05-07T19:46:58.0001517Z #define _POSIX_DELAYTIMER_MAX 32 2025-05-07T19:46:58.0001913Z #define _POSIX_FD_SETSIZE _POSIX_OPEN_MAX 2025-05-07T19:46:58.0002228Z #define _POSIX_HIWAT _POSIX_PIPE_BUF 2025-05-07T19:46:58.0002635Z #define _POSIX_HOST_NAME_MAX 255 2025-05-07T19:46:58.0003085Z #define _POSIX_LINK_MAX 8 2025-05-07T19:46:58.0003385Z #define _POSIX_LOGIN_NAME_MAX 9 
2025-05-07T19:46:58.0003706Z #define _POSIX_MAX_CANON 255 2025-05-07T19:46:58.0003992Z #define _POSIX_MAX_INPUT 255 2025-05-07T19:46:58.0004307Z #define _POSIX_MQ_OPEN_MAX 8 2025-05-07T19:46:58.0004604Z #define _POSIX_MQ_PRIO_MAX 32 2025-05-07T19:46:58.0004924Z #define _POSIX_NAME_MAX 14 2025-05-07T19:46:58.0005201Z #define _POSIX_NGROUPS_MAX 8 2025-05-07T19:46:58.0005515Z #define _POSIX_OPEN_MAX 20 2025-05-07T19:46:58.0005786Z #define _POSIX_PATH_MAX 256 2025-05-07T19:46:58.0006089Z #define _POSIX_PIPE_BUF 512 2025-05-07T19:46:58.0006363Z #define _POSIX_QLIMIT 1 2025-05-07T19:46:58.0006651Z #define _POSIX_RE_DUP_MAX 255 2025-05-07T19:46:58.0006967Z #define _POSIX_RTSIG_MAX 8 2025-05-07T19:46:58.0007248Z #define _POSIX_SEM_NSEMS_MAX 256 2025-05-07T19:46:58.0007575Z #define _POSIX_SEM_VALUE_MAX 32767 2025-05-07T19:46:58.0007885Z #define _POSIX_SIGQUEUE_MAX 32 2025-05-07T19:46:58.0008191Z #define _POSIX_SOURCE 1 2025-05-07T19:46:58.0008458Z #define _POSIX_SSIZE_MAX 32767 2025-05-07T19:46:58.0008770Z #define _POSIX_STREAM_MAX 8 2025-05-07T19:46:58.0009049Z #define _POSIX_SYMLINK_MAX 255 2025-05-07T19:46:58.0009364Z #define _POSIX_SYMLOOP_MAX 8 2025-05-07T19:46:58.0009680Z #define _POSIX_THREAD_DESTRUCTOR_ITERATIONS 4 2025-05-07T19:46:58.0010059Z #define _POSIX_THREAD_KEYS_MAX 128 2025-05-07T19:46:58.0010391Z #define _POSIX_THREAD_THREADS_MAX 64 2025-05-07T19:46:58.0010768Z #define _POSIX_TIMER_MAX 32 2025-05-07T19:46:58.0011081Z #define _POSIX_TTY_NAME_MAX 9 2025-05-07T19:46:58.0011369Z #define _POSIX_TZNAME_MAX 6 2025-05-07T19:46:58.0011679Z #define _POSIX_UIO_MAXIOV 16 2025-05-07T19:46:58.0012039Z #define _PSTL_ASSERT(_Condition) __glibcxx_assert(_Condition) 2025-05-07T19:46:58.0012592Z #define _PSTL_ASSERT_MSG(_Condition,_Message) __glibcxx_assert(_Condition) 2025-05-07T19:46:58.0013239Z #define _PSTL_CLANG_VERSION (__clang_major__ * 10000 + __clang_minor__ * 100 + __clang_patchlevel__) 2025-05-07T19:46:58.0013781Z #define _PSTL_CONFIG_H 
2025-05-07T19:46:58.0014300Z #define _PSTL_CPP11_STD_ROTATE_BROKEN ((__GLIBCXX__ && __GLIBCXX__ < 20150716) || (_MSC_VER && _MSC_VER < 1800)) 2025-05-07T19:46:58.0015304Z #define _PSTL_CPP14_2RANGE_MISMATCH_EQUAL_PRESENT (_MSC_VER >= 1900 || __cplusplus >= 201300L || __cpp_lib_robust_nonmodifying_seq_ops == 201304) 2025-05-07T19:46:58.0016148Z #define _PSTL_CPP14_INTEGER_SEQUENCE_PRESENT (_MSC_VER >= 1900 || __cplusplus >= 201402L) 2025-05-07T19:46:58.0016951Z #define _PSTL_CPP14_MAKE_REVERSE_ITERATOR_PRESENT (_MSC_VER >= 1900 || __cplusplus >= 201402L || __cpp_lib_make_reverse_iterator == 201402) 2025-05-07T19:46:58.0017962Z #define _PSTL_CPP14_VARIABLE_TEMPLATES_PRESENT (!__INTEL_COMPILER || __INTEL_COMPILER >= 1700) && (_MSC_FULL_VER >= 190023918 || __cplusplus >= 201402L) 2025-05-07T19:46:58.0018751Z #define _PSTL_CPP17_EXECUTION_POLICIES_PRESENT (_MSC_VER >= 1912) 2025-05-07T19:46:58.0019216Z #define _PSTL_EARLYEXIT_PRESENT (__INTEL_COMPILER >= 1800) 2025-05-07T19:46:58.0019754Z #define _PSTL_GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) 2025-05-07T19:46:58.0020219Z #define _PSTL_HIDE_FROM_ABI_POP 2025-05-07T19:46:58.0020536Z #define _PSTL_HIDE_FROM_ABI_PUSH 2025-05-07T19:46:58.0020906Z #define _PSTL_ICC_18_OMP_SIMD_BROKEN (__INTEL_COMPILER == 1800) 2025-05-07T19:46:58.0021377Z #define _PSTL_MONOTONIC_PRESENT (__INTEL_COMPILER >= 1800) 2025-05-07T19:46:58.0021779Z #define _PSTL_PAR_BACKEND_SERIAL 2025-05-07T19:46:58.0022086Z #define _PSTL_PRAGMA(x) _Pragma(# x) 2025-05-07T19:46:58.0022873Z #define _PSTL_PRAGMA_DECLARE_REDUCTION(NAME,OP) _PSTL_PRAGMA(omp declare reduction(NAME:OP : omp_out(omp_in)) initializer(omp_priv = omp_orig)) 2025-05-07T19:46:58.0023595Z #define _PSTL_PRAGMA_DECLARE_SIMD _PSTL_PRAGMA(omp declare simd) 2025-05-07T19:46:58.0024098Z #define _PSTL_PRAGMA_FORCEINLINE 2025-05-07T19:46:58.0024440Z #define _PSTL_PRAGMA_LOCATION " [Parallel STL message]: " 2025-05-07T19:46:58.0024827Z #define _PSTL_PRAGMA_MESSAGE(x) 
2025-05-07T19:46:58.0025351Z #define _PSTL_PRAGMA_MESSAGE_IMPL(x) _PSTL_PRAGMA(message(_PSTL_STRING_CONCAT(_PSTL_PRAGMA_LOCATION, x))) 2025-05-07T19:46:58.0025889Z #define _PSTL_PRAGMA_MESSAGE_POLICIES(x) 2025-05-07T19:46:58.0026259Z #define _PSTL_PRAGMA_SIMD _PSTL_PRAGMA(omp simd) 2025-05-07T19:46:58.0026605Z #define _PSTL_PRAGMA_SIMD_EARLYEXIT 2025-05-07T19:46:58.0026955Z #define _PSTL_PRAGMA_SIMD_EXCLUSIVE_SCAN(PRM) 2025-05-07T19:46:58.0027304Z #define _PSTL_PRAGMA_SIMD_INCLUSIVE_SCAN(PRM) 2025-05-07T19:46:58.0027686Z #define _PSTL_PRAGMA_SIMD_ORDERED_MONOTONIC(PRM) 2025-05-07T19:46:58.0028095Z #define _PSTL_PRAGMA_SIMD_ORDERED_MONOTONIC_2ARGS(PRM1,PRM2) 2025-05-07T19:46:58.0028615Z #define _PSTL_PRAGMA_SIMD_REDUCTION(PRM) _PSTL_PRAGMA(omp simd reduction(PRM)) 2025-05-07T19:46:58.0029089Z #define _PSTL_PRAGMA_SIMD_SCAN(PRM) 2025-05-07T19:46:58.0029397Z #define _PSTL_PRAGMA_VECTOR_UNALIGNED 2025-05-07T19:46:58.0029734Z #define _PSTL_STRING(x) _PSTL_STRING_AUX(x) 2025-05-07T19:46:58.0030046Z #define _PSTL_STRING_AUX(x) #x 2025-05-07T19:46:58.0030360Z #define _PSTL_STRING_CONCAT(x,y) x #y 2025-05-07T19:46:58.0030662Z #define _PSTL_UDR_PRESENT 0 2025-05-07T19:46:58.0031129Z #define _PSTL_UDS_PRESENT (__INTEL_COMPILER >= 1900 && __INTEL_COMPILER_BUILD_DATE >= 20180626) 2025-05-07T19:46:58.0031613Z #define _PSTL_USAGE_WARNINGS 0 2025-05-07T19:46:58.0031952Z #define _PSTL_USE_NONTEMPORAL_STORES_IF_ALLOWED 2025-05-07T19:46:58.0032317Z #define _PSTL_VERSION 12000 2025-05-07T19:46:58.0032676Z #define _PSTL_VERSION_MAJOR (_PSTL_VERSION / 1000) 2025-05-07T19:46:58.0033105Z #define _PSTL_VERSION_MINOR ((_PSTL_VERSION % 1000) / 10) 2025-05-07T19:46:58.0033497Z #define _PSTL_VERSION_PATCH (_PSTL_VERSION % 10) 2025-05-07T19:46:58.0033852Z #define _PTRDIFF_T 2025-05-07T19:46:58.0034087Z #define _PTR_TRAITS_H 1 2025-05-07T19:46:58.0034374Z #define _SIGSET_H_types 1 2025-05-07T19:46:58.0034716Z #define _SIGSET_NWORDS (1024 / (8 * sizeof (unsigned long int))) 
2025-05-07T19:46:58.0035118Z #define _SIZE_T 2025-05-07T19:46:58.0035385Z #define _STDC_PREDEF_H 1 2025-05-07T19:46:58.0035630Z #define _STDIO_H 1 2025-05-07T19:46:58.0035895Z #define _STDIO_USES_IOSTREAM 2025-05-07T19:46:58.0036153Z #define _STDLIB_H 1 2025-05-07T19:46:58.0036410Z #define _STL_ALGOBASE_H 1 2025-05-07T19:46:58.0036675Z #define _STL_ITERATOR_BASE_FUNCS_H 1 2025-05-07T19:46:58.0037003Z #define _STL_ITERATOR_BASE_TYPES_H 1 2025-05-07T19:46:58.0037292Z #define _STL_ITERATOR_H 1 2025-05-07T19:46:58.0037563Z #define _STL_PAIR_H 1 2025-05-07T19:46:58.0037802Z #define _STL_RELOPS_H 1 2025-05-07T19:46:58.0038064Z #define _STRING_H 1 2025-05-07T19:46:58.0038291Z #define _STRUCT_TIMEVAL 1 2025-05-07T19:46:58.0038556Z #define _SVID_SOURCE 1 2025-05-07T19:46:58.0038812Z #define _SYS_CDEFS_H 1 2025-05-07T19:46:58.0039045Z #define _SYS_SELECT_H 1 2025-05-07T19:46:58.0039311Z #define _SYS_SYSMACROS_H 1 2025-05-07T19:46:58.0039558Z #define _SYS_TYPES_H 1 2025-05-07T19:46:58.0039813Z #define _TIME_H 1 2025-05-07T19:46:58.0040039Z #define _VA_LIST_DEFINED 2025-05-07T19:46:58.0040305Z #define _XLOCALE_H 1 2025-05-07T19:46:58.0040555Z #define _XOPEN_IOV_MAX _POSIX_UIO_MAXIOV 2025-05-07T19:46:58.0040873Z #define _XOPEN_LIM_H 1 2025-05-07T19:46:58.0041115Z #define _XOPEN_SOURCE 700 2025-05-07T19:46:58.0041406Z #define _XOPEN_SOURCE_EXTENDED 1 2025-05-07T19:46:58.0041795Z #define __ASMNAME(cname) __ASMNAME2 (__USER_LABEL_PREFIX__, cname) 2025-05-07T19:46:58.0042243Z #define __ASMNAME2(prefix,cname) __STRING (prefix) cname 2025-05-07T19:46:58.0042743Z #define __ASSERT_FUNCTION __PRETTY_FUNCTION__ 2025-05-07T19:46:58.0043279Z #define __ASSERT_VOID_CAST static_cast<void> 2025-05-07T19:46:58.0043659Z #define __ATOMIC_ACQUIRE 2 2025-05-07T19:46:58.0043934Z #define __ATOMIC_ACQ_REL 4 2025-05-07T19:46:58.0044228Z #define __ATOMIC_CONSUME 1 2025-05-07T19:46:58.0044494Z #define __ATOMIC_RELAXED 0 2025-05-07T19:46:58.0044901Z #define __ATOMIC_RELEASE 3 2025-05-07T19:46:58.0045171Z 
#define __ATOMIC_SEQ_CST 5 2025-05-07T19:46:58.0045486Z #define __BEGIN_DECLS extern "C" { 2025-05-07T19:46:58.0045832Z #define __BEGIN_NAMESPACE_C99 2025-05-07T19:46:58.0046129Z #define __BEGIN_NAMESPACE_STD 2025-05-07T19:46:58.0046461Z #define __BIGGEST_ALIGNMENT__ 16 2025-05-07T19:46:58.0046756Z #define __BIG_ENDIAN 4321 2025-05-07T19:46:58.0047057Z #define __BITINT_MAXWIDTH__ 8388608 2025-05-07T19:46:58.0047368Z #define __BIT_TYPES_DEFINED__ 1 2025-05-07T19:46:58.0047687Z #define __BLKCNT64_T_TYPE __SQUAD_TYPE 2025-05-07T19:46:58.0048022Z #define __BLKCNT_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:58.0048416Z #define __BLKSIZE_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:58.0048777Z #define __BOOL_WIDTH__ 8 2025-05-07T19:46:58.0049050Z #define __BYTE_ORDER __LITTLE_ENDIAN 2025-05-07T19:46:58.0049407Z #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:46:58.0049746Z #define __CHANNEL_DESCRIPTOR_H__ 2025-05-07T19:46:58.0050082Z #define __CHAR16_TYPE__ unsigned short 2025-05-07T19:46:58.0050396Z #define __CHAR32_TYPE__ unsigned int 2025-05-07T19:46:58.0050715Z #define __CHAR_BIT__ 8 2025-05-07T19:46:58.0050983Z #define __CLANG_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:46:58.0051350Z #define __CLANG_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:46:58.0051690Z #define __CLANG_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:46:58.0052052Z #define __CLANG_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:46:58.0052403Z #define __CLANG_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:46:58.0052724Z #define __CLANG_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:46:58.0053081Z #define __CLANG_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:46:58.0053482Z #define __CLANG_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:46:58.0053848Z #define __CLANG_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:46:58.0054183Z #define __CLANG_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:46:58.0054528Z #define __CLANG_LIMITS_H 2025-05-07T19:46:58.0054803Z #define __CLANG_MAX_ALIGN_T_DEFINED 2025-05-07T19:46:58.0055146Z #define __CLOCKID_T_TYPE 
__S32_TYPE 2025-05-07T19:46:58.0055591Z #define __CLOCK_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:58.0055896Z #define __COMMON_FUNCTIONS_H__ 2025-05-07T19:46:58.0056185Z #define __COMPAR_FN_T 2025-05-07T19:46:58.0056427Z #define __CONCAT(x,y) x ## y 2025-05-07T19:46:58.0056714Z #define __CONSTANT_CFSTRINGS__ 1 2025-05-07T19:46:58.0057008Z #define __CUDACC_DEVICE_ATOMIC_BUILTINS__ 1 2025-05-07T19:46:58.0057337Z #define __CUDACC_VER_BUILD__ 61 2025-05-07T19:46:58.0057605Z #define __CUDACC_VER_MAJOR__ 12 2025-05-07T19:46:58.0057899Z #define __CUDACC_VER_MINOR__ 8 2025-05-07T19:46:58.0058492Z #define __CUDACC_VER__ "__CUDACC_VER__ is no longer supported. Use __CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__, and __CUDACC_VER_BUILD__ instead." 2025-05-07T19:46:58.0059129Z #define __CUDACC__ 1 2025-05-07T19:46:58.0059411Z #define __CUDART_API_PTDS(api) api 2025-05-07T19:46:58.0059704Z #define __CUDART_API_PTSZ(api) api 2025-05-07T19:46:58.0060179Z #define __CUDART_API_VERSION ((__CUDA_API_VER_MAJOR__ * 1000) + (__CUDA_API_VER_MINOR__ * 10)) 2025-05-07T19:46:58.0060646Z #define __CUDA_API_VER_MAJOR__ 12 2025-05-07T19:46:58.0060955Z #define __CUDA_API_VER_MINOR__ 8 2025-05-07T19:46:58.0061307Z #define __CUDA_ARCH_HAS_FEATURE__(_FEAT) __CUDA_ARCH_FEAT_##_FEAT 2025-05-07T19:46:58.0061705Z #define __CUDA_ARCH_LIST__ 520 2025-05-07T19:46:58.0061967Z #define __CUDA_ARCH__ 520 2025-05-07T19:46:58.0062258Z #define __CUDA_DEVICE_RUNTIME_API_H__ 2025-05-07T19:46:58.0062582Z #define __CUDA_MATH_CRTIMP 2025-05-07T19:46:58.0062846Z #define __CUDA_RUNTIME_API_H__ 2025-05-07T19:46:58.0063146Z #define __CUDA_RUNTIME_H__ 2025-05-07T19:46:58.0063415Z #define __DADDR_T_TYPE __S32_TYPE 2025-05-07T19:46:58.0063724Z #define __DBL_DECIMAL_DIG__ 17 2025-05-07T19:46:58.0064025Z #define __DBL_DENORM_MIN__ 4.9406564584124654e-324 2025-05-07T19:46:58.0064373Z #define __DBL_DIG__ 15 2025-05-07T19:46:58.0064638Z #define __DBL_EPSILON__ 2.2204460492503131e-16 2025-05-07T19:46:58.0064982Z #define 
__DBL_HAS_DENORM__ 1 2025-05-07T19:46:58.0065321Z #define __DBL_HAS_INFINITY__ 1 2025-05-07T19:46:58.0065624Z #define __DBL_HAS_QUIET_NAN__ 1 2025-05-07T19:46:58.0065925Z #define __DBL_MANT_DIG__ 53 2025-05-07T19:46:58.0066195Z #define __DBL_MAX_10_EXP__ 308 2025-05-07T19:46:58.0066501Z #define __DBL_MAX_EXP__ 1024 2025-05-07T19:46:58.0066779Z #define __DBL_MAX__ 1.7976931348623157e+308 2025-05-07T19:46:58.0067274Z #define __DBL_MIN_10_EXP__ (-307) 2025-05-07T19:46:58.0067751Z #define __DBL_MIN_EXP__ (-1021) 2025-05-07T19:46:58.0068161Z #define __DBL_MIN__ 2.2250738585072014e-308 2025-05-07T19:46:58.0068505Z #define __DECIMAL_DIG__ __LDBL_DECIMAL_DIG__ 2025-05-07T19:46:58.0068872Z #define __DELETE_THROW throw() 2025-05-07T19:46:58.0069180Z #define __DEPRECATED 1 2025-05-07T19:46:58.0069459Z #define __DEVICE_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:58.0069819Z #define __DEVICE_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:58.0070149Z #define __DEVICE_DOUBLE_FUNCTIONS_HPP__ 2025-05-07T19:46:58.0070516Z #define __DEVICE_DOUBLE_FUNCTIONS_H__ 2025-05-07T19:46:58.0070838Z #define __DEVICE_FUNCTIONS_HPP__ 2025-05-07T19:46:58.0071171Z #define __DEVICE_FUNCTIONS_H__ 2025-05-07T19:46:58.0071467Z #define __DEVICE_LAUNCH_PARAMETERS_H__ 2025-05-07T19:46:58.0071808Z #define __DEVICE_TYPES_H__ 2025-05-07T19:46:58.0072091Z #define __DEV_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:58.0072413Z #define __DRIVER_FUNCTIONS_H__ 2025-05-07T19:46:58.0072723Z #define __DRIVER_TYPES_H__ 2025-05-07T19:46:58.0072990Z #define __ELF__ 1 2025-05-07T19:46:58.0073250Z #define __END_DECLS } 2025-05-07T19:46:58.0073504Z #define __END_NAMESPACE_C99 2025-05-07T19:46:58.0073807Z #define __END_NAMESPACE_STD 2025-05-07T19:46:58.0074200Z #define __EXCEPTIONS 1 2025-05-07T19:46:58.0094226Z #define __EXCEPTION_H 1 2025-05-07T19:46:58.0094576Z #define __FDS_BITS(set) ((set)->fds_bits) 2025-05-07T19:46:58.0095019Z #define __FD_CLR(d,set) ((void) (__FDS_BITS (set)[__FD_ELT (d)] &= ~__FD_MASK (d))) 2025-05-07T19:46:58.0095556Z 
#define __FD_ELT(d) ((d) / __NFDBITS) 2025-05-07T19:46:58.0095935Z #define __FD_ISSET(d,set) ((__FDS_BITS (set)[__FD_ELT (d)] & __FD_MASK (d)) != 0) 2025-05-07T19:46:58.0096376Z #define __FD_MASK(d) ((__fd_mask) 1 << ((d) % __NFDBITS)) 2025-05-07T19:46:58.0096804Z #define __FD_SET(d,set) ((void) (__FDS_BITS (set)[__FD_ELT (d)] |= __FD_MASK (d))) 2025-05-07T19:46:58.0097187Z #define __FD_SETSIZE 1024 2025-05-07T19:46:58.0097829Z #define __FD_ZERO(fdsp) do { int __d0, __d1; __asm__ __volatile__ ("cld; rep; " __FD_ZERO_STOS : "=c" (__d0), "=D" (__d1) : "a" (0), "0" (sizeof (fd_set) / sizeof (__fd_mask)), "1" (&__FDS_BITS (fdsp)[0]) : "memory"); } while (0) 2025-05-07T19:46:58.0098515Z #define __FD_ZERO_STOS "stosq" 2025-05-07T19:46:58.0098785Z #define __FILE_defined 1 2025-05-07T19:46:58.0099024Z #define __FINITE_MATH_ONLY__ 0 2025-05-07T19:46:58.0099266Z #define __FLOAT128__ 1 2025-05-07T19:46:58.0099513Z #define __FLOAT_WORD_ORDER __BYTE_ORDER 2025-05-07T19:46:58.0099783Z #define __FLT16_DECIMAL_DIG__ 5 2025-05-07T19:46:58.0100077Z #define __FLT16_DENORM_MIN__ 5.9604644775390625e-8F16 2025-05-07T19:46:58.0100390Z #define __FLT16_DIG__ 3 2025-05-07T19:46:58.0100617Z #define __FLT16_EPSILON__ 9.765625e-4F16 2025-05-07T19:46:58.0100878Z #define __FLT16_HAS_DENORM__ 1 2025-05-07T19:46:58.0101121Z #define __FLT16_HAS_INFINITY__ 1 2025-05-07T19:46:58.0101360Z #define __FLT16_HAS_QUIET_NAN__ 1 2025-05-07T19:46:58.0101610Z #define __FLT16_MANT_DIG__ 11 2025-05-07T19:46:58.0101847Z #define __FLT16_MAX_10_EXP__ 4 2025-05-07T19:46:58.0102095Z #define __FLT16_MAX_EXP__ 16 2025-05-07T19:46:58.0102332Z #define __FLT16_MAX__ 6.5504e+4F16 2025-05-07T19:46:58.0102601Z #define __FLT16_MIN_10_EXP__ (-4) 2025-05-07T19:46:58.0102865Z #define __FLT16_MIN_EXP__ (-13) 2025-05-07T19:46:58.0103319Z #define __FLT16_MIN__ 6.103515625e-5F16 2025-05-07T19:46:58.0103616Z #define __FLT_DECIMAL_DIG__ 9 2025-05-07T19:46:58.0103881Z #define __FLT_DENORM_MIN__ 1.40129846e-45F 
2025-05-07T19:46:58.0104193Z #define __FLT_DIG__ 6 2025-05-07T19:46:58.0104413Z #define __FLT_EPSILON__ 1.19209290e-7F 2025-05-07T19:46:58.0104908Z #define __FLT_HAS_DENORM__ 1 2025-05-07T19:46:58.0105161Z #define __FLT_HAS_INFINITY__ 1 2025-05-07T19:46:58.0105435Z #define __FLT_HAS_QUIET_NAN__ 1 2025-05-07T19:46:58.0105682Z #define __FLT_MANT_DIG__ 24 2025-05-07T19:46:58.0105948Z #define __FLT_MAX_10_EXP__ 38 2025-05-07T19:46:58.0106198Z #define __FLT_MAX_EXP__ 128 2025-05-07T19:46:58.0106462Z #define __FLT_MAX__ 3.40282347e+38F 2025-05-07T19:46:58.0106735Z #define __FLT_MIN_10_EXP__ (-37) 2025-05-07T19:46:58.0106988Z #define __FLT_MIN_EXP__ (-125) 2025-05-07T19:46:58.0107250Z #define __FLT_MIN__ 1.17549435e-38F 2025-05-07T19:46:58.0107505Z #define __FLT_RADIX__ 2 2025-05-07T19:46:58.0107767Z #define __FSBLKCNT64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:58.0108087Z #define __FSBLKCNT_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:58.0108413Z #define __FSFILCNT64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:58.0108726Z #define __FSFILCNT_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:58.0109069Z #define __FSID_T_TYPE struct { int __val[2]; } 2025-05-07T19:46:58.0109390Z #define __FSWORD_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:58.0109692Z #define __FXSR__ 1 2025-05-07T19:46:58.0109918Z #define __GCC_ASM_FLAG_OUTPUTS__ 1 2025-05-07T19:46:58.0110200Z #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:46:58.0110493Z #define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:46:58.0110801Z #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:46:58.0111108Z #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:46:58.0111404Z #define __GCC_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:46:58.0111692Z #define __GCC_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:46:58.0111979Z #define __GCC_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:46:58.0112351Z #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:46:58.0112657Z #define __GCC_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:46:58.0112957Z #define 
__GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 2025-05-07T19:46:58.0113276Z #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:46:58.0113570Z #define __GCC_HAVE_DWARF2_CFI_ASM 1 2025-05-07T19:46:58.0113875Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 2025-05-07T19:46:58.0114192Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 2025-05-07T19:46:58.0114518Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 2025-05-07T19:46:58.0114836Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 2025-05-07T19:46:58.0115253Z #define __GID_T_TYPE __U32_TYPE 2025-05-07T19:46:58.0115515Z #define __GLIBCXX_BITSIZE_INT_N_0 128 2025-05-07T19:46:58.0115787Z #define __GLIBCXX_TYPE_INT_N_0 __int128 2025-05-07T19:46:58.0116070Z #define __GLIBCXX__ 20230528 2025-05-07T19:46:58.0116305Z #define __GLIBC_HAVE_LONG_LONG 1 2025-05-07T19:46:58.0116560Z #define __GLIBC_MINOR__ 17 2025-05-07T19:46:58.0116941Z #define __GLIBC_PREREQ(maj,min) ((__GLIBC__ << 16) + __GLIBC_MINOR__ >= ((maj) << 16) + (min)) 2025-05-07T19:46:58.0117355Z #define __GLIBC__ 2 2025-05-07T19:46:58.0117558Z #define __GNUC_GNU_INLINE__ 1 2025-05-07T19:46:58.0117805Z #define __GNUC_MINOR__ 2 2025-05-07T19:46:58.0118033Z #define __GNUC_PATCHLEVEL__ 1 2025-05-07T19:46:58.0118414Z #define __GNUC_PREREQ(maj,min) ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min)) 2025-05-07T19:46:58.0118827Z #define __GNUC_VA_LIST 2025-05-07T19:46:58.0119041Z #define __GNUC__ 4 2025-05-07T19:46:58.0119246Z #define __GNUG__ 4 2025-05-07T19:46:58.0119448Z #define __GNU_LIBRARY__ 6 2025-05-07T19:46:58.0119703Z #define __GXX_ABI_VERSION 1002 2025-05-07T19:46:58.0119959Z #define __GXX_EXPERIMENTAL_CXX0X__ 1 2025-05-07T19:46:58.0120233Z #define __GXX_RTTI 1 2025-05-07T19:46:58.0120440Z #define __GXX_WEAK__ 1 2025-05-07T19:46:58.0120663Z #define __HAVE_COLUMN 2025-05-07T19:46:58.0120878Z #define __HOST_CONFIG_H__ 2025-05-07T19:46:58.0121118Z #define __HOST_DEFINES_H__ 2025-05-07T19:46:58.0121371Z #define __ID_T_TYPE __U32_TYPE 2025-05-07T19:46:58.0121620Z #define 
__INO64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:58.0121900Z #define __INO_T_MATCHES_INO64_T 1 2025-05-07T19:46:58.0122165Z #define __INO_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:58.0122576Z #define __INT16_C_SUFFIX__ 2025-05-07T19:46:58.0123068Z #define __INT16_FMTd__ "hd" 2025-05-07T19:46:58.0123320Z #define __INT16_FMTi__ "hi" 2025-05-07T19:46:58.0123562Z #define __INT16_MAX__ 32767 2025-05-07T19:46:58.0123826Z #define __INT16_TYPE__ short 2025-05-07T19:46:58.0124087Z #define __INT32_C_SUFFIX__ 2025-05-07T19:46:58.0124348Z #define __INT32_FMTd__ "d" 2025-05-07T19:46:58.0124602Z #define __INT32_FMTi__ "i" 2025-05-07T19:46:58.0124852Z #define __INT32_MAX__ 2147483647 2025-05-07T19:46:58.0125120Z #define __INT32_TYPE__ int 2025-05-07T19:46:58.0125370Z #define __INT64_C_SUFFIX__ L 2025-05-07T19:46:58.0125636Z #define __INT64_FMTd__ "ld" 2025-05-07T19:46:58.0125881Z #define __INT64_FMTi__ "li" 2025-05-07T19:46:58.0126156Z #define __INT64_MAX__ 9223372036854775807L 2025-05-07T19:46:58.0126452Z #define __INT64_TYPE__ long int 2025-05-07T19:46:58.0126716Z #define __INT8_C_SUFFIX__ 2025-05-07T19:46:58.0126965Z #define __INT8_FMTd__ "hhd" 2025-05-07T19:46:58.0127218Z #define __INT8_FMTi__ "hhi" 2025-05-07T19:46:58.0127466Z #define __INT8_MAX__ 127 2025-05-07T19:46:58.0127717Z #define __INT8_TYPE__ signed char 2025-05-07T19:46:58.0128000Z #define __INTMAX_C_SUFFIX__ L 2025-05-07T19:46:58.0128257Z #define __INTMAX_FMTd__ "ld" 2025-05-07T19:46:58.0128523Z #define __INTMAX_FMTi__ "li" 2025-05-07T19:46:58.0128793Z #define __INTMAX_MAX__ 9223372036854775807L 2025-05-07T19:46:58.0129102Z #define __INTMAX_TYPE__ long int 2025-05-07T19:46:58.0129366Z #define __INTMAX_WIDTH__ 64 2025-05-07T19:46:58.0129625Z #define __INTPTR_FMTd__ "ld" 2025-05-07T19:46:58.0129888Z #define __INTPTR_FMTi__ "li" 2025-05-07T19:46:58.0130174Z #define __INTPTR_MAX__ 9223372036854775807L 2025-05-07T19:46:58.0130491Z #define __INTPTR_TYPE__ long int 2025-05-07T19:46:58.0131350Z #define __INTPTR_WIDTH__ 64 
2025-05-07T19:46:58.0131632Z #define __INT_FAST16_FMTd__ "hd" 2025-05-07T19:46:58.0131722Z #define __INT_FAST16_FMTi__ "hi" 2025-05-07T19:46:58.0131815Z #define __INT_FAST16_MAX__ 32767 2025-05-07T19:46:58.0131915Z #define __INT_FAST16_TYPE__ short 2025-05-07T19:46:58.0132033Z #define __INT_FAST16_WIDTH__ 16 2025-05-07T19:46:58.0132132Z #define __INT_FAST32_FMTd__ "d" 2025-05-07T19:46:58.0132228Z #define __INT_FAST32_FMTi__ "i" 2025-05-07T19:46:58.0132340Z #define __INT_FAST32_MAX__ 2147483647 2025-05-07T19:46:58.0132434Z #define __INT_FAST32_TYPE__ int 2025-05-07T19:46:58.0132526Z #define __INT_FAST32_WIDTH__ 32 2025-05-07T19:46:58.0132622Z #define __INT_FAST64_FMTd__ "ld" 2025-05-07T19:46:58.0132726Z #define __INT_FAST64_FMTi__ "li" 2025-05-07T19:46:58.0132840Z #define __INT_FAST64_MAX__ 9223372036854775807L 2025-05-07T19:46:58.0132938Z #define __INT_FAST64_TYPE__ long int 2025-05-07T19:46:58.0133048Z #define __INT_FAST64_WIDTH__ 64 2025-05-07T19:46:58.0133145Z #define __INT_FAST8_FMTd__ "hhd" 2025-05-07T19:46:58.0133240Z #define __INT_FAST8_FMTi__ "hhi" 2025-05-07T19:46:58.0133337Z #define __INT_FAST8_MAX__ 127 2025-05-07T19:46:58.0133453Z #define __INT_FAST8_TYPE__ signed char 2025-05-07T19:46:58.0133544Z #define __INT_FAST8_WIDTH__ 8 2025-05-07T19:46:58.0133644Z #define __INT_LEAST16_FMTd__ "hd" 2025-05-07T19:46:58.0133758Z #define __INT_LEAST16_FMTi__ "hi" 2025-05-07T19:46:58.0133853Z #define __INT_LEAST16_MAX__ 32767 2025-05-07T19:46:58.0133953Z #define __INT_LEAST16_TYPE__ short 2025-05-07T19:46:58.0134067Z #define __INT_LEAST16_WIDTH__ 16 2025-05-07T19:46:58.0134162Z #define __INT_LEAST32_FMTd__ "d" 2025-05-07T19:46:58.0134253Z #define __INT_LEAST32_FMTi__ "i" 2025-05-07T19:46:58.0134353Z #define __INT_LEAST32_MAX__ 2147483647 2025-05-07T19:46:58.0134458Z #define __INT_LEAST32_TYPE__ int 2025-05-07T19:46:58.0134546Z #define __INT_LEAST32_WIDTH__ 32 2025-05-07T19:46:58.0134638Z #define __INT_LEAST64_FMTd__ "ld" 2025-05-07T19:46:58.0134744Z #define 
__INT_LEAST64_FMTi__ "li" 2025-05-07T19:46:58.0134860Z #define __INT_LEAST64_MAX__ 9223372036854775807L 2025-05-07T19:46:58.0134959Z #define __INT_LEAST64_TYPE__ long int 2025-05-07T19:46:58.0135055Z #define __INT_LEAST64_WIDTH__ 64 2025-05-07T19:46:58.0135272Z #define __INT_LEAST8_FMTd__ "hhd" 2025-05-07T19:46:58.0135361Z #define __INT_LEAST8_FMTi__ "hhi" 2025-05-07T19:46:58.0135543Z #define __INT_LEAST8_MAX__ 127 2025-05-07T19:46:58.0135659Z #define __INT_LEAST8_TYPE__ signed char 2025-05-07T19:46:58.0135754Z #define __INT_LEAST8_WIDTH__ 8 2025-05-07T19:46:58.0135844Z #define __INT_MAX__ 2147483647 2025-05-07T19:46:58.0135930Z #define __INT_WIDTH__ 32 2025-05-07T19:46:58.0136028Z #define __KERNEL_STRICT_NAMES 2025-05-07T19:46:58.0136119Z #define __KEY_T_TYPE __S32_TYPE 2025-05-07T19:46:58.0136212Z #define __LDBL_DECIMAL_DIG__ 21 2025-05-07T19:46:58.0136367Z #define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L 2025-05-07T19:46:58.0136452Z #define __LDBL_DIG__ 18 2025-05-07T19:46:58.0136581Z #define __LDBL_EPSILON__ 1.08420217248550443401e-19L 2025-05-07T19:46:58.0136675Z #define __LDBL_HAS_DENORM__ 1 2025-05-07T19:46:58.0136781Z #define __LDBL_HAS_INFINITY__ 1 2025-05-07T19:46:58.0137054Z #define __LDBL_HAS_QUIET_NAN__ 1 2025-05-07T19:46:58.0137145Z #define __LDBL_MANT_DIG__ 64 2025-05-07T19:46:58.0137249Z #define __LDBL_MAX_10_EXP__ 4932 2025-05-07T19:46:58.0137353Z #define __LDBL_MAX_EXP__ 16384 2025-05-07T19:46:58.0137473Z #define __LDBL_MAX__ 1.18973149535723176502e+4932L 2025-05-07T19:46:58.0137569Z #define __LDBL_MIN_10_EXP__ (-4931) 2025-05-07T19:46:58.0137682Z #define __LDBL_MIN_EXP__ (-16381) 2025-05-07T19:46:58.0137802Z #define __LDBL_MIN__ 3.36210314311209350626e-4932L 2025-05-07T19:46:58.0137921Z #define __LDBL_REDIR(name,proto) name proto 2025-05-07T19:46:58.0138066Z #define __LDBL_REDIR1(name,proto,alias) name proto 2025-05-07T19:46:58.0138242Z #define __LDBL_REDIR1_NTH(name,proto,alias) name proto __THROW 2025-05-07T19:46:58.0138338Z #define 
__LDBL_REDIR_DECL(name) 2025-05-07T19:46:58.0138555Z #define __LDBL_REDIR_NTH(name,proto) name proto __THROW 2025-05-07T19:46:58.0138645Z #define __LEAF 2025-05-07T19:46:58.0138735Z #define __LEAF_ATTR 2025-05-07T19:46:58.0138839Z #define __LIBRARY_TYPES_H__ 2025-05-07T19:46:58.0138942Z #define __LITTLE_ENDIAN 1234 2025-05-07T19:46:58.0139035Z #define __LITTLE_ENDIAN__ 1 2025-05-07T19:46:58.0139135Z #define __LLONG_WIDTH__ 64 2025-05-07T19:46:58.0139262Z #define __LONG_LONG_MAX__ 9223372036854775807LL 2025-05-07T19:46:58.0139365Z #define __LONG_LONG_PAIR(HI,LO) LO, HI 2025-05-07T19:46:58.0139465Z #define __LONG_MAX__ 9223372036854775807L 2025-05-07T19:46:58.0139557Z #define __LONG_WIDTH__ 64 2025-05-07T19:46:58.0139655Z #define __LP64__ 1 2025-05-07T19:46:58.0139991Z #define __MATHCALLX(function,suffix,args,attrib) __MATHDECLX (_Mdouble_,function,suffix, args, attrib) 2025-05-07T19:46:58.0140658Z #define __MATHDECLX(type,function,suffix,args,attrib) __MATHDECL_1(type, function,suffix, args) __attribute__ (attrib); __MATHDECL_1(type, __CONCAT(__,function),suffix, args) __attribute__ (attrib) 2025-05-07T19:46:58.0140768Z #define __MATH_DECLARE_LDOUBLE 1 2025-05-07T19:46:58.0140865Z #define __MATH_FUNCTIONS_HPP__ 2025-05-07T19:46:58.0140960Z #define __MATH_FUNCTIONS_H__ 2025-05-07T19:46:58.0141044Z #define __MMX__ 1 2025-05-07T19:46:58.0141152Z #define __MODE_T_TYPE __U32_TYPE 2025-05-07T19:46:58.0141252Z #define __N(msgid) (msgid) 2025-05-07T19:46:58.0141379Z #define __NFDBITS (8 * (int) sizeof (__fd_mask)) 2025-05-07T19:46:58.0141509Z #define __NLINK_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:58.0141595Z #define __NO_CTYPE 1 2025-05-07T19:46:58.0141689Z #define __NO_INLINE__ 1 2025-05-07T19:46:58.0141784Z #define __NO_MATH_INLINES 1 2025-05-07T19:46:58.0141903Z #define __NTH(fct) __LEAF_ATTR fct throw () 2025-05-07T19:46:58.0142014Z #define __NVCC_DIAG_PRAGMA_SUPPORT__ 1 2025-05-07T19:46:58.0142109Z #define __NVCC__ 1 2025-05-07T19:46:58.0142206Z #define 
__NV_GLIBCXX_VERSION 40800 2025-05-07T19:46:58.0142305Z #define __NV_LEGACY_LAUNCH 1 2025-05-07T19:46:58.0142420Z #define __NV_NO_HOST_COMPILER_CHECK 1 2025-05-07T19:46:58.0142519Z #define __OBJC_BOOL_IS_BOOL 0 2025-05-07T19:46:58.0142620Z #define __OFF64_T_TYPE __SQUAD_TYPE 2025-05-07T19:46:58.0142724Z #define __OFF_T_MATCHES_OFF64_T 1 2025-05-07T19:46:58.0142846Z #define __OFF_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:58.0143033Z #define __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES 3 2025-05-07T19:46:58.0143140Z #define __OPENCL_MEMORY_SCOPE_DEVICE 2 2025-05-07T19:46:58.0143268Z #define __OPENCL_MEMORY_SCOPE_SUB_GROUP 4 2025-05-07T19:46:58.0143383Z #define __OPENCL_MEMORY_SCOPE_WORK_GROUP 1 2025-05-07T19:46:58.0143496Z #define __OPENCL_MEMORY_SCOPE_WORK_ITEM 0 2025-05-07T19:46:58.0143592Z #define __ORDER_BIG_ENDIAN__ 4321 2025-05-07T19:46:58.0143702Z #define __ORDER_LITTLE_ENDIAN__ 1234 2025-05-07T19:46:58.0143799Z #define __ORDER_PDP_ENDIAN__ 3412 2025-05-07T19:46:58.0143892Z #define __P(args) args 2025-05-07T19:46:58.0143994Z #define __PDP_ENDIAN 3412 2025-05-07T19:46:58.0144075Z #define __PIC__ 2 2025-05-07T19:46:58.0144177Z #define __PID_T_TYPE __S32_TYPE 2025-05-07T19:46:58.0144259Z #define __PIE__ 2 2025-05-07T19:46:58.0144360Z #define __PMT(args) args 2025-05-07T19:46:58.0144455Z #define __POINTER_WIDTH__ 64 2025-05-07T19:46:58.0144552Z #define __PRAGMA_REDEFINE_EXTNAME 1 2025-05-07T19:46:58.0144774Z #define __PTHREAD_MUTEX_HAVE_PREV 1 2025-05-07T19:46:58.0145003Z #define __PTHREAD_RWLOCK_INT_FLAGS_SHARED 1 2025-05-07T19:46:58.0145087Z #define __PTHREAD_SPINS 0, 0 2025-05-07T19:46:58.0145174Z #define __PTRDIFF_FMTd__ "ld" 2025-05-07T19:46:58.0145269Z #define __PTRDIFF_FMTi__ "li" 2025-05-07T19:46:58.0145368Z #define __PTRDIFF_MAX__ 9223372036854775807L 2025-05-07T19:46:58.0145458Z #define __PTRDIFF_TYPE__ long int 2025-05-07T19:46:58.0145557Z #define __PTRDIFF_WIDTH__ 64 2025-05-07T19:46:58.0145766Z #define __REDIRECT(name,proto,alias) name proto __asm__ 
(__ASMNAME (#alias)) 2025-05-07T19:46:58.0145968Z #define __REDIRECT_LDBL(name,proto,alias) __REDIRECT (name, proto, alias) 2025-05-07T19:46:58.0146282Z #define __REDIRECT_NTH(name,proto,alias) name proto __THROW __asm__ (__ASMNAME (#alias)) 2025-05-07T19:46:58.0146538Z #define __REDIRECT_NTHNL(name,proto,alias) name proto __THROWNL __asm__ (__ASMNAME (#alias)) 2025-05-07T19:46:58.0146761Z #define __REDIRECT_NTH_LDBL(name,proto,alias) __REDIRECT_NTH (name, proto, alias) 2025-05-07T19:46:58.0146855Z #define __REGISTER_PREFIX__ 2025-05-07T19:46:58.0146966Z #define __RLIM64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:58.0147073Z #define __RLIM_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:58.0147165Z #define __S16_TYPE short int 2025-05-07T19:46:58.0147256Z #define __S32_TYPE int 2025-05-07T19:46:58.0147337Z #define __S64_TYPE long int 2025-05-07T19:46:58.0147419Z #define __SCHAR_MAX__ 127 2025-05-07T19:46:58.0147491Z #define __SEG_FS 1 2025-05-07T19:46:58.0147585Z #define __SEG_GS 1 2025-05-07T19:46:58.0147669Z #define __SHRT_MAX__ 32767 2025-05-07T19:46:58.0147752Z #define __SHRT_WIDTH__ 16 2025-05-07T19:46:58.0147867Z #define __SIG_ATOMIC_MAX__ 2147483647 2025-05-07T19:46:58.0147960Z #define __SIG_ATOMIC_WIDTH__ 32 2025-05-07T19:46:58.0148043Z #define __SIZEOF_DOUBLE__ 8 2025-05-07T19:46:58.0148133Z #define __SIZEOF_FLOAT128__ 16 2025-05-07T19:46:58.0148238Z #define __SIZEOF_FLOAT__ 4 2025-05-07T19:46:58.0148328Z #define __SIZEOF_INT128__ 16 2025-05-07T19:46:58.0148412Z #define __SIZEOF_INT__ 4 2025-05-07T19:46:58.0148524Z #define __SIZEOF_LONG_DOUBLE__ 16 2025-05-07T19:46:58.0148611Z #define __SIZEOF_LONG_LONG__ 8 2025-05-07T19:46:58.0148695Z #define __SIZEOF_LONG__ 8 2025-05-07T19:46:58.0148784Z #define __SIZEOF_POINTER__ 8 2025-05-07T19:46:58.0148891Z #define __SIZEOF_PTHREAD_ATTR_T 56 2025-05-07T19:46:58.0148992Z #define __SIZEOF_PTHREAD_BARRIERATTR_T 4 2025-05-07T19:46:58.0149090Z #define __SIZEOF_PTHREAD_BARRIER_T 32 2025-05-07T19:46:58.0149201Z #define 
__SIZEOF_PTHREAD_CONDATTR_T 4 2025-05-07T19:46:58.0149290Z #define __SIZEOF_PTHREAD_COND_T 48 2025-05-07T19:46:58.0149386Z #define __SIZEOF_PTHREAD_MUTEXATTR_T 4 2025-05-07T19:46:58.0149483Z #define __SIZEOF_PTHREAD_MUTEX_T 40 2025-05-07T19:46:58.0149598Z #define __SIZEOF_PTHREAD_RWLOCKATTR_T 8 2025-05-07T19:46:58.0149693Z #define __SIZEOF_PTHREAD_RWLOCK_T 56 2025-05-07T19:46:58.0149785Z #define __SIZEOF_PTRDIFF_T__ 8 2025-05-07T19:46:58.0149887Z #define __SIZEOF_SHORT__ 2 2025-05-07T19:46:58.0149977Z #define __SIZEOF_SIZE_T__ 8 2025-05-07T19:46:58.0150125Z #define __SIZEOF_WCHAR_T__ 4 2025-05-07T19:46:58.0150212Z #define __SIZEOF_WINT_T__ 4 2025-05-07T19:46:58.0150309Z #define __SIZE_FMTX__ "lX" 2025-05-07T19:46:58.0150396Z #define __SIZE_FMTo__ "lo" 2025-05-07T19:46:58.0150478Z #define __SIZE_FMTu__ "lu" 2025-05-07T19:46:58.0150566Z #define __SIZE_FMTx__ "lx" 2025-05-07T19:46:58.0150666Z #define __SIZE_MAX__ 18446744073709551615UL 2025-05-07T19:46:58.0150772Z #define __SIZE_TYPE__ long unsigned int 2025-05-07T19:46:58.0150861Z #define __SIZE_WIDTH__ 64 2025-05-07T19:46:58.0150944Z #define __SLONG32_TYPE int 2025-05-07T19:46:58.0151034Z #define __SLONGWORD_TYPE long int 2025-05-07T19:46:58.0151119Z #define __SM_100_RT_HPP__ 2025-05-07T19:46:58.0151210Z #define __SM_100_RT_H__ 2025-05-07T19:46:58.0151308Z #define __SM_20_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:58.0151406Z #define __SM_20_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:58.0151505Z #define __SM_20_INTRINSICS_HPP__ 2025-05-07T19:46:58.0151595Z #define __SM_20_INTRINSICS_H__ 2025-05-07T19:46:58.0151689Z #define __SM_30_INTRINSICS_HPP__ 2025-05-07T19:46:58.0151782Z #define __SM_30_INTRINSICS_H__ 2025-05-07T19:46:58.0151878Z #define __SM_32_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:58.0151967Z #define __SM_32_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:58.0152059Z #define __SM_32_INTRINSICS_HPP__ 2025-05-07T19:46:58.0152157Z #define __SM_32_INTRINSICS_H__ 2025-05-07T19:46:58.0152247Z #define __SM_35_ATOMIC_FUNCTIONS_H__ 
2025-05-07T19:46:58.0152334Z #define __SM_35_INTRINSICS_H__ 2025-05-07T19:46:58.0152431Z #define __SM_60_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:58.0152531Z #define __SM_60_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:58.0152621Z #define __SM_61_INTRINSICS_HPP__ 2025-05-07T19:46:58.0152760Z #define __SM_61_INTRINSICS_H__ 2025-05-07T19:46:58.0152857Z #define __SM_70_RT_HPP__ 2025-05-07T19:46:58.0152937Z #define __SM_70_RT_H__ 2025-05-07T19:46:58.0153021Z #define __SM_80_RT_HPP__ 2025-05-07T19:46:58.0153100Z #define __SM_80_RT_H__ 2025-05-07T19:46:58.0153196Z #define __SM_90_RT_HPP__ 2025-05-07T19:46:58.0153274Z #define __SM_90_RT_H__ 2025-05-07T19:46:58.0153365Z #define __SQUAD_TYPE long int 2025-05-07T19:46:58.0153455Z #define __SSE2_MATH__ 1 2025-05-07T19:46:58.0153529Z #define __SSE2__ 1 2025-05-07T19:46:58.0153609Z #define __SSE_MATH__ 1 2025-05-07T19:46:58.0153687Z #define __SSE__ 1 2025-05-07T19:46:58.0153796Z #define __SSIZE_T_TYPE __SWORD_TYPE 2025-05-07T19:46:58.0153911Z #define __STDCPP_DEFAULT_NEW_ALIGNMENT__ 16UL 2025-05-07T19:46:58.0154021Z #define __STDCPP_MATH_SPEC_FUNCS__ 201003L 2025-05-07T19:46:58.0154133Z #define __STDCPP_THREADS__ 1 2025-05-07T19:46:58.0154215Z #define __STDC_HOSTED__ 1 2025-05-07T19:46:58.0154305Z #define __STDC_IEC_559_COMPLEX__ 1 2025-05-07T19:46:58.0154399Z #define __STDC_IEC_559__ 1 2025-05-07T19:46:58.0154495Z #define __STDC_ISO_10646__ 201103L 2025-05-07T19:46:58.0154580Z #define __STDC_NO_THREADS__ 1 2025-05-07T19:46:58.0154663Z #define __STDC_UTF_16__ 1 2025-05-07T19:46:58.0154770Z #define __STDC_UTF_32__ 1 2025-05-07T19:46:58.0154850Z #define __STDC__ 1 2025-05-07T19:46:58.0154927Z #define __STDDEF_H 2025-05-07T19:46:58.0155009Z #define __STRING(x) #x 2025-05-07T19:46:58.0155125Z #define __SURFACE_INDIRECT_FUNCTIONS_H__ 2025-05-07T19:46:58.0155209Z #define __SURFACE_TYPES_H__ 2025-05-07T19:46:58.0155333Z #define __SUSECONDS_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:58.0155438Z #define __SWORD_TYPE long int 
2025-05-07T19:46:58.0155550Z #define __SYSCALL_SLONG_TYPE __SLONGWORD_TYPE 2025-05-07T19:46:58.0155661Z #define __SYSCALL_ULONG_TYPE __ULONGWORD_TYPE 2025-05-07T19:46:58.0155753Z #define __SYSCALL_WORDSIZE 64 2025-05-07T19:46:58.0155872Z #define __TEXTURE_INDIRECT_FUNCTIONS_H__ 2025-05-07T19:46:58.0155962Z #define __TEXTURE_TYPES_H__ 2025-05-07T19:46:58.0156051Z #define __THROW throw () 2025-05-07T19:46:58.0156149Z #define __THROWNL throw () 2025-05-07T19:46:58.0156237Z #define __TIMER_T_TYPE void * 2025-05-07T19:46:58.0156338Z #define __TIME_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:58.0156431Z #define __U16_TYPE unsigned short int 2025-05-07T19:46:58.0156594Z #define __U32_TYPE unsigned int 2025-05-07T19:46:58.0156687Z #define __U64_TYPE unsigned long int 2025-05-07T19:46:58.0156768Z #define __UID_T_TYPE __U32_TYPE 2025-05-07T19:46:58.0156871Z #define __UINT16_C_SUFFIX__ 2025-05-07T19:46:58.0156954Z #define __UINT16_FMTX__ "hX" 2025-05-07T19:46:58.0157035Z #define __UINT16_FMTo__ "ho" 2025-05-07T19:46:58.0157116Z #define __UINT16_FMTu__ "hu" 2025-05-07T19:46:58.0157217Z #define __UINT16_FMTx__ "hx" 2025-05-07T19:46:58.0157296Z #define __UINT16_MAX__ 65535 2025-05-07T19:46:58.0157391Z #define __UINT16_TYPE__ unsigned short 2025-05-07T19:46:58.0157490Z #define __UINT32_C_SUFFIX__ U 2025-05-07T19:46:58.0157581Z #define __UINT32_FMTX__ "X" 2025-05-07T19:46:58.0157664Z #define __UINT32_FMTo__ "o" 2025-05-07T19:46:58.0157750Z #define __UINT32_FMTu__ "u" 2025-05-07T19:46:58.0157851Z #define __UINT32_FMTx__ "x" 2025-05-07T19:46:58.0157941Z #define __UINT32_MAX__ 4294967295U 2025-05-07T19:46:58.0158031Z #define __UINT32_TYPE__ unsigned int 2025-05-07T19:46:58.0158139Z #define __UINT64_C_SUFFIX__ UL 2025-05-07T19:46:58.0158230Z #define __UINT64_FMTX__ "lX" 2025-05-07T19:46:58.0158310Z #define __UINT64_FMTo__ "lo" 2025-05-07T19:46:58.0158393Z #define __UINT64_FMTu__ "lu" 2025-05-07T19:46:58.0158492Z #define __UINT64_FMTx__ "lx" 2025-05-07T19:46:58.0158599Z #define 
__UINT64_MAX__ 18446744073709551615UL 2025-05-07T19:46:58.0158697Z #define __UINT64_TYPE__ long unsigned int 2025-05-07T19:46:58.0158797Z #define __UINT8_C_SUFFIX__ 2025-05-07T19:46:58.0158887Z #define __UINT8_FMTX__ "hhX" 2025-05-07T19:46:58.0158972Z #define __UINT8_FMTo__ "hho" 2025-05-07T19:46:58.0159056Z #define __UINT8_FMTu__ "hhu" 2025-05-07T19:46:58.0159215Z #define __UINT8_FMTx__ "hhx" 2025-05-07T19:46:58.0159298Z #define __UINT8_MAX__ 255 2025-05-07T19:46:58.0159384Z #define __UINT8_TYPE__ unsigned char 2025-05-07T19:46:58.0159479Z #define __UINTMAX_C_SUFFIX__ UL 2025-05-07T19:46:58.0159563Z #define __UINTMAX_FMTX__ "lX" 2025-05-07T19:46:58.0159651Z #define __UINTMAX_FMTo__ "lo" 2025-05-07T19:46:58.0159737Z #define __UINTMAX_FMTu__ "lu" 2025-05-07T19:46:58.0159830Z #define __UINTMAX_FMTx__ "lx" 2025-05-07T19:46:58.0159936Z #define __UINTMAX_MAX__ 18446744073709551615UL 2025-05-07T19:46:58.0160037Z #define __UINTMAX_TYPE__ long unsigned int 2025-05-07T19:46:58.0160130Z #define __UINTMAX_WIDTH__ 64 2025-05-07T19:46:58.0160218Z #define __UINTPTR_FMTX__ "lX" 2025-05-07T19:46:58.0160302Z #define __UINTPTR_FMTo__ "lo" 2025-05-07T19:46:58.0160385Z #define __UINTPTR_FMTu__ "lu" 2025-05-07T19:46:58.0160477Z #define __UINTPTR_FMTx__ "lx" 2025-05-07T19:46:58.0160581Z #define __UINTPTR_MAX__ 18446744073709551615UL 2025-05-07T19:46:58.0160685Z #define __UINTPTR_TYPE__ long unsigned int 2025-05-07T19:46:58.0160778Z #define __UINTPTR_WIDTH__ 64 2025-05-07T19:46:58.0160867Z #define __UINT_FAST16_FMTX__ "hX" 2025-05-07T19:46:58.0160955Z #define __UINT_FAST16_FMTo__ "ho" 2025-05-07T19:46:58.0161051Z #define __UINT_FAST16_FMTu__ "hu" 2025-05-07T19:46:58.0161134Z #define __UINT_FAST16_FMTx__ "hx" 2025-05-07T19:46:58.0161220Z #define __UINT_FAST16_MAX__ 65535 2025-05-07T19:46:58.0161323Z #define __UINT_FAST16_TYPE__ unsigned short 2025-05-07T19:46:58.0161414Z #define __UINT_FAST32_FMTX__ "X" 2025-05-07T19:46:58.0161497Z #define __UINT_FAST32_FMTo__ "o" 
2025-05-07T19:46:58.0161580Z #define __UINT_FAST32_FMTu__ "u" 2025-05-07T19:46:58.0161671Z #define __UINT_FAST32_FMTx__ "x" 2025-05-07T19:46:58.0161760Z #define __UINT_FAST32_MAX__ 4294967295U 2025-05-07T19:46:58.0161855Z #define __UINT_FAST32_TYPE__ unsigned int 2025-05-07T19:46:58.0161939Z #define __UINT_FAST64_FMTX__ "lX" 2025-05-07T19:46:58.0162029Z #define __UINT_FAST64_FMTo__ "lo" 2025-05-07T19:46:58.0162113Z #define __UINT_FAST64_FMTu__ "lu" 2025-05-07T19:46:58.0162195Z #define __UINT_FAST64_FMTx__ "lx" 2025-05-07T19:46:58.0162311Z #define __UINT_FAST64_MAX__ 18446744073709551615UL 2025-05-07T19:46:58.0162511Z #define __UINT_FAST64_TYPE__ long unsigned int 2025-05-07T19:46:58.0162611Z #define __UINT_FAST8_FMTX__ "hhX" 2025-05-07T19:46:58.0162766Z #define __UINT_FAST8_FMTo__ "hho" 2025-05-07T19:46:58.0163040Z #define __UINT_FAST8_FMTu__ "hhu" 2025-05-07T19:46:58.0163135Z #define __UINT_FAST8_FMTx__ "hhx" 2025-05-07T19:46:58.0163229Z #define __UINT_FAST8_MAX__ 255 2025-05-07T19:46:58.0163346Z #define __UINT_FAST8_TYPE__ unsigned char 2025-05-07T19:46:58.0163441Z #define __UINT_LEAST16_FMTX__ "hX" 2025-05-07T19:46:58.0163577Z #define __UINT_LEAST16_FMTo__ "ho" 2025-05-07T19:46:58.0163669Z #define __UINT_LEAST16_FMTu__ "hu" 2025-05-07T19:46:58.0163767Z #define __UINT_LEAST16_FMTx__ "hx" 2025-05-07T19:46:58.0163856Z #define __UINT_LEAST16_MAX__ 65535 2025-05-07T19:46:58.0163970Z #define __UINT_LEAST16_TYPE__ unsigned short 2025-05-07T19:46:58.0164079Z #define __UINT_LEAST32_FMTX__ "X" 2025-05-07T19:46:58.0164170Z #define __UINT_LEAST32_FMTo__ "o" 2025-05-07T19:46:58.0164260Z #define __UINT_LEAST32_FMTu__ "u" 2025-05-07T19:46:58.0164355Z #define __UINT_LEAST32_FMTx__ "x" 2025-05-07T19:46:58.0164462Z #define __UINT_LEAST32_MAX__ 4294967295U 2025-05-07T19:46:58.0164571Z #define __UINT_LEAST32_TYPE__ unsigned int 2025-05-07T19:46:58.0164664Z #define __UINT_LEAST64_FMTX__ "lX" 2025-05-07T19:46:58.0164764Z #define __UINT_LEAST64_FMTo__ "lo" 2025-05-07T19:46:58.0164852Z 
#define __UINT_LEAST64_FMTu__ "lu" 2025-05-07T19:46:58.0164944Z #define __UINT_LEAST64_FMTx__ "lx" 2025-05-07T19:46:58.0165076Z #define __UINT_LEAST64_MAX__ 18446744073709551615UL 2025-05-07T19:46:58.0165200Z #define __UINT_LEAST64_TYPE__ long unsigned int 2025-05-07T19:46:58.0165291Z #define __UINT_LEAST8_FMTX__ "hhX" 2025-05-07T19:46:58.0165385Z #define __UINT_LEAST8_FMTo__ "hho" 2025-05-07T19:46:58.0165483Z #define __UINT_LEAST8_FMTu__ "hhu" 2025-05-07T19:46:58.0165636Z #define __UINT_LEAST8_FMTx__ "hhx" 2025-05-07T19:46:58.0165735Z #define __UINT_LEAST8_MAX__ 255 2025-05-07T19:46:58.0165853Z #define __UINT_LEAST8_TYPE__ unsigned char 2025-05-07T19:46:58.0165953Z #define __ULONG32_TYPE unsigned int 2025-05-07T19:46:58.0166064Z #define __ULONGWORD_TYPE unsigned long int 2025-05-07T19:46:58.0166171Z #define __UQUAD_TYPE unsigned long int 2025-05-07T19:46:58.0166280Z #define __USECONDS_T_TYPE __U32_TYPE 2025-05-07T19:46:58.0166378Z #define __USER_LABEL_PREFIX__ 2025-05-07T19:46:58.0166465Z #define __USE_ANSI 1 2025-05-07T19:46:58.0166570Z #define __USE_ATFILE 1 2025-05-07T19:46:58.0166654Z #define __USE_BSD 1 2025-05-07T19:46:58.0166750Z #define __USE_FORTIFY_LEVEL 0 2025-05-07T19:46:58.0166832Z #define __USE_GNU 1 2025-05-07T19:46:58.0166923Z #define __USE_ISOC11 1 2025-05-07T19:46:58.0167008Z #define __USE_ISOC95 1 2025-05-07T19:46:58.0167284Z #define __USE_ISOC99 1 2025-05-07T19:46:58.0167390Z #define __USE_ISOCXX11 1 2025-05-07T19:46:58.0167488Z #define __USE_LARGEFILE 1 2025-05-07T19:46:58.0167584Z #define __USE_LARGEFILE64 1 2025-05-07T19:46:58.0167677Z #define __USE_MISC 1 2025-05-07T19:46:58.0167776Z #define __USE_POSIX 1 2025-05-07T19:46:58.0167871Z #define __USE_POSIX199309 1 2025-05-07T19:46:58.0167960Z #define __USE_POSIX199506 1 2025-05-07T19:46:58.0168063Z #define __USE_POSIX2 1 2025-05-07T19:46:58.0168147Z #define __USE_SVID 1 2025-05-07T19:46:58.0168228Z #define __USE_UNIX98 1 2025-05-07T19:46:58.0168314Z #define __USE_XOPEN 1 
2025-05-07T19:46:58.0168418Z #define __USE_XOPEN2K 1 2025-05-07T19:46:58.0168506Z #define __USE_XOPEN2K8 1 2025-05-07T19:46:58.0168595Z #define __USE_XOPEN2K8XSI 1 2025-05-07T19:46:58.0168700Z #define __USE_XOPEN2KXSI 1 2025-05-07T19:46:58.0168794Z #define __USE_XOPEN_EXTENDED 1 2025-05-07T19:46:58.0168895Z #define __USING_NAMESPACE_C99(name) 2025-05-07T19:46:58.0168995Z #define __USING_NAMESPACE_STD(name) 2025-05-07T19:46:58.0169105Z #define __UWORD_TYPE unsigned long int 2025-05-07T19:46:58.0169200Z #define __VECTOR_FUNCTIONS_HPP__ 2025-05-07T19:46:58.0169298Z #define __VECTOR_FUNCTIONS_H__ 2025-05-07T19:46:58.0169411Z #define __VECTOR_TYPES_H__ 2025-05-07T19:46:58.0169861Z #define __VERSION__ "Clang 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:46:58.0169986Z #define __WAIT_INT(status) (*(int *) &(status)) 2025-05-07T19:46:58.0170194Z #define __WAIT_STATUS void * 2025-05-07T19:46:58.0170312Z #define __WAIT_STATUS_DEFN void * 2025-05-07T19:46:58.0170400Z #define __WALL 0x40000000 2025-05-07T19:46:58.0170494Z #define __WCHAR_MAX__ 2147483647 2025-05-07T19:46:58.0170607Z #define __WCHAR_TYPE__ int 2025-05-07T19:46:58.0170699Z #define __WCHAR_WIDTH__ 32 2025-05-07T19:46:58.0170799Z #define __WCLONE 0x80000000 2025-05-07T19:46:58.0170939Z #define __WCOREDUMP(status) ((status) & __WCOREFLAG) 2025-05-07T19:46:58.0171047Z #define __WCOREFLAG 0x80 2025-05-07T19:46:58.0171200Z #define __WEXITSTATUS(status) (((status) & 0xff00) >> 8) 2025-05-07T19:46:58.0171362Z #define __WIFCONTINUED(status) ((status) == __W_CONTINUED) 2025-05-07T19:46:58.0171527Z #define __WIFEXITED(status) (__WTERMSIG(status) == 0) 2025-05-07T19:46:58.0171757Z #define __WIFSIGNALED(status) (((signed char) (((status) & 0x7f) + 1) >> 1) > 0) 2025-05-07T19:46:58.0171907Z #define __WIFSTOPPED(status) (((status) & 0xff) == 0x7f) 2025-05-07T19:46:58.0172013Z #define __WINT_MAX__ 4294967295U 2025-05-07T19:46:58.0172112Z #define __WINT_TYPE__ 
unsigned int 2025-05-07T19:46:58.0172202Z #define __WINT_UNSIGNED__ 1 2025-05-07T19:46:58.0172292Z #define __WINT_WIDTH__ 32 2025-05-07T19:46:58.0172406Z #define __WNOTHREAD 0x20000000 2025-05-07T19:46:58.0172492Z #define __WORDSIZE 64 2025-05-07T19:46:58.0172588Z #define __WORDSIZE_TIME64_COMPAT32 1 2025-05-07T19:46:58.0172724Z #define __WSTOPSIG(status) __WEXITSTATUS(status) 2025-05-07T19:46:58.0172839Z #define __WTERMSIG(status) ((status) & 0x7f) 2025-05-07T19:46:58.0172934Z #define __W_CONTINUED 0xffff 2025-05-07T19:46:58.0173055Z #define __W_EXITCODE(ret,sig) ((ret) << 8 | (sig)) 2025-05-07T19:46:58.0173259Z #define __W_STOPCODE(sig) ((sig) << 8 | 0x7f) 2025-05-07T19:46:58.0173353Z #define ____FILE_defined 1 2025-05-07T19:46:58.0173447Z #define ____mbstate_t_defined 1 2025-05-07T19:46:58.0173583Z #define __align__(n) __attribute__((aligned(n))) 2025-05-07T19:46:58.0173774Z #define __always_inline __inline __attribute__ ((__always_inline__)) 2025-05-07T19:46:58.0173861Z #define __amd64 1 2025-05-07T19:46:58.0173945Z #define __amd64__ 1 2025-05-07T19:46:58.0174065Z #define __annotate__(a) __attribute__((a)) 2025-05-07T19:46:58.0174167Z #define __attribute_artificial__ 2025-05-07T19:46:58.0174314Z #define __attribute_const__ __attribute__ ((__const__)) 2025-05-07T19:46:58.0174518Z #define __attribute_deprecated__ __attribute__ ((__deprecated__)) 2025-05-07T19:46:58.0174728Z #define __attribute_format_arg__(x) __attribute__ ((__format_arg__ (x))) 2025-05-07T19:46:58.0174994Z #define __attribute_format_strfmon__(a,b) __attribute__ ((__format__ (__strfmon__, a, b))) 2025-05-07T19:46:58.0175165Z #define __attribute_malloc__ __attribute__ ((__malloc__)) 2025-05-07T19:46:58.0175332Z #define __attribute_noinline__ __attribute__ ((__noinline__)) 2025-05-07T19:46:58.0175469Z #define __attribute_pure__ __attribute__ ((__pure__)) 2025-05-07T19:46:58.0175604Z #define __attribute_used__ __attribute__ ((__used__)) 2025-05-07T19:46:58.0175854Z #define 
__attribute_warn_unused_result__ __attribute__ ((__warn_unused_result__)) 2025-05-07T19:46:58.0175953Z #define __blkcnt_t_defined 2025-05-07T19:46:58.0176050Z #define __blksize_t_defined 2025-05-07T19:46:58.0176263Z #define __bos(ptr) __builtin_object_size (ptr, __USE_FORTIFY_LEVEL > 1) 2025-05-07T19:46:58.0176394Z #define __bos0(ptr) __builtin_object_size (ptr, 0) 2025-05-07T19:46:58.0176478Z #define __bounded 2025-05-07T19:46:58.0177276Z #define __bswap_16(x) (__extension__ ({ unsigned short int __v, __x = (unsigned short int) (x); if (__builtin_constant_p (__x)) __v = __bswap_constant_16 (__x); else __asm__ ("rorw $8, %w0" : "=r" (__v) : "0" (__x) : "cc"); __v; })) 2025-05-07T19:46:58.0177782Z #define __bswap_32(x) (__extension__ ({ unsigned int __v, __x = (x); if (__builtin_constant_p (__x)) __v = __bswap_constant_32 (__x); else __asm__ ("bswap %0" : "=r" (__v) : "0" (__x)); __v; })) 2025-05-07T19:46:58.0178266Z #define __bswap_64(x) (__extension__ ({ __uint64_t __v, __x = (x); if (__builtin_constant_p (__x)) __v = __bswap_constant_64 (__x); else __asm__ ("bswap %q0" : "=r" (__v) : "0" (__x)); __v; })) 2025-05-07T19:46:58.0178602Z #define __bswap_constant_16(x) ((unsigned short int) ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))) 2025-05-07T19:46:58.0179051Z #define __bswap_constant_32(x) ((((x) & 0xff000000) >> 24) | (((x) & 0x00ff0000) >> 8) | (((x) & 0x0000ff00) << 8) | (((x) & 0x000000ff) << 24)) 2025-05-07T19:46:58.0180102Z #define __bswap_constant_64(x) (__extension__ ((((x) & 0xff00000000000000ull) >> 56) | (((x) & 0x00ff000000000000ull) >> 40) | (((x) & 0x0000ff0000000000ull) >> 24) | (((x) & 0x000000ff00000000ull) >> 8) | (((x) & 0x00000000ff000000ull) << 8) | (((x) & 0x0000000000ff0000ull) << 24) | (((x) & 0x000000000000ff00ull) << 40) | (((x) & 0x00000000000000ffull) << 56))) 2025-05-07T19:46:58.0180220Z #define __builtin_align__(a) __align__(a) 2025-05-07T19:46:58.0180309Z #define __catch(X) catch(X) 2025-05-07T19:46:58.0180384Z #define __cdecl 
2025-05-07T19:46:58.0180475Z #define __clang__ 1 2025-05-07T19:46:58.0180584Z #define __clang_literal_encoding__ "UTF-8" 2025-05-07T19:46:58.0180669Z #define __clang_major__ 16 2025-05-07T19:46:58.0180749Z #define __clang_minor__ 0 2025-05-07T19:46:58.0180857Z #define __clang_patchlevel__ 6 2025-05-07T19:46:58.0181258Z #define __clang_version__ "16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:46:58.0181377Z #define __clang_wide_literal_encoding__ "UTF-32" 2025-05-07T19:46:58.0181483Z #define __clock_t_defined 1 2025-05-07T19:46:58.0181575Z #define __clockid_t_defined 1 2025-05-07T19:46:58.0181819Z #define __cluster_dims__(...) __attribute__((cluster_dims(__VA_ARGS__))) 2025-05-07T19:46:58.0181926Z #define __code_model_small__ 1 2025-05-07T19:46:58.0182028Z #define __constant__ __location__(constant) 2025-05-07T19:46:58.0182123Z #define __cplusplus 201703L 2025-05-07T19:46:58.0182222Z #define __cpp_aggregate_bases 201603L 2025-05-07T19:46:58.0182333Z #define __cpp_aggregate_nsdmi 201304L 2025-05-07T19:46:58.0182437Z #define __cpp_alias_templates 200704L 2025-05-07T19:46:58.0182533Z #define __cpp_aligned_new 201606L 2025-05-07T19:46:58.0182634Z #define __cpp_attributes 200809L 2025-05-07T19:46:58.0182731Z #define __cpp_binary_literals 201304L 2025-05-07T19:46:58.0182828Z #define __cpp_capture_star_this 201603L 2025-05-07T19:46:58.0182922Z #define __cpp_constexpr 201603L 2025-05-07T19:46:58.0183052Z #define __cpp_constexpr_in_decltype 201711L 2025-05-07T19:46:58.0183149Z #define __cpp_decltype 200707L 2025-05-07T19:46:58.0183259Z #define __cpp_decltype_auto 201304L 2025-05-07T19:46:58.0183356Z #define __cpp_deduction_guides 201703L 2025-05-07T19:46:58.0183488Z #define __cpp_delegating_constructors 200604L 2025-05-07T19:46:58.0183585Z #define __cpp_digit_separators 201309L 2025-05-07T19:46:58.0183695Z #define __cpp_enumerator_attributes 201411L 2025-05-07T19:46:58.0183795Z #define __cpp_exceptions 
199711L 2025-05-07T19:46:58.0183893Z #define __cpp_fold_expressions 201603L 2025-05-07T19:46:58.0183995Z #define __cpp_generic_lambdas 201304L 2025-05-07T19:46:58.0184109Z #define __cpp_guaranteed_copy_elision 201606L 2025-05-07T19:46:58.0184215Z #define __cpp_hex_float 201603L 2025-05-07T19:46:58.0184314Z #define __cpp_if_constexpr 201606L 2025-05-07T19:46:58.0184424Z #define __cpp_impl_destroying_delete 201806L 2025-05-07T19:46:58.0184544Z #define __cpp_inheriting_constructors 201511L 2025-05-07T19:46:58.0184636Z #define __cpp_init_captures 201304L 2025-05-07T19:46:58.0184734Z #define __cpp_initializer_lists 200806L 2025-05-07T19:46:58.0184831Z #define __cpp_inline_variables 201606L 2025-05-07T19:46:58.0184929Z #define __cpp_lambdas 200907L 2025-05-07T19:46:58.0185050Z #define __cpp_lib_addressof_constexpr 201603 2025-05-07T19:46:58.0185156Z #define __cpp_lib_array_constexpr 201803L 2025-05-07T19:46:58.0185257Z #define __cpp_lib_as_const 201510 2025-05-07T19:46:58.0185353Z #define __cpp_lib_bool_constant 201505 2025-05-07T19:46:58.0185457Z #define __cpp_lib_exchange_function 201304 2025-05-07T19:46:58.0185665Z #define __cpp_lib_has_unique_object_representations 201606 2025-05-07T19:46:58.0185771Z #define __cpp_lib_hypot 201603 2025-05-07T19:46:58.0185870Z #define __cpp_lib_integer_sequence 201304 2025-05-07T19:46:58.0185993Z #define __cpp_lib_integral_constant_callable 201304 2025-05-07T19:46:58.0186100Z #define __cpp_lib_is_aggregate 201703 2025-05-07T19:46:58.0186193Z #define __cpp_lib_is_final 201402L 2025-05-07T19:46:58.0186285Z #define __cpp_lib_is_invocable 201703 2025-05-07T19:46:58.0186397Z #define __cpp_lib_is_null_pointer 201309 2025-05-07T19:46:58.0186495Z #define __cpp_lib_is_swappable 201603 2025-05-07T19:46:58.0186594Z #define __cpp_lib_launder 201606 2025-05-07T19:46:58.0186700Z #define __cpp_lib_logical_traits 201510 2025-05-07T19:46:58.0186830Z #define __cpp_lib_make_reverse_iterator 201402 2025-05-07T19:46:58.0186956Z #define 
__cpp_lib_math_special_functions 201603L 2025-05-07T19:46:58.0187057Z #define __cpp_lib_result_of_sfinae 201210 2025-05-07T19:46:58.0187213Z #define __cpp_lib_robust_nonmodifying_seq_ops 201304 2025-05-07T19:46:58.0187354Z #define __cpp_lib_transformation_trait_aliases 201304 2025-05-07T19:46:58.0187460Z #define __cpp_lib_tuple_element_t 201402L 2025-05-07T19:46:58.0187553Z #define __cpp_lib_tuples_by_type 201304 2025-05-07T19:46:58.0187708Z #define __cpp_lib_type_trait_variable_templates 201510L 2025-05-07T19:46:58.0187797Z #define __cpp_lib_void_t 201411 2025-05-07T19:46:58.0187911Z #define __cpp_named_character_escapes 202207L 2025-05-07T19:46:58.0188037Z #define __cpp_namespace_attributes 201411L 2025-05-07T19:46:58.0188160Z #define __cpp_nested_namespace_definitions 201411L 2025-05-07T19:46:58.0188338Z #define __cpp_noexcept_function_type 201510L 2025-05-07T19:46:58.0188445Z #define __cpp_nontype_template_args 201411L 2025-05-07T19:46:58.0188591Z #define __cpp_nontype_template_parameter_auto 201606L 2025-05-07T19:46:58.0188677Z #define __cpp_nsdmi 200809L 2025-05-07T19:46:58.0188774Z #define __cpp_range_based_for 201603L 2025-05-07T19:46:58.0188887Z #define __cpp_raw_strings 200710L 2025-05-07T19:46:58.0188978Z #define __cpp_ref_qualifiers 200710L 2025-05-07T19:46:58.0189086Z #define __cpp_return_type_deduction 201304L 2025-05-07T19:46:58.0189186Z #define __cpp_rtti 199711L 2025-05-07T19:46:58.0189282Z #define __cpp_rvalue_references 200610L 2025-05-07T19:46:58.0189373Z #define __cpp_static_assert 201411L 2025-05-07T19:46:58.0189479Z #define __cpp_static_call_operator 202207L 2025-05-07T19:46:58.0189594Z #define __cpp_structured_bindings 201606L 2025-05-07T19:46:58.0189699Z #define __cpp_template_auto 201606L 2025-05-07T19:46:58.0189878Z #define __cpp_threadsafe_static_init 200806L 2025-05-07T19:46:58.0190001Z #define __cpp_unicode_characters 200704L 2025-05-07T19:46:58.0190285Z #define __cpp_unicode_literals 200710L 2025-05-07T19:46:58.0190398Z #define 
__cpp_user_defined_literals 200809L 2025-05-07T19:46:58.0190500Z #define __cpp_variable_templates 201304L 2025-05-07T19:46:58.0190623Z #define __cpp_variadic_templates 200704L 2025-05-07T19:46:58.0190723Z #define __cpp_variadic_using 201611L 2025-05-07T19:46:58.0190838Z #define __cudaCDP2DeviceGetAttribute 2025-05-07T19:46:58.0190955Z #define __cudaCDP2DeviceGetCacheConfig 2025-05-07T19:46:58.0191059Z #define __cudaCDP2DeviceGetLimit 2025-05-07T19:46:58.0191183Z #define __cudaCDP2DeviceGetSharedMemConfig 2025-05-07T19:46:58.0191292Z #define __cudaCDP2EventCreateWithFlags 2025-05-07T19:46:58.0191409Z #define __cudaCDP2EventDestroy 2025-05-07T19:46:58.0191509Z #define __cudaCDP2EventRecord 2025-05-07T19:46:58.0191621Z #define __cudaCDP2EventRecordWithFlags 2025-05-07T19:46:58.0191759Z #define __cudaCDP2EventRecordWithFlags_ptsz 2025-05-07T19:46:58.0191858Z #define __cudaCDP2EventRecord_ptsz 2025-05-07T19:46:58.0191952Z #define __cudaCDP2Free 2025-05-07T19:46:58.0192056Z #define __cudaCDP2FuncGetAttributes 2025-05-07T19:46:58.0192172Z #define __cudaCDP2GetDevice 2025-05-07T19:46:58.0192268Z #define __cudaCDP2GetDeviceCount 2025-05-07T19:46:58.0192361Z #define __cudaCDP2GetErrorName 2025-05-07T19:46:58.0192516Z #define __cudaCDP2GetErrorString 2025-05-07T19:46:58.0192609Z #define __cudaCDP2GetLastError 2025-05-07T19:46:58.0192719Z #define __cudaCDP2GetParameterBuffer 2025-05-07T19:46:58.0192826Z #define __cudaCDP2GetParameterBufferV2 2025-05-07T19:46:58.0192925Z #define __cudaCDP2LaunchDevice 2025-05-07T19:46:58.0193023Z #define __cudaCDP2LaunchDeviceV2 2025-05-07T19:46:58.0193129Z #define __cudaCDP2LaunchDeviceV2_ptsz 2025-05-07T19:46:58.0193236Z #define __cudaCDP2LaunchDevice_ptsz 2025-05-07T19:46:58.0193325Z #define __cudaCDP2Malloc 2025-05-07T19:46:58.0193422Z #define __cudaCDP2Memcpy2DAsync 2025-05-07T19:46:58.0193536Z #define __cudaCDP2Memcpy2DAsync_ptsz 2025-05-07T19:46:58.0193637Z #define __cudaCDP2Memcpy3DAsync 2025-05-07T19:46:58.0193738Z #define 
__cudaCDP2Memcpy3DAsync_ptsz 2025-05-07T19:46:58.0193832Z #define __cudaCDP2MemcpyAsync 2025-05-07T19:46:58.0193939Z #define __cudaCDP2MemcpyAsync_ptsz 2025-05-07T19:46:58.0194033Z #define __cudaCDP2Memset2DAsync 2025-05-07T19:46:58.0194138Z #define __cudaCDP2Memset2DAsync_ptsz 2025-05-07T19:46:58.0194241Z #define __cudaCDP2Memset3DAsync 2025-05-07T19:46:58.0194342Z #define __cudaCDP2Memset3DAsync_ptsz 2025-05-07T19:46:58.0194436Z #define __cudaCDP2MemsetAsync 2025-05-07T19:46:58.0194535Z #define __cudaCDP2MemsetAsync_ptsz 2025-05-07T19:46:58.0194743Z #define __cudaCDP2OccupancyMaxActiveBlocksPerMultiprocessor 2025-05-07T19:46:58.0194985Z #define __cudaCDP2OccupancyMaxActiveBlocksPerMultiprocessorWithFlags 2025-05-07T19:46:58.0195097Z #define __cudaCDP2PeekAtLastError 2025-05-07T19:46:58.0195216Z #define __cudaCDP2RuntimeGetVersion 2025-05-07T19:46:58.0195328Z #define __cudaCDP2StreamCreateWithFlags 2025-05-07T19:46:58.0195528Z #define __cudaCDP2StreamDestroy 2025-05-07T19:46:58.0195631Z #define __cudaCDP2StreamWaitEvent 2025-05-07T19:46:58.0195763Z #define __cudaCDP2StreamWaitEvent_ptsz 2025-05-07T19:46:58.0195863Z #define __cudaGet_blockDim() blockDim 2025-05-07T19:46:58.0195967Z #define __cudaGet_blockIdx() blockIdx 2025-05-07T19:46:58.0196084Z #define __cudaGet_gridDim() gridDim 2025-05-07T19:46:58.0196189Z #define __cudaGet_threadIdx() threadIdx 2025-05-07T19:46:58.0196288Z #define __cudaGet_warpSize() warpSize 2025-05-07T19:46:58.0196437Z #define __cudart_builtin__ __location__(cudart_builtin) 2025-05-07T19:46:58.0196547Z #define __daddr_t_defined 2025-05-07T19:46:58.0196637Z #define __dev_t_defined 2025-05-07T19:46:58.0196737Z #define __device__ __location__(device) 2025-05-07T19:46:58.0196903Z #define __device_builtin__ __location__(device_builtin) 2025-05-07T19:46:58.0197146Z #define __device_builtin_surface_type__ __location__(device_builtin_surface_type) 2025-05-07T19:46:58.0197382Z #define __device_builtin_texture_type__ 
__location__(device_builtin_texture_type) 2025-05-07T19:46:58.0197547Z #define __errordecl(name,msg) extern void name (void) 2025-05-07T19:46:58.0197690Z #define __exctype(name) extern int name (int) __THROW 2025-05-07T19:46:58.0197873Z #define __exctype_l(name) extern int name (int, __locale_t) __THROW 2025-05-07T19:46:58.0197962Z #define __export__ 2025-05-07T19:46:58.0198233Z #define __extern_always_inline extern __always_inline __attribute__ ((__gnu_inline__)) 2025-05-07T19:46:58.0198439Z #define __extern_inline extern __inline __attribute__ ((__gnu_inline__)) 2025-05-07T19:46:58.0198518Z #define __flexarr [] 2025-05-07T19:46:58.0198711Z #define __forceinline__ __inline__ __attribute__((always_inline)) 2025-05-07T19:46:58.0198929Z #define __fortify_function __extern_always_inline __attribute_artificial__ 2025-05-07T19:46:58.0199025Z #define __fsblkcnt_t_defined 2025-05-07T19:46:58.0199136Z #define __fsfilcnt_t_defined 2025-05-07T19:46:58.0199224Z #define __gid_t_defined 2025-05-07T19:46:58.0199374Z #define __glibc_likely(cond) __builtin_expect((cond), 1) 2025-05-07T19:46:58.0199531Z #define __glibc_unlikely(cond) __builtin_expect((cond), 0) 2025-05-07T19:46:58.0199787Z #define __glibcxx_assert(cond) do { __glibcxx_constexpr_assert(cond); } while (false) 2025-05-07T19:46:58.0199894Z #define __glibcxx_class_requires(_a,_b) 2025-05-07T19:46:58.0200061Z #define __glibcxx_class_requires2(_a,_b,_c) 2025-05-07T19:46:58.0200198Z #define __glibcxx_class_requires3(_a,_b,_c,_d) 2025-05-07T19:46:58.0200328Z #define __glibcxx_class_requires4(_a,_b,_c,_d,_e) 2025-05-07T19:46:58.0200706Z #define __glibcxx_constexpr_assert(cond) if (__builtin_is_constant_evaluated() && !bool(cond)) __builtin_unreachable() 2025-05-07T19:46:58.0200925Z #define __glibcxx_digits10_b(T,B) (__glibcxx_digits_b (T,B) * 643L / 2136) 2025-05-07T19:46:58.0201096Z #define __glibcxx_digits_b(T,B) (B - __glibcxx_signed_b (T,B)) 2025-05-07T19:46:58.0201204Z #define __glibcxx_function_requires(...) 
2025-05-07T19:46:58.0201309Z #define __glibcxx_integral_traps true 2025-05-07T19:46:58.0201634Z #define __glibcxx_max_b(T,B) (__glibcxx_signed_b (T,B) ? (((((T)1 << (__glibcxx_digits_b (T,B) - 1)) - 1) << 1) + 1) : ~(T)0) 2025-05-07T19:46:58.0201883Z #define __glibcxx_min_b(T,B) (__glibcxx_signed_b (T,B) ? -__glibcxx_max_b (T,B) - 1 : (T)0) 2025-05-07T19:46:58.0202082Z #define __glibcxx_requires_can_decrement_range(_First1,_Last1,_First2) 2025-05-07T19:46:58.0202238Z #define __glibcxx_requires_can_increment(_First,_Size) 2025-05-07T19:46:58.0202523Z #define __glibcxx_requires_can_increment_range(_First1,_Last1,_First2) 2025-05-07T19:46:58.0202634Z #define __glibcxx_requires_cond(_Cond,_Msg) 2025-05-07T19:46:58.0202768Z #define __glibcxx_requires_heap(_First,_Last) 2025-05-07T19:46:58.0203097Z #define __glibcxx_requires_heap_pred(_First,_Last,_Pred) 2025-05-07T19:46:58.0203241Z #define __glibcxx_requires_irreflexive(_First,_Last) 2025-05-07T19:46:58.0203382Z #define __glibcxx_requires_irreflexive2(_First,_Last) 2025-05-07T19:46:58.0203640Z #define __glibcxx_requires_irreflexive_pred(_First,_Last,_Pred) 2025-05-07T19:46:58.0203825Z #define __glibcxx_requires_irreflexive_pred2(_First,_Last,_Pred) 2025-05-07T19:46:58.0203985Z #define __glibcxx_requires_non_empty_range(_First,_Last) 2025-05-07T19:46:58.0204107Z #define __glibcxx_requires_nonempty() 2025-05-07T19:46:58.0204305Z #define __glibcxx_requires_partitioned_lower(_First,_Last,_Value) 2025-05-07T19:46:58.0204542Z #define __glibcxx_requires_partitioned_lower_pred(_First,_Last,_Value,_Pred) 2025-05-07T19:46:58.0204749Z #define __glibcxx_requires_partitioned_upper(_First,_Last,_Value) 2025-05-07T19:46:58.0204981Z #define __glibcxx_requires_partitioned_upper_pred(_First,_Last,_Value,_Pred) 2025-05-07T19:46:58.0205108Z #define __glibcxx_requires_sorted(_First,_Last) 2025-05-07T19:46:58.0205282Z #define __glibcxx_requires_sorted_pred(_First,_Last,_Pred) 2025-05-07T19:46:58.0205472Z #define 
__glibcxx_requires_sorted_set(_First1,_Last1,_First2) 2025-05-07T19:46:58.0205692Z #define __glibcxx_requires_sorted_set_pred(_First1,_Last1,_First2,_Pred) 2025-05-07T19:46:58.0205806Z #define __glibcxx_requires_string(_String) 2025-05-07T19:46:58.0205970Z #define __glibcxx_requires_string_len(_String,_Len) 2025-05-07T19:46:58.0206083Z #define __glibcxx_requires_subscript(_N) 2025-05-07T19:46:58.0206222Z #define __glibcxx_requires_valid_range(_First,_Last) 2025-05-07T19:46:58.0206355Z #define __glibcxx_signed_b(T,B) ((T)(-1) < 0) 2025-05-07T19:46:58.0206456Z #define __global__ __location__(global) 2025-05-07T19:46:58.0206544Z #define __gnu_linux__ 1 2025-05-07T19:46:58.0206680Z #define __grid_constant__ __location__(grid_constant) 2025-05-07T19:46:58.0206793Z #define __have_pthread_attr_t 1 2025-05-07T19:46:58.0206892Z #define __host__ __location__(host) 2025-05-07T19:46:58.0206984Z #define __id_t_defined 2025-05-07T19:46:58.0207094Z #define __import__ 2025-05-07T19:46:58.0207241Z #define __inline_hint__ __attribute__((nv_inline_hint)) 2025-05-07T19:46:58.0207327Z #define __ino64_t_defined 2025-05-07T19:46:58.0207418Z #define __ino_t_defined 2025-05-07T19:46:58.0207518Z #define __int8_t_defined 2025-05-07T19:46:58.0207747Z #define __intN_t(N,MODE) typedef int int##N##_t __attribute__ ((__mode__ (MODE))) 2025-05-07T19:46:58.0207895Z #define __isalnum_l(c,l) __isctype_l((c), _ISalnum, (l)) 2025-05-07T19:46:58.0208121Z #define __isalpha_l(c,l) __isctype_l((c), _ISalpha, (l)) 2025-05-07T19:46:58.0208223Z #define __isascii(c) (((c) & ~0x7f) == 0) 2025-05-07T19:46:58.0208333Z #define __isascii_l(c,l) ((l), __isascii (c)) 2025-05-07T19:46:58.0208502Z #define __isblank_l(c,l) __isctype_l((c), _ISblank, (l)) 2025-05-07T19:46:58.0208645Z #define __iscntrl_l(c,l) __isctype_l((c), _IScntrl, (l)) 2025-05-07T19:46:58.0208917Z #define __isctype_l(c,type,locale) ((locale)->__ctype_b[(int) (c)] & (unsigned short int) type) 2025-05-07T19:46:58.0209060Z #define __isdigit_l(c,l) 
__isctype_l((c), _ISdigit, (l)) 2025-05-07T19:46:58.0209214Z #define __isgraph_l(c,l) __isctype_l((c), _ISgraph, (l)) 2025-05-07T19:46:58.0209416Z #define __isleap(year) ((year) % 4 == 0 && ((year) % 100 != 0 || (year) % 400 == 0)) 2025-05-07T19:46:58.0209564Z #define __islower_l(c,l) __isctype_l((c), _ISlower, (l)) 2025-05-07T19:46:58.0209717Z #define __isprint_l(c,l) __isctype_l((c), _ISprint, (l)) 2025-05-07T19:46:58.0209867Z #define __ispunct_l(c,l) __isctype_l((c), _ISpunct, (l)) 2025-05-07T19:46:58.0210020Z #define __isspace_l(c,l) __isctype_l((c), _ISspace, (l)) 2025-05-07T19:46:58.0210179Z #define __isupper_l(c,l) __isctype_l((c), _ISupper, (l)) 2025-05-07T19:46:58.0210341Z #define __isxdigit_l(c,l) __isctype_l((c), _ISxdigit, (l)) 2025-05-07T19:46:58.0210429Z #define __k8 1 2025-05-07T19:46:58.0210513Z #define __k8__ 1 2025-05-07T19:46:58.0210623Z #define __key_t_defined 2025-05-07T19:46:58.0210822Z #define __launch_bounds__(...) __annotate__(launch_bounds(__VA_ARGS__)) 2025-05-07T19:46:58.0210920Z #define __ldiv_t_defined 1 2025-05-07T19:46:58.0211012Z #define __linux 1 2025-05-07T19:46:58.0211094Z #define __linux__ 1 2025-05-07T19:46:58.0211242Z #define __lldiv_t_defined 1 2025-05-07T19:46:58.0211326Z #define __llvm__ 1 2025-05-07T19:46:58.0211434Z #define __location__(a) __annotate__(a) 2025-05-07T19:46:58.0211538Z #define __long_double_t long double 2025-05-07T19:46:58.0211634Z #define __malloc_and_calloc_defined 2025-05-07T19:46:58.0211754Z #define __managed__ __location__(managed) 2025-05-07T19:46:58.0211884Z #define __maxnreg__(a) __attribute__((maxnreg(a))) 2025-05-07T19:46:58.0211969Z #define __mode_t_defined 2025-05-07T19:46:58.0212054Z #define __need_IOV_MAX 2025-05-07T19:46:58.0212155Z #define __need_clockid_t 2025-05-07T19:46:58.0212244Z #define __nlink_t_defined 2025-05-07T19:46:58.0212365Z #define __no_return__ __attribute__((noreturn)) 2025-05-07T19:46:58.0212497Z #define __noinline__ __attribute__((noinline)) 2025-05-07T19:46:58.0212667Z 
#define __nonnull(params) __attribute__ ((__nonnull__ params)) 2025-05-07T19:46:58.0212772Z #define __nv_pure__ __location__(nv_pure) 2025-05-07T19:46:58.0212872Z #define __off64_t_defined 2025-05-07T19:46:58.0212966Z #define __off_t_defined 2025-05-07T19:46:58.0213054Z #define __pic__ 2 2025-05-07T19:46:58.0213139Z #define __pid_t_defined 2025-05-07T19:46:58.0213230Z #define __pie__ 2 2025-05-07T19:46:58.0213329Z #define __private_extern__ extern 2025-05-07T19:46:58.0213414Z #define __ptr_t void * 2025-05-07T19:46:58.0213495Z #define __ptrvalue 2025-05-07T19:46:58.0213595Z #define __restrict_arr 2025-05-07T19:46:58.0213730Z #define __seg_fs __attribute__((address_space(257))) 2025-05-07T19:46:58.0213860Z #define __seg_gs __attribute__((address_space(256))) 2025-05-07T19:46:58.0213977Z #define __shared__ __location__(shared) 2025-05-07T19:46:58.0214076Z #define __sigset_t_defined 2025-05-07T19:46:58.0214184Z #define __specialization_static 2025-05-07T19:46:58.0214276Z #define __ssize_t_defined 2025-05-07T19:46:58.0214373Z #define __stub_bdflush 2025-05-07T19:46:58.0214463Z #define __stub_chflags 2025-05-07T19:46:58.0214547Z #define __stub_fattach 2025-05-07T19:46:58.0214656Z #define __stub_fchflags 2025-05-07T19:46:58.0214739Z #define __stub_fdetach 2025-05-07T19:46:58.0214824Z #define __stub_getmsg 2025-05-07T19:46:58.0214912Z #define __stub_gtty 2025-05-07T19:46:58.0215015Z #define __stub_lchmod 2025-05-07T19:46:58.0215216Z #define __stub_putmsg 2025-05-07T19:46:58.0215293Z #define __stub_revoke 2025-05-07T19:46:58.0215460Z #define __stub_setlogin 2025-05-07T19:46:58.0215542Z #define __stub_sigreturn 2025-05-07T19:46:58.0215620Z #define __stub_sstk 2025-05-07T19:46:58.0215693Z #define __stub_stty 2025-05-07T19:46:58.0215801Z #define __suseconds_t_defined 2025-05-07T19:46:58.0215882Z #define __thread__ __thread 2025-05-07T19:46:58.0215979Z #define __throw_exception_again throw 2025-05-07T19:46:58.0216095Z #define __time_t_defined 1 2025-05-07T19:46:58.0216176Z 
#define __timer_t_defined 1 2025-05-07T19:46:58.0216265Z #define __timespec_defined 1 2025-05-07T19:46:58.0216353Z #define __toascii(c) ((c) & 0x7f) 2025-05-07T19:46:58.0216491Z #define __toascii_l(c,l) ((l), __toascii (c)) 2025-05-07T19:46:58.0217021Z #define __tobody(c,f,a,args) (__extension__ ({ int __res; if (sizeof (c) > 1) { if (__builtin_constant_p (c)) { int __c = (c); __res = __c < -128 || __c > 255 ? __c : (a)[__c]; } else __res = f args; } else __res = (a)[(int) (c)]; __res; })) 2025-05-07T19:46:58.0217100Z #define __try try 2025-05-07T19:46:58.0217200Z #define __tune_k8__ 1 2025-05-07T19:46:58.0217285Z #define __u_char_defined 2025-05-07T19:46:58.0217534Z #define __u_intN_t(N,MODE) typedef unsigned int u_int##N##_t __attribute__ ((__mode__ (MODE))) 2025-05-07T19:46:58.0217637Z #define __uid_t_defined 2025-05-07T19:46:58.0217713Z #define __unbounded 2025-05-07T19:46:58.0217799Z #define __unix 1 2025-05-07T19:46:58.0217886Z #define __unix__ 1 2025-05-07T19:46:58.0218008Z #define __useconds_t_defined 2025-05-07T19:46:58.0218101Z #define __warnattr(msg) 2025-05-07T19:46:58.0218246Z #define __warndecl(name,msg) extern void name (void) 2025-05-07T19:46:58.0218361Z #define __wur 2025-05-07T19:46:58.0218499Z #define __x86_64 1 2025-05-07T19:46:58.0218591Z #define __x86_64__ 1 2025-05-07T19:46:58.0218768Z #define _tolower(c) ((int) (*__ctype_tolower_loc ())[(int) (c)]) 2025-05-07T19:46:58.0218971Z #define _toupper(c) ((int) (*__ctype_toupper_loc ())[(int) (c)]) 2025-05-07T19:46:58.0219094Z #define alloca(size) __builtin_alloca (size) 2025-05-07T19:46:58.0219444Z #define assert(expr) ((expr) ? __ASSERT_VOID_CAST (0) : __assert_fail (__STRING(expr), __FILE__, __LINE__, __ASSERT_FUNCTION)) 2025-05-07T19:46:58.0219874Z #define assert_perror(errnum) (!(errnum) ? 
__ASSERT_VOID_CAST (0) : __assert_perror_fail ((errnum), __FILE__, __LINE__, __ASSERT_FUNCTION)) 2025-05-07T19:46:58.0219981Z #define be16toh(x) __bswap_16 (x) 2025-05-07T19:46:58.0220085Z #define be32toh(x) __bswap_32 (x) 2025-05-07T19:46:58.0220210Z #define be64toh(x) __bswap_64 (x) 2025-05-07T19:46:58.0220326Z #define cudaArrayColorAttachment 0x20 2025-05-07T19:46:58.0220432Z #define cudaArrayCubemap 0x04 2025-05-07T19:46:58.0220536Z #define cudaArrayDefault 0x00 2025-05-07T19:46:58.0220674Z #define cudaArrayDeferredMapping 0x80 2025-05-07T19:46:58.0220777Z #define cudaArrayLayered 0x01 2025-05-07T19:46:58.0220882Z #define cudaArraySparse 0x40 2025-05-07T19:46:58.0221060Z #define cudaArraySparsePropertiesSingleMipTail 0x1 2025-05-07T19:46:58.0221178Z #define cudaArraySurfaceLoadStore 0x02 2025-05-07T19:46:58.0221292Z #define cudaArrayTextureGather 0x08 2025-05-07T19:46:58.0221471Z #define cudaCooperativeLaunchMultiDeviceNoPostSync 0x02 2025-05-07T19:46:58.0221672Z #define cudaCooperativeLaunchMultiDeviceNoPreSync 0x01 2025-05-07T19:46:58.0221783Z #define cudaCpuDeviceId ((int)-1) 2025-05-07T19:46:58.0221894Z #define cudaDeviceBlockingSync 0x04 2025-05-07T19:46:58.0222034Z #define cudaDeviceLmemResizeToMax 0x10 2025-05-07T19:46:58.0222142Z #define cudaDeviceMapHost 0x08 2025-05-07T19:46:58.0222242Z #define cudaDeviceMask 0xff 2025-05-07T19:46:58.0222545Z #define cudaDeviceScheduleAuto 0x00 2025-05-07T19:46:58.0222688Z #define cudaDeviceScheduleBlockingSync 0x04 2025-05-07T19:46:58.0222803Z #define cudaDeviceScheduleMask 0x07 2025-05-07T19:46:58.0222920Z #define cudaDeviceScheduleSpin 0x01 2025-05-07T19:46:58.0223063Z #define cudaDeviceScheduleYield 0x02 2025-05-07T19:46:58.0223176Z #define cudaDeviceSyncMemops 0x80 2025-05-07T19:46:58.0223295Z #define cudaEventBlockingSync 0x01 2025-05-07T19:46:58.0223956Z #define cudaEventDefault 0x00 2025-05-07T19:46:58.0224076Z #define cudaEventDisableTiming 0x02 2025-05-07T19:46:58.0224192Z #define cudaEventInterprocess 0x04 
2025-05-07T19:46:58.0224311Z #define cudaEventRecordDefault 0x00 2025-05-07T19:46:58.0224462Z #define cudaEventRecordExternal 0x01 2025-05-07T19:46:58.0224575Z #define cudaEventWaitDefault 0x00 2025-05-07T19:46:58.0224691Z #define cudaEventWaitExternal 0x01 2025-05-07T19:46:58.0224847Z #define cudaExternalMemoryDedicated 0x1 2025-05-07T19:46:58.0225059Z #define cudaExternalSemaphoreSignalSkipNvSciBufMemSync 0x01 2025-05-07T19:46:58.0225255Z #define cudaExternalSemaphoreWaitSkipNvSciBufMemSync 0x02 2025-05-07T19:46:58.0225446Z #define cudaGetDeviceProperties cudaGetDeviceProperties_v2 2025-05-07T19:46:58.0225597Z #define cudaGraphKernelNodePortDefault 0 2025-05-07T19:46:58.0225759Z #define cudaGraphKernelNodePortLaunchCompletion 2 2025-05-07T19:46:58.0225902Z #define cudaGraphKernelNodePortProgrammatic 1 2025-05-07T19:46:58.0226042Z #define cudaHostAllocDefault 0x00 2025-05-07T19:46:58.0226155Z #define cudaHostAllocMapped 0x02 2025-05-07T19:46:58.0226268Z #define cudaHostAllocPortable 0x01 2025-05-07T19:46:58.0226389Z #define cudaHostAllocWriteCombined 0x04 2025-05-07T19:46:58.0226527Z #define cudaHostRegisterDefault 0x00 2025-05-07T19:46:58.0226646Z #define cudaHostRegisterIoMemory 0x04 2025-05-07T19:46:58.0226762Z #define cudaHostRegisterMapped 0x02 2025-05-07T19:46:58.0226904Z #define cudaHostRegisterPortable 0x01 2025-05-07T19:46:58.0227021Z #define cudaHostRegisterReadOnly 0x08 2025-05-07T19:46:58.0227147Z #define cudaInitDeviceFlagsAreValid 0x01 2025-05-07T19:46:58.0227337Z #define cudaInvalidDeviceId ((int)-2) 2025-05-07T19:46:58.0227472Z #define cudaIpcMemLazyEnablePeerAccess 0x01 2025-05-07T19:46:58.0227626Z #define cudaKernelNodeAttrID cudaLaunchAttributeID 2025-05-07T19:46:58.0227804Z #define cudaKernelNodeAttrValue cudaLaunchAttributeValue 2025-05-07T19:46:58.0228168Z #define cudaKernelNodeAttributeAccessPolicyWindow cudaLaunchAttributeAccessPolicyWindow 2025-05-07T19:46:58.0228486Z #define cudaKernelNodeAttributeClusterDimension 
cudaLaunchAttributeClusterDimension 2025-05-07T19:46:58.0228999Z #define cudaKernelNodeAttributeClusterSchedulingPolicyPreference cudaLaunchAttributeClusterSchedulingPolicyPreference 2025-05-07T19:46:58.0229288Z #define cudaKernelNodeAttributeCooperative cudaLaunchAttributeCooperative 2025-05-07T19:46:58.0229708Z #define cudaKernelNodeAttributeDeviceUpdatableKernelNode cudaLaunchAttributeDeviceUpdatableKernelNode 2025-05-07T19:46:58.0229988Z #define cudaKernelNodeAttributeMemSyncDomain cudaLaunchAttributeMemSyncDomain 2025-05-07T19:46:58.0230328Z #define cudaKernelNodeAttributeMemSyncDomainMap cudaLaunchAttributeMemSyncDomainMap 2025-05-07T19:46:58.0230792Z #define cudaKernelNodeAttributePreferredSharedMemoryCarveout cudaLaunchAttributePreferredSharedMemoryCarveout 2025-05-07T19:46:58.0231027Z #define cudaKernelNodeAttributePriority cudaLaunchAttributePriority 2025-05-07T19:46:58.0231164Z #define cudaMemAttachGlobal 0x01 2025-05-07T19:46:58.0231273Z #define cudaMemAttachHost 0x02 2025-05-07T19:46:58.0231382Z #define cudaMemAttachSingle 0x04 2025-05-07T19:46:58.0231530Z #define cudaMemPoolCreateUsageHwDecompress 0x2 2025-05-07T19:46:58.0231666Z #define cudaNvSciSyncAttrSignal 0x1 2025-05-07T19:46:58.0231777Z #define cudaNvSciSyncAttrWait 0x2 2025-05-07T19:46:58.0231890Z #define cudaOccupancyDefault 0x00 2025-05-07T19:46:58.0232070Z #define cudaOccupancyDisableCachingOverride 0x01 2025-05-07T19:46:58.0232188Z #define cudaPeerAccessDefault 0x00 2025-05-07T19:46:58.0232546Z #define cudaSignalExternalSemaphoresAsync __CUDART_API_PTSZ(cudaSignalExternalSemaphoresAsync_v2) 2025-05-07T19:46:58.0232689Z #define cudaStreamAttrID cudaLaunchAttributeID 2025-05-07T19:46:58.0232876Z #define cudaStreamAttrValue cudaLaunchAttributeValue 2025-05-07T19:46:58.0233192Z #define cudaStreamAttributeAccessPolicyWindow cudaLaunchAttributeAccessPolicyWindow 2025-05-07T19:46:58.0233454Z #define cudaStreamAttributeMemSyncDomain cudaLaunchAttributeMemSyncDomain 2025-05-07T19:46:58.0233828Z 
#define cudaStreamAttributeMemSyncDomainMap cudaLaunchAttributeMemSyncDomainMap 2025-05-07T19:46:58.0234041Z #define cudaStreamAttributePriority cudaLaunchAttributePriority 2025-05-07T19:46:58.0234494Z #define cudaStreamAttributeSynchronizationPolicy cudaLaunchAttributeSynchronizationPolicy 2025-05-07T19:46:58.0234621Z #define cudaStreamDefault 0x00 2025-05-07T19:46:58.0234768Z #define cudaStreamFireAndForget ((cudaStream_t)0x4) 2025-05-07T19:46:58.0235026Z #define cudaStreamGetCaptureInfo __CUDART_API_PTSZ(cudaStreamGetCaptureInfo_v2) 2025-05-07T19:46:58.0235268Z #define cudaStreamGraphFireAndForget (cudaStream_t)0x0200000000000000 2025-05-07T19:46:58.0235529Z #define cudaStreamGraphFireAndForgetAsSibling (cudaStream_t)0x0300000000000000 2025-05-07T19:46:58.0235727Z #define cudaStreamGraphTailLaunch (cudaStream_t)0x0100000000000000 2025-05-07T19:46:58.0235852Z #define cudaStreamLegacy ((cudaStream_t)0x1) 2025-05-07T19:46:58.0235990Z #define cudaStreamNonBlocking 0x01 2025-05-07T19:46:58.0236129Z #define cudaStreamPerThread ((cudaStream_t)0x2) 2025-05-07T19:46:58.0236264Z #define cudaStreamTailLaunch ((cudaStream_t)0x3) 2025-05-07T19:46:58.0236395Z #define cudaSurfaceType1D 0x01 2025-05-07T19:46:58.0236510Z #define cudaSurfaceType1DLayered 0xF1 2025-05-07T19:46:58.0236617Z #define cudaSurfaceType2D 0x02 2025-05-07T19:46:58.0236732Z #define cudaSurfaceType2DLayered 0xF2 2025-05-07T19:46:58.0236863Z #define cudaSurfaceType3D 0x03 2025-05-07T19:46:58.0236981Z #define cudaSurfaceTypeCubemap 0x0C 2025-05-07T19:46:58.0237110Z #define cudaSurfaceTypeCubemapLayered 0xFC 2025-05-07T19:46:58.0237240Z #define cudaTextureType1D 0x01 2025-05-07T19:46:58.0237402Z #define cudaTextureType1DLayered 0xF1 2025-05-07T19:46:58.0237508Z #define cudaTextureType2D 0x02 2025-05-07T19:46:58.0237625Z #define cudaTextureType2DLayered 0xF2 2025-05-07T19:46:58.0237751Z #define cudaTextureType3D 0x03 2025-05-07T19:46:58.0237867Z #define cudaTextureTypeCubemap 0x0C 2025-05-07T19:46:58.0237995Z 
#define cudaTextureTypeCubemapLayered 0xFC 2025-05-07T19:46:58.0238341Z #define cudaWaitExternalSemaphoresAsync __CUDART_API_PTSZ(cudaWaitExternalSemaphoresAsync_v2) 2025-05-07T19:46:58.0238443Z #define getc(_fp) _IO_getc (_fp) 2025-05-07T19:46:58.0238547Z #define htobe16(x) __bswap_16 (x) 2025-05-07T19:46:58.0238673Z #define htobe32(x) __bswap_32 (x) 2025-05-07T19:46:58.0238770Z #define htobe64(x) __bswap_64 (x) 2025-05-07T19:46:58.0238861Z #define htole16(x) (x) 2025-05-07T19:46:58.0238954Z #define htole32(x) (x) 2025-05-07T19:46:58.0239070Z #define htole64(x) (x) 2025-05-07T19:46:58.0239194Z #define isalnum_l(c,l) __isalnum_l ((c), (l)) 2025-05-07T19:46:58.0239316Z #define isalpha_l(c,l) __isalpha_l ((c), (l)) 2025-05-07T19:46:58.0239438Z #define isascii(c) __isascii (c) 2025-05-07T19:46:58.0239543Z #define isascii_l(c,l) __isascii_l ((c), (l)) 2025-05-07T19:46:58.0239648Z #define isblank_l(c,l) __isblank_l ((c), (l)) 2025-05-07T19:46:58.0239758Z #define iscntrl_l(c,l) __iscntrl_l ((c), (l)) 2025-05-07T19:46:58.0239884Z #define isdigit_l(c,l) __isdigit_l ((c), (l)) 2025-05-07T19:46:58.0239992Z #define isgraph_l(c,l) __isgraph_l ((c), (l)) 2025-05-07T19:46:58.0240096Z #define islower_l(c,l) __islower_l ((c), (l)) 2025-05-07T19:46:58.0240215Z #define isprint_l(c,l) __isprint_l ((c), (l)) 2025-05-07T19:46:58.0240318Z #define ispunct_l(c,l) __ispunct_l ((c), (l)) 2025-05-07T19:46:58.0240425Z #define isspace_l(c,l) __isspace_l ((c), (l)) 2025-05-07T19:46:58.0240531Z #define isupper_l(c,l) __isupper_l ((c), (l)) 2025-05-07T19:46:58.0240657Z #define isxdigit_l(c,l) __isxdigit_l ((c), (l)) 2025-05-07T19:46:58.0240738Z #define le16toh(x) (x) 2025-05-07T19:46:58.0240825Z #define le32toh(x) (x) 2025-05-07T19:46:58.0240919Z #define le64toh(x) (x) 2025-05-07T19:46:58.0241005Z #define linux 1 2025-05-07T19:46:58.0241103Z #define major(dev) gnu_dev_major (dev) 2025-05-07T19:46:58.0241243Z #define makedev(maj,min) gnu_dev_makedev (maj, min) 2025-05-07T19:46:58.0241386Z #define 
math_errhandling (MATH_ERRNO | MATH_ERREXCEPT) 2025-05-07T19:46:58.0241541Z #define minor(dev) gnu_dev_minor (dev) 2025-05-07T19:46:58.0241655Z #define offsetof(t,d) __builtin_offsetof(t, d) 2025-05-07T19:46:58.0241769Z #define putc(_ch,_fp) _IO_putc (_ch, _fp) 2025-05-07T19:46:58.0241855Z #define stderr stderr 2025-05-07T19:46:58.0241934Z #define stdin stdin 2025-05-07T19:46:58.0242024Z #define stdout stdout 2025-05-07T19:46:58.0242578Z #define strdupa(s) (__extension__ ({ const char *__old = (s); size_t __len = strlen (__old) + 1; char *__new = (char *) __builtin_alloca (__len); (char *) memcpy (__new, __old, __len); })) 2025-05-07T19:46:58.0243301Z #define strndupa(s,n) (__extension__ ({ const char *__old = (s); size_t __len = strnlen (__old, (n)); char *__new = (char *) __builtin_alloca (__len + 1); __new[__len] = '\0'; (char *) memcpy (__new, __old, __len); })) 2025-05-07T19:46:58.0243412Z #define toascii(c) __toascii (c) 2025-05-07T19:46:58.0243559Z #define toascii_l(c,l) __toascii_l ((c), (l)) 2025-05-07T19:46:58.0243641Z #define unix 1 2025-05-07T19:46:58.0243783Z #define w_coredump __wait_terminated.__w_coredump 2025-05-07T19:46:58.0243922Z #define w_retcode __wait_terminated.__w_retcode 2025-05-07T19:46:58.0244037Z #define w_stopsig __wait_stopped.__w_stopsig 2025-05-07T19:46:58.0244150Z #define w_stopval __wait_stopped.__w_stopval 2025-05-07T19:46:58.0244290Z #define w_termsig __wait_terminated.__w_termsig 2025-05-07T19:46:58.0244298Z 2025-05-07T19:46:58.0311934Z 2025-05-07T19:46:58.0312557Z + conda run -n build_binary nvcc --version 2025-05-07T19:46:58.0312584Z 2025-05-07T19:46:59.8567708Z nvcc: NVIDIA (R) Cuda compiler driver 2025-05-07T19:46:59.8568117Z Copyright (c) 2005-2025 NVIDIA Corporation 2025-05-07T19:46:59.8568798Z Built on Wed_Jan_15_19:20:09_PST_2025 2025-05-07T19:46:59.8569157Z Cuda compilation tools, release 12.8, V12.8.61 2025-05-07T19:46:59.8569554Z Build cuda_12.8.r12.8/compiler.35404655_0 2025-05-07T19:46:59.8569780Z 
2025-05-07T19:46:59.9146631Z 2025-05-07T19:46:59.9156289Z which: no nvidia-smi in (CONDA=/github/home/miniconda:/github/home/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:46:59.9158461Z [CHECK] nvidia-smi not found 2025-05-07T19:46:59.9159389Z [INSTALL] Successfully installed CUDA 12.8.0 2025-05-07T19:46:59.9248000Z ##[group]Run . $PRELUDE; install_pytorch_pip $BUILD_ENV nightly cuda/12.8.0 2025-05-07T19:46:59.9248627Z . $PRELUDE; install_pytorch_pip $BUILD_ENV nightly cuda/12.8.0 2025-05-07T19:46:59.9249256Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:46:59.9249605Z env: 2025-05-07T19:46:59.9249851Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:46:59.9250166Z BUILD_ENV: build_binary 2025-05-07T19:46:59.9250438Z BUILD_TARGET: genai 2025-05-07T19:46:59.9250674Z BUILD_VARIANT: cuda 2025-05-07T19:46:59.9250934Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:46:59.9251187Z ##[endgroup] 2025-05-07T19:47:00.3988353Z ################################################################################ 2025-05-07T19:47:00.3989062Z # Install PyTorch (PIP) 2025-05-07T19:47:00.3989337Z # 2025-05-07T19:47:00.4003215Z # [2025-05-07T19:47:00.399Z] + install_pytorch_pip build_binary nightly cuda/12.8.0 2025-05-07T19:47:00.4003958Z ################################################################################ 2025-05-07T19:47:00.4004206Z 2025-05-07T19:47:00.4031466Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y numpy 2025-05-07T19:47:01.3300213Z Channels: 2025-05-07T19:47:01.3300577Z - conda-forge 2025-05-07T19:47:01.3300909Z Platform: linux-64 2025-05-07T19:47:04.3682659Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:47:06.0826911Z Solving environment: \ | / - done 2025-05-07T19:47:06.3966138Z 2025-05-07T19:47:06.3966423Z ## Package Plan ## 2025-05-07T19:47:06.3966681Z 2025-05-07T19:47:06.3966911Z environment location: 
/github/home/miniconda/envs/build_binary 2025-05-07T19:47:06.3967404Z 2025-05-07T19:47:06.3967518Z added / updated specs: 2025-05-07T19:47:06.3968130Z - numpy 2025-05-07T19:47:06.3968268Z 2025-05-07T19:47:06.3968274Z 2025-05-07T19:47:06.3968442Z The following packages will be downloaded: 2025-05-07T19:47:06.3968682Z 2025-05-07T19:47:06.3968832Z package | build 2025-05-07T19:47:06.3969233Z ---------------------------|----------------- 2025-05-07T19:47:06.3969661Z libblas-3.9.0 |31_h59b9bed_openblas 16 KB conda-forge 2025-05-07T19:47:06.3970199Z libcblas-3.9.0 |31_he106b2a_openblas 16 KB conda-forge 2025-05-07T19:47:06.3970696Z liblapack-3.9.0 |31_h7ac8fdf_openblas 16 KB conda-forge 2025-05-07T19:47:06.3971204Z numpy-2.2.5 | py311h5d046bc_0 8.6 MB conda-forge 2025-05-07T19:47:06.3971664Z ------------------------------------------------------------ 2025-05-07T19:47:06.3972040Z Total: 8.7 MB 2025-05-07T19:47:06.3972305Z 2025-05-07T19:47:06.3972457Z The following NEW packages will be INSTALLED: 2025-05-07T19:47:06.3972701Z 2025-05-07T19:47:06.3972956Z libblas conda-forge/linux-64::libblas-3.9.0-31_h59b9bed_openblas 2025-05-07T19:47:06.3973545Z libcblas conda-forge/linux-64::libcblas-3.9.0-31_he106b2a_openblas 2025-05-07T19:47:06.3974140Z liblapack conda-forge/linux-64::liblapack-3.9.0-31_h7ac8fdf_openblas 2025-05-07T19:47:06.3974672Z numpy conda-forge/linux-64::numpy-2.2.5-py311h5d046bc_0 2025-05-07T19:47:06.3974996Z 2025-05-07T19:47:06.3975000Z 2025-05-07T19:47:06.3975004Z 2025-05-07T19:47:06.3975161Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:47:06.3975599Z numpy-2.2.5 | 8.6 MB | | 0% 2025-05-07T19:47:06.3976248Z libblas-3.9.0 | 16 KB | | 0% 2025-05-07T19:47:06.3977877Z libcblas-3.9.0 | 16 KB | | 0% 2025-05-07T19:47:06.6980102Z liblapack-3.9.0 | 16 KB | | 0% 2025-05-07T19:47:06.6994367Z liblapack-3.9.0 | 16 KB | #########7 | 98% 2025-05-07T19:47:06.7315529Z liblapack-3.9.0 | 16 KB | ########## | 100% 2025-05-07T19:47:06.7603424Z libcblas-3.9.0 | 16 KB | #########7 | 98% 2025-05-07T19:47:06.7615580Z libcblas-3.9.0 | 16 KB | ########## | 100% 2025-05-07T19:47:06.7623784Z libblas-3.9.0 | 16 KB | #########7 | 97% 2025-05-07T19:47:06.7991759Z libblas-3.9.0 | 16 KB | ########## | 100% 2025-05-07T19:47:07.2551479Z numpy-2.2.5 | 8.6 MB | ########## | 100%
2025-05-07T19:47:07.2558157Z  2025-05-07T19:47:07.2558388Z 2025-05-07T19:47:07.2558407Z 2025-05-07T19:47:07.2558444Z 2025-05-07T19:47:07.2558686Z  done 2025-05-07T19:47:07.3563862Z Preparing transaction: | done 2025-05-07T19:47:07.5575911Z Verifying transaction: - \ done 2025-05-07T19:47:07.6589966Z Executing transaction: / done 2025-05-07T19:47:07.7651737Z ################################################################################ 2025-05-07T19:47:07.7652320Z # Install Package From PyTorch PIP: torch 2025-05-07T19:47:07.7652680Z # 2025-05-07T19:47:07.7671022Z # [2025-05-07T19:47:07.766Z] + install_from_pytorch_pip build_binary torch nightly cuda/12.8.0 2025-05-07T19:47:07.7672542Z ################################################################################ 2025-05-07T19:47:07.7673295Z 2025-05-07T19:47:07.7699723Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:47:07.8564058Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:47:07.8565751Z ################################################################################ 2025-05-07T19:47:07.8566815Z # Prepare PIP Arguments (PyTorch PIP) 2025-05-07T19:47:07.8568035Z # 2025-05-07T19:47:07.8589008Z # [2025-05-07T19:47:07.858Z] + __prepare_pip_arguments torch nightly cuda/12.8.0 2025-05-07T19:47:07.8589525Z ################################################################################ 2025-05-07T19:47:07.8589814Z 2025-05-07T19:47:07.8609070Z [INSTALL] Extracted package (channel, version): (nightly, LATEST) 2025-05-07T19:47:07.8637368Z [INSTALL] Extracted package variant: cu128 2025-05-07T19:47:07.8650131Z [INSTALL] Using a non-RELEASE channel: nightly ... 
2025-05-07T19:47:07.8651736Z [INSTALL] Extracted the full PIP channel: https://download.pytorch.org/whl/nightly/cu128/ 2025-05-07T19:47:07.8655264Z [INSTALL] Extracted the full PIP package: --pre torch 2025-05-07T19:47:07.8665202Z [INSTALL] Attempting to install [torch, LATEST] from PyTorch PIP using channel https://download.pytorch.org/whl/nightly/cu128/ ... 2025-05-07T19:47:07.8687715Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128/ 2025-05-07T19:49:02.5180147Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:49:02.5184254Z Looking in indexes: https://download.pytorch.org/whl/nightly/cu128/ 2025-05-07T19:49:02.5184730Z Collecting torch 2025-05-07T19:49:02.5185451Z Downloading https://download.pytorch.org/whl/nightly/cu128/torch-2.8.0.dev20250507%2Bcu128-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (30 kB) 2025-05-07T19:49:02.5186323Z Collecting filelock (from torch) 2025-05-07T19:49:02.5186917Z Downloading https://download.pytorch.org/whl/nightly/filelock-3.16.1-py3-none-any.whl (16 kB) 2025-05-07T19:49:02.5187943Z Requirement already satisfied: typing-extensions>=4.10.0 in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from torch) (4.13.2) 2025-05-07T19:49:02.5188760Z Collecting sympy>=1.13.3 (from torch) 2025-05-07T19:49:02.5189306Z Downloading https://download.pytorch.org/whl/nightly/sympy-1.13.3-py3-none-any.whl (6.2 MB) 2025-05-07T19:49:02.5190291Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 29.0 MB/s eta 0:00:00 2025-05-07T19:49:02.5190712Z Collecting networkx (from torch) 2025-05-07T19:49:02.5191256Z Downloading 
https://download.pytorch.org/whl/nightly/networkx-3.4.2-py3-none-any.whl (1.7 MB) 2025-05-07T19:49:02.5192007Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 12.7 MB/s eta 0:00:00 2025-05-07T19:49:02.5193121Z Requirement already satisfied: jinja2 in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from torch) (3.1.6) 2025-05-07T19:49:02.5193865Z Collecting fsspec (from torch) 2025-05-07T19:49:02.5194428Z Downloading https://download.pytorch.org/whl/nightly/fsspec-2024.10.0-py3-none-any.whl (179 kB) 2025-05-07T19:49:02.5195090Z Collecting nvidia-cuda-nvrtc-cu12==12.8.61 (from torch) 2025-05-07T19:49:02.5195994Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_nvrtc_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:49:02.5196889Z Collecting nvidia-cuda-runtime-cu12==12.8.57 (from torch) 2025-05-07T19:49:02.5197803Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_runtime_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:49:02.5198691Z Collecting nvidia-cuda-cupti-cu12==12.8.57 (from torch) 2025-05-07T19:49:02.5199591Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_cupti_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:49:02.5200473Z Collecting nvidia-cudnn-cu12==9.8.0.87 (from torch) 2025-05-07T19:49:02.5201225Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cudnn_cu12-9.8.0.87-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB) 2025-05-07T19:49:02.5201997Z Collecting nvidia-cublas-cu12==12.8.3.14 (from torch) 2025-05-07T19:49:02.5202848Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:49:02.5203664Z Collecting nvidia-cufft-cu12==11.3.3.41 (from torch) 2025-05-07T19:49:02.5204532Z Downloading 
https://download.pytorch.org/whl/nightly/cu128/nvidia_cufft_cu12-11.3.3.41-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB) 2025-05-07T19:49:02.5205397Z Collecting nvidia-curand-cu12==10.3.9.55 (from torch) 2025-05-07T19:49:02.5206189Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_curand_cu12-10.3.9.55-py3-none-manylinux_2_27_x86_64.whl.metadata (1.5 kB) 2025-05-07T19:49:02.5207144Z Collecting nvidia-cusolver-cu12==11.7.2.55 (from torch) 2025-05-07T19:49:02.5207950Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusolver_cu12-11.7.2.55-py3-none-manylinux_2_27_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:49:02.5208749Z Collecting nvidia-cusparse-cu12==12.5.7.53 (from torch) 2025-05-07T19:49:02.5209671Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparse_cu12-12.5.7.53-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:49:02.5210687Z Collecting nvidia-cusparselt-cu12==0.6.3 (from torch) 2025-05-07T19:49:02.5211449Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl.metadata (6.8 kB) 2025-05-07T19:49:02.5212208Z Collecting nvidia-nccl-cu12==2.26.2 (from torch) 2025-05-07T19:49:02.5213040Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nccl_cu12-2.26.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB) 2025-05-07T19:49:02.5213856Z Collecting nvidia-nvtx-cu12==12.8.55 (from torch) 2025-05-07T19:49:02.5214686Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvtx_cu12-12.8.55-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:49:02.5215522Z Collecting nvidia-nvjitlink-cu12==12.8.61 (from torch) 2025-05-07T19:49:02.5216393Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvjitlink_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 
2025-05-07T19:49:02.5217142Z 2025-05-07T19:49:02.5217304Z Collecting nvidia-cufile-cu12==1.13.0.11 (from torch) 2025-05-07T19:49:02.5218141Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufile_cu12-1.13.0.11-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB) 2025-05-07T19:49:02.5219201Z Collecting pytorch-triton==3.3.0+git96316ce5 (from torch) 2025-05-07T19:49:02.5220060Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.3.0%2Bgit96316ce5-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:49:02.5221352Z Requirement already satisfied: setuptools>=40.8.0 in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from pytorch-triton==3.3.0+git96316ce5->torch) (78.1.1) 2025-05-07T19:49:02.5222229Z Collecting mpmath<1.4,>=1.1.0 (from sympy>=1.13.3->torch) 2025-05-07T19:49:02.5222770Z Downloading https://download.pytorch.org/whl/nightly/mpmath-1.3.0-py3-none-any.whl (536 kB) 2025-05-07T19:49:02.5223444Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 3.1 MB/s eta 0:00:00 2025-05-07T19:49:02.5224193Z Requirement already satisfied: MarkupSafe>=2.0 in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from jinja2->torch) (3.0.2) 2025-05-07T19:49:02.5225280Z Downloading https://download.pytorch.org/whl/nightly/cu128/torch-2.8.0.dev20250507%2Bcu128-cp311-cp311-manylinux_2_28_x86_64.whl (1047.1 MB) 2025-05-07T19:49:02.5226103Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 GB 22.7 MB/s eta 0:00:00 2025-05-07T19:49:02.5226784Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_x86_64.whl (609.6 MB) 2025-05-07T19:49:02.5227576Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 609.6/609.6 MB 37.9 MB/s eta 0:00:00 2025-05-07T19:49:02.5228366Z Downloading 
https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_cupti_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (10.2 MB) 2025-05-07T19:49:02.5229233Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.2/10.2 MB 47.8 MB/s eta 0:00:00 2025-05-07T19:49:02.5230011Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_nvrtc_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (88.0 MB) 2025-05-07T19:49:02.5230876Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.0/88.0 MB 79.2 MB/s eta 0:00:00 2025-05-07T19:49:02.5231741Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_runtime_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (954 kB) 2025-05-07T19:49:02.5232622Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 954.8/954.8 kB 8.0 MB/s eta 0:00:00 2025-05-07T19:49:02.5233317Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cudnn_cu12-9.8.0.87-py3-none-manylinux_2_27_x86_64.whl (698.0 MB) 2025-05-07T19:49:02.5234139Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 698.0/698.0 MB 30.7 MB/s eta 0:00:00 2025-05-07T19:49:02.5234924Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufft_cu12-11.3.3.41-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (193.1 MB) 2025-05-07T19:49:02.5235831Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 193.1/193.1 MB 71.9 MB/s eta 0:00:00 2025-05-07T19:49:02.5236640Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufile_cu12-1.13.0.11-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.2 MB) 2025-05-07T19:49:02.5237529Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 31.2 MB/s eta 0:00:00 2025-05-07T19:49:02.5238246Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_curand_cu12-10.3.9.55-py3-none-manylinux_2_27_x86_64.whl (63.6 MB) 2025-05-07T19:49:02.5259411Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.6/63.6 MB 81.2 MB/s eta 0:00:00 2025-05-07T19:49:02.5260207Z Downloading 
https://download.pytorch.org/whl/nightly/cu128/nvidia_cusolver_cu12-11.7.2.55-py3-none-manylinux_2_27_x86_64.whl (260.4 MB) 2025-05-07T19:49:02.5261273Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 260.4/260.4 MB 65.0 MB/s eta 0:00:00 2025-05-07T19:49:02.5262164Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparse_cu12-12.5.7.53-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (292.1 MB) 2025-05-07T19:49:02.5263119Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.1/292.1 MB 63.0 MB/s eta 0:00:00 2025-05-07T19:49:02.5264077Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl (156.8 MB) 2025-05-07T19:49:02.5264949Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 156.8/156.8 MB 60.3 MB/s eta 0:00:00 2025-05-07T19:49:02.5265800Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nccl_cu12-2.26.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (201.3 MB) 2025-05-07T19:49:02.5266756Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.3/201.3 MB 59.8 MB/s eta 0:00:00 2025-05-07T19:49:02.5267970Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvjitlink_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.2 MB) 2025-05-07T19:49:02.5268966Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.2/39.2 MB 63.4 MB/s eta 0:00:00 2025-05-07T19:49:02.5269797Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvtx_cu12-12.8.55-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB) 2025-05-07T19:49:02.5271295Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.3.0%2Bgit96316ce5-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (153.5 MB) 2025-05-07T19:49:02.5272313Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 153.5/153.5 MB 65.0 MB/s eta 0:00:00 2025-05-07T19:49:02.5274171Z Installing collected packages: nvidia-cusparselt-cu12, mpmath, sympy, pytorch-triton, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, 
nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufile-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, fsspec, filelock, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, nvidia-cusolver-cu12, torch 2025-05-07T19:49:02.5275898Z 2025-05-07T19:49:02.5278033Z Successfully installed filelock-3.16.1 fsspec-2024.10.0 mpmath-1.3.0 networkx-3.4.2 nvidia-cublas-cu12-12.8.3.14 nvidia-cuda-cupti-cu12-12.8.57 nvidia-cuda-nvrtc-cu12-12.8.61 nvidia-cuda-runtime-cu12-12.8.57 nvidia-cudnn-cu12-9.8.0.87 nvidia-cufft-cu12-11.3.3.41 nvidia-cufile-cu12-1.13.0.11 nvidia-curand-cu12-10.3.9.55 nvidia-cusolver-cu12-11.7.2.55 nvidia-cusparse-cu12-12.5.7.53 nvidia-cusparselt-cu12-0.6.3 nvidia-nccl-cu12-2.26.2 nvidia-nvjitlink-cu12-12.8.61 nvidia-nvtx-cu12-12.8.55 pytorch-triton-3.3.0+git96316ce5 sympy-1.13.3 torch-2.8.0.dev20250507+cu128 2025-05-07T19:49:02.5280264Z 2025-05-07T19:49:04.7192414Z torch 2.8.0.dev20250507+cu128 2025-05-07T19:49:04.7196468Z [CHECK] The installed package [torch, nightly/LATEST] is the correct variant (cu128) 2025-05-07T19:49:08.1169843Z [CHECK] Python (sub-)package 'torch.distributed' found ... 2025-05-07T19:49:11.5116750Z [CHECK] NOTE: The installed version is: 2.8.0.dev20250507+cu128 2025-05-07T19:49:11.5117333Z [CHECK] NOTE: Checking _GLIBCXX_USE_CXX11_ABI ... 2025-05-07T19:49:14.8424511Z True 2025-05-07T19:49:14.8424859Z True 2025-05-07T19:49:14.8425001Z 2025-05-07T19:49:14.9005248Z [INSTALL] Successfully installed PyTorch through PyTorch PIP 2025-05-07T19:49:14.9084492Z ##[group]Run if . $PRELUDE && which conda; then collect_pytorch_env_info $BUILD_ENV; fi 2025-05-07T19:49:14.9085219Z if . 
$PRELUDE && which conda; then collect_pytorch_env_info $BUILD_ENV; fi 2025-05-07T19:49:14.9085929Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:49:14.9086285Z env: 2025-05-07T19:49:14.9086573Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:49:14.9086917Z BUILD_ENV: build_binary 2025-05-07T19:49:14.9087227Z BUILD_TARGET: genai 2025-05-07T19:49:14.9087484Z BUILD_VARIANT: cuda 2025-05-07T19:49:14.9087783Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:49:14.9088060Z ##[endgroup] 2025-05-07T19:49:15.3548230Z /github/home/miniconda/bin/conda 2025-05-07T19:49:15.3549243Z ################################################################################ 2025-05-07T19:49:15.3550525Z # Collect PyTorch Environment Information (for Reporting Issues) 2025-05-07T19:49:15.3551406Z # 2025-05-07T19:49:15.3563635Z # [2025-05-07T19:49:15.355Z] + collect_pytorch_env_info build_binary 2025-05-07T19:49:15.3564803Z ################################################################################ 2025-05-07T19:49:15.3565071Z 2025-05-07T19:49:15.3589685Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:49:15.4488264Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:49:15.4505359Z [INFO] Downloading the PyTorch environment info collection script ... 2025-05-07T19:49:15.4506104Z + wget -q https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py 2025-05-07T19:49:15.4506563Z 2025-05-07T19:49:15.5355618Z 2025-05-07T19:49:15.5356979Z [INFO] Collecting PyTorch environment info (will be needed for reporting issues to PyTorch) ... 2025-05-07T19:49:15.5382726Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary python collect_env.py 2025-05-07T19:49:21.1974651Z Collecting environment information... 
2025-05-07T19:49:21.1975602Z PyTorch version: 2.8.0.dev20250507+cu128 2025-05-07T19:49:21.1976042Z Is debug build: False 2025-05-07T19:49:21.1976318Z CUDA used to build PyTorch: 12.8 2025-05-07T19:49:21.1976650Z ROCM used to build PyTorch: N/A 2025-05-07T19:49:21.1976857Z 2025-05-07T19:49:21.1976972Z OS: Amazon Linux 2023.7.20250428 (x86_64) 2025-05-07T19:49:21.1977322Z GCC version: Could not collect 2025-05-07T19:49:21.1977931Z Clang version: 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:49:21.1978587Z CMake version: version 4.0.2 2025-05-07T19:49:21.1978873Z Libc version: glibc-2.34 2025-05-07T19:49:21.1979064Z 2025-05-07T19:49:21.1979395Z Python version: 3.11.11 | packaged by conda-forge | (main, Mar 3 2025, 20:43:55) [GCC 13.3.0] (64-bit runtime) 2025-05-07T19:49:21.1980092Z Python platform: Linux-6.1.130-139.222.amzn2023.x86_64-x86_64-with-glibc2.34 2025-05-07T19:49:21.1980540Z Is CUDA available: False 2025-05-07T19:49:21.1980837Z CUDA runtime version: 12.8.61 2025-05-07T19:49:21.1981141Z CUDA_MODULE_LOADING set to: N/A 2025-05-07T19:49:21.1981505Z GPU models and configuration: Could not collect 2025-05-07T19:49:21.1981951Z Nvidia driver version: Could not collect 2025-05-07T19:49:21.1982275Z cuDNN version: Could not collect 2025-05-07T19:49:21.1982594Z HIP runtime version: N/A 2025-05-07T19:49:21.1982864Z MIOpen runtime version: N/A 2025-05-07T19:49:21.1983170Z Is XNNPACK available: True 2025-05-07T19:49:21.1983342Z 2025-05-07T19:49:21.1983428Z CPU: 2025-05-07T19:49:21.1983682Z Architecture: x86_64 2025-05-07T19:49:21.1984068Z CPU op-mode(s): 32-bit, 64-bit 2025-05-07T19:49:21.1984497Z Address sizes: 46 bits physical, 48 bits virtual 2025-05-07T19:49:21.1984939Z Byte Order: Little Endian 2025-05-07T19:49:21.1985285Z CPU(s): 96 2025-05-07T19:49:21.1985627Z On-line CPU(s) list: 0-95 2025-05-07T19:49:21.1985971Z Vendor ID: GenuineIntel 2025-05-07T19:49:21.1986956Z Model name: Intel(R) Xeon(R) 
Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:49:21.1987366Z CPU family: 6 2025-05-07T19:49:21.1987706Z Model: 85 2025-05-07T19:49:21.1988054Z Thread(s) per core: 2 2025-05-07T19:49:21.1988377Z Core(s) per socket: 24 2025-05-07T19:49:21.1988715Z Socket(s): 2 2025-05-07T19:49:21.1989137Z Stepping: 7 2025-05-07T19:49:21.1989477Z BogoMIPS: 6000.01 2025-05-07T19:49:21.1991895Z Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:49:21.1994382Z Hypervisor vendor: KVM 2025-05-07T19:49:21.1994715Z Virtualization type: full 2025-05-07T19:49:21.1995093Z L1d cache: 1.5 MiB (48 instances) 2025-05-07T19:49:21.1995489Z L1i cache: 1.5 MiB (48 instances) 2025-05-07T19:49:21.1995906Z L2 cache: 48 MiB (48 instances) 2025-05-07T19:49:21.1996290Z L3 cache: 71.5 MiB (2 instances) 2025-05-07T19:49:21.1996662Z NUMA node(s): 2 2025-05-07T19:49:21.1996986Z NUMA node0 CPU(s): 0-23,48-71 2025-05-07T19:49:21.1997364Z NUMA node1 CPU(s): 24-47,72-95 2025-05-07T19:49:21.1997868Z Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status 2025-05-07T19:49:21.1998445Z Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported 2025-05-07T19:49:21.1998979Z Vulnerability L1tf: Mitigation; PTE Inversion 2025-05-07T19:49:21.1999591Z Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:49:21.2000208Z Vulnerability Meltdown: Mitigation; PTI 2025-05-07T19:49:21.2000855Z 
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:49:21.2001474Z Vulnerability Reg file data sampling: Not affected 2025-05-07T19:49:21.2001880Z Vulnerability Retbleed: Vulnerable 2025-05-07T19:49:21.2002256Z Vulnerability Spec rstack overflow: Not affected 2025-05-07T19:49:21.2002765Z Vulnerability Spec store bypass: Vulnerable 2025-05-07T19:49:21.2003528Z Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization 2025-05-07T19:49:21.2004428Z Vulnerability Spectre v2: Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline 2025-05-07T19:49:21.2005128Z Vulnerability Srbds: Not affected 2025-05-07T19:49:21.2005519Z Vulnerability Tsx async abort: Not affected 2025-05-07T19:49:21.2005772Z 2025-05-07T19:49:21.2005915Z Versions of relevant libraries: 2025-05-07T19:49:21.2006207Z [pip3] numpy==2.2.5 2025-05-07T19:49:21.2006495Z [pip3] nvidia-cublas-cu12==12.8.3.14 2025-05-07T19:49:21.2006823Z [pip3] nvidia-cuda-cupti-cu12==12.8.57 2025-05-07T19:49:21.2007182Z [pip3] nvidia-cuda-nvrtc-cu12==12.8.61 2025-05-07T19:49:21.2007511Z [pip3] nvidia-cuda-runtime-cu12==12.8.57 2025-05-07T19:49:21.2007867Z [pip3] nvidia-cudnn-cu12==9.8.0.87 2025-05-07T19:49:21.2008197Z [pip3] nvidia-cufft-cu12==11.3.3.41 2025-05-07T19:49:21.2008507Z [pip3] nvidia-curand-cu12==10.3.9.55 2025-05-07T19:49:21.2008958Z [pip3] nvidia-cusolver-cu12==11.7.2.55 2025-05-07T19:49:21.2009498Z [pip3] nvidia-cusparse-cu12==12.5.7.53 2025-05-07T19:49:21.2009843Z [pip3] nvidia-cusparselt-cu12==0.6.3 2025-05-07T19:49:21.2010134Z [pip3] nvidia-nccl-cu12==2.26.2 2025-05-07T19:49:21.2010445Z [pip3] nvidia-nvjitlink-cu12==12.8.61 2025-05-07T19:49:21.2010743Z [pip3] nvidia-nvtx-cu12==12.8.55 2025-05-07T19:49:21.2011053Z [pip3] pytorch-triton==3.3.0+git96316ce5 2025-05-07T19:49:21.2011370Z [pip3] torch==2.8.0.dev20250507+cu128 2025-05-07T19:49:21.2011773Z [conda] cuda-cudart 
12.8.57 h5888daf_1 conda-forge 2025-05-07T19:49:21.2012271Z [conda] cuda-cudart-dev 12.8.57 h5888daf_1 conda-forge 2025-05-07T19:49:21.2012816Z [conda] cuda-cudart-dev_linux-64 12.8.57 h3f2d84a_1 conda-forge 2025-05-07T19:49:21.2013365Z [conda] cuda-cudart-static 12.8.57 h5888daf_1 conda-forge 2025-05-07T19:49:21.2013904Z [conda] cuda-cudart-static_linux-64 12.8.57 h3f2d84a_1 conda-forge 2025-05-07T19:49:21.2014475Z [conda] cuda-cudart_linux-64 12.8.57 h3f2d84a_1 conda-forge 2025-05-07T19:49:21.2014959Z [conda] cuda-cupti 12.8.57 hbd13f7d_0 conda-forge 2025-05-07T19:49:21.2015458Z [conda] cuda-cupti-dev 12.8.57 h5888daf_0 conda-forge 2025-05-07T19:49:21.2015975Z [conda] cuda-libraries 12.8.0 ha770c72_0 conda-forge 2025-05-07T19:49:21.2016479Z [conda] cuda-libraries-dev 12.8.0 ha770c72_0 conda-forge 2025-05-07T19:49:21.2016995Z [conda] cuda-nvrtc 12.8.61 hbd13f7d_0 conda-forge 2025-05-07T19:49:21.2017465Z [conda] cuda-nvrtc-dev 12.8.61 h5888daf_0 conda-forge 2025-05-07T19:49:21.2017953Z [conda] cuda-nvtx 12.8.55 hbd13f7d_0 conda-forge 2025-05-07T19:49:21.2018418Z [conda] cuda-opencl 12.8.55 hbd13f7d_0 conda-forge 2025-05-07T19:49:21.2019119Z [conda] cuda-opencl-dev 12.8.55 h5888daf_0 conda-forge 2025-05-07T19:49:21.2019657Z [conda] cuda-runtime 12.8.0 ha804496_0 conda-forge 2025-05-07T19:49:21.2020150Z [conda] libcublas 12.8.3.14 h9ab20c4_0 conda-forge 2025-05-07T19:49:21.2020674Z [conda] libcublas-dev 12.8.3.14 h9ab20c4_0 conda-forge 2025-05-07T19:49:21.2021166Z [conda] libcufft 11.3.3.41 hbd13f7d_0 conda-forge 2025-05-07T19:49:21.2021680Z [conda] libcufft-dev 11.3.3.41 h5888daf_0 conda-forge 2025-05-07T19:49:21.2022195Z [conda] libcurand 10.3.9.55 hbd13f7d_0 conda-forge 2025-05-07T19:49:21.2022684Z [conda] libcurand-dev 10.3.9.55 h5888daf_0 conda-forge 2025-05-07T19:49:21.2023211Z [conda] libcusolver 11.7.2.55 h9ab20c4_0 conda-forge 2025-05-07T19:49:21.2023720Z [conda] libcusolver-dev 11.7.2.55 h9ab20c4_0 conda-forge 2025-05-07T19:49:21.2024262Z [conda] 
libcusparse 12.5.7.53 hbd13f7d_0 conda-forge 2025-05-07T19:49:21.2024773Z [conda] libcusparse-dev 12.5.7.53 h5888daf_0 conda-forge 2025-05-07T19:49:21.2025310Z [conda] libnvjitlink 12.8.61 hbd13f7d_0 conda-forge 2025-05-07T19:49:21.2025853Z [conda] libnvjitlink-dev 12.8.61 h5888daf_0 conda-forge 2025-05-07T19:49:21.2026339Z [conda] numpy 2.2.5 py311h5d046bc_0 conda-forge 2025-05-07T19:49:21.2026852Z [conda] nvidia-cublas-cu12 12.8.3.14 pypi_0 pypi 2025-05-07T19:49:21.2027373Z [conda] nvidia-cuda-cupti-cu12 12.8.57 pypi_0 pypi 2025-05-07T19:49:21.2027928Z [conda] nvidia-cuda-nvrtc-cu12 12.8.61 pypi_0 pypi 2025-05-07T19:49:21.2028455Z [conda] nvidia-cuda-runtime-cu12 12.8.57 pypi_0 pypi 2025-05-07T19:49:21.2029293Z [conda] nvidia-cudnn-cu12 9.8.0.87 pypi_0 pypi 2025-05-07T19:49:21.2029822Z [conda] nvidia-cufft-cu12 11.3.3.41 pypi_0 pypi 2025-05-07T19:49:21.2030323Z [conda] nvidia-curand-cu12 10.3.9.55 pypi_0 pypi 2025-05-07T19:49:21.2030875Z [conda] nvidia-cusolver-cu12 11.7.2.55 pypi_0 pypi 2025-05-07T19:49:21.2031392Z [conda] nvidia-cusparse-cu12 12.5.7.53 pypi_0 pypi 2025-05-07T19:49:21.2031942Z [conda] nvidia-cusparselt-cu12 0.6.3 pypi_0 pypi 2025-05-07T19:49:21.2032451Z [conda] nvidia-nccl-cu12 2.26.2 pypi_0 pypi 2025-05-07T19:49:21.2032983Z [conda] nvidia-nvjitlink-cu12 12.8.61 pypi_0 pypi 2025-05-07T19:49:21.2033515Z [conda] nvidia-nvtx-cu12 12.8.55 pypi_0 pypi 2025-05-07T19:49:21.2034023Z [conda] pytorch-triton 3.3.0+git96316ce5 pypi_0 pypi 2025-05-07T19:49:21.2034539Z [conda] torch 2.8.0.dev20250507+cu128 pypi_0 pypi 2025-05-07T19:49:21.2034829Z 2025-05-07T19:49:21.2685812Z ##[group]Run . $PRELUDE; install_cudnn $BUILD_ENV "$(pwd)/build_only/cudnn" 12.8.0 2025-05-07T19:49:21.2686473Z . 
$PRELUDE; install_cudnn $BUILD_ENV "$(pwd)/build_only/cudnn" 12.8.0 2025-05-07T19:49:21.2687079Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:49:21.2687420Z env: 2025-05-07T19:49:21.2687646Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:49:21.2687966Z BUILD_ENV: build_binary 2025-05-07T19:49:21.2688212Z BUILD_TARGET: genai 2025-05-07T19:49:21.2688452Z BUILD_VARIANT: cuda 2025-05-07T19:49:21.2688702Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:49:21.2688970Z ##[endgroup] 2025-05-07T19:49:21.7078542Z ################################################################################ 2025-05-07T19:49:21.7079011Z # Install cuDNN 2025-05-07T19:49:21.7079252Z # 2025-05-07T19:49:21.7091656Z # [2025-05-07T19:49:21.708Z] + install_cudnn build_binary /__w/FBGEMM/FBGEMM/build_only/cudnn 12.8.0 2025-05-07T19:49:21.7092306Z ################################################################################ 2025-05-07T19:49:21.7092552Z 2025-05-07T19:49:21.7109977Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:49:21.7965044Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:49:21.7965524Z [INSTALL] cuda_concat_version is determined to be: 128 2025-05-07T19:49:21.7966143Z [INSTALL] Could not find cuDNN URL for the given cuda_concat_version 128; defaulting to cuDNN for CUDA 11.8 2025-05-07T19:49:21.7966775Z + rm -rf /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:49:21.7967187Z 2025-05-07T19:49:21.7984673Z 2025-05-07T19:49:21.7984973Z + mkdir -p /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:49:21.7985290Z 2025-05-07T19:49:21.8004398Z 2025-05-07T19:49:21.8028043Z [INSTALL] Downloading cuDNN to /tmp/tmp.ZMVKK9xYpa ... 2025-05-07T19:49:21.8054173Z [EXEC] [ATTEMPT 0/3] + wget -q https://developer.download.nvidia.com/compute/redist/cudnn/v8.7.0/local_installers/11.8/cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz -O cudnn.tar.xz 2025-05-07T19:49:35.7129277Z [INSTALL] Unpacking cuDNN ... 
2025-05-07T19:49:35.7130184Z + tar -xvf cudnn.tar.xz 2025-05-07T19:49:35.7130658Z 2025-05-07T19:49:35.7164700Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/ 2025-05-07T19:49:35.7165793Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/ 2025-05-07T19:49:35.7167599Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer_static.a 2025-05-07T19:49:38.1066686Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer_static_v8.a 2025-05-07T19:49:38.1067540Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train_static.a 2025-05-07T19:49:40.3668367Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train_static_v8.a 2025-05-07T19:49:40.3668973Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer_static.a 2025-05-07T19:49:48.5793350Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer_static_v8.a 2025-05-07T19:49:48.5794052Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train_static.a 2025-05-07T19:49:50.1751183Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train_static_v8.a 2025-05-07T19:49:50.1752871Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer_static.a 2025-05-07T19:49:51.8646123Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer_static_v8.a 2025-05-07T19:49:51.8647607Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train_static.a 2025-05-07T19:49:53.3751197Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train_static_v8.a 2025-05-07T19:49:53.3752783Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn.so.8 2025-05-07T19:49:53.3754075Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn.so 2025-05-07T19:49:53.3755420Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn.so.8.7.0 2025-05-07T19:49:53.3762426Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer.so.8 2025-05-07T19:49:53.3764248Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer.so 2025-05-07T19:49:53.3764854Z 
cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer.so.8.7.0 2025-05-07T19:49:55.7549299Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train.so.8 2025-05-07T19:49:55.7550191Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train.so 2025-05-07T19:49:55.7550906Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train.so.8.7.0 2025-05-07T19:49:58.0157946Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer.so 2025-05-07T19:49:58.0159538Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer.so.8 2025-05-07T19:49:58.0161063Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer.so.8.7.0 2025-05-07T19:50:06.5553793Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train.so 2025-05-07T19:50:06.5555436Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train.so.8.7.0 2025-05-07T19:50:08.1764255Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train.so.8 2025-05-07T19:50:08.1765891Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer.so.8.7.0 2025-05-07T19:50:09.8612584Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer.so 2025-05-07T19:50:09.8614211Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer.so.8 2025-05-07T19:50:09.8615757Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train.so.8.7.0 2025-05-07T19:50:11.3749191Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train.so 2025-05-07T19:50:11.3750817Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train.so.8 2025-05-07T19:50:11.3752143Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/ 2025-05-07T19:50:11.3753412Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_v8.h 2025-05-07T19:50:11.3754838Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_infer_v8.h 2025-05-07T19:50:11.3756407Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_train_v8.h 2025-05-07T19:50:11.3757955Z 
cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_backend_v8.h 2025-05-07T19:50:11.3759098Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_infer_v8.h 2025-05-07T19:50:11.3759653Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_train_v8.h 2025-05-07T19:50:11.3760183Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_infer_v8.h 2025-05-07T19:50:11.3760752Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_train_v8.h 2025-05-07T19:50:11.3761307Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_version_v8.h 2025-05-07T19:50:11.3761800Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn.h 2025-05-07T19:50:11.3762312Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_infer.h 2025-05-07T19:50:11.3762969Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_train.h 2025-05-07T19:50:11.3763871Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_backend.h 2025-05-07T19:50:11.3764398Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_infer.h 2025-05-07T19:50:11.3764946Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_train.h 2025-05-07T19:50:11.3765483Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_infer.h 2025-05-07T19:50:11.3765996Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_train.h 2025-05-07T19:50:11.3766525Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_version.h 2025-05-07T19:50:11.3766976Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/LICENSE 2025-05-07T19:50:11.3771118Z 2025-05-07T19:50:11.3771973Z [INSTALL] Moving cuDNN files to /__w/FBGEMM/FBGEMM/build_only/cudnn ... 
2025-05-07T19:50:11.3772550Z + rm -rf /__w/FBGEMM/FBGEMM/build_only/cudnn/include 2025-05-07T19:50:11.3772817Z 2025-05-07T19:50:11.3791979Z 2025-05-07T19:50:11.3792816Z + rm -rf /__w/FBGEMM/FBGEMM/build_only/cudnn/lib 2025-05-07T19:50:11.3793568Z 2025-05-07T19:50:11.3803153Z 2025-05-07T19:50:11.3804194Z + mv cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:50:11.3805819Z 2025-05-07T19:50:11.3836434Z 2025-05-07T19:50:11.3836894Z + mv cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:50:11.3837327Z 2025-05-07T19:50:12.7153667Z 2025-05-07T19:50:12.7154232Z /__w/FBGEMM/FBGEMM 2025-05-07T19:50:12.7155062Z + rm -rf /tmp/tmp.ZMVKK9xYpa 2025-05-07T19:50:12.7155601Z 2025-05-07T19:50:12.7690587Z 2025-05-07T19:50:12.7697089Z [INSTALL] Set environment variables CUDNN_INCLUDE_DIR and CUDNN_LIBRARY ... 2025-05-07T19:50:12.7698103Z + conda env config vars set -n build_binary CUDNN_INCLUDE_DIR=/__w/FBGEMM/FBGEMM/build_only/cudnn/include CUDNN_LIBRARY=/__w/FBGEMM/FBGEMM/build_only/cudnn/lib 2025-05-07T19:50:12.7698778Z 2025-05-07T19:50:13.1806225Z 2025-05-07T19:50:13.1807114Z [INSTALL] Successfully installed cuDNN (for CUDA 12.8.0) 2025-05-07T19:50:13.1877725Z ##[group]Run . $PRELUDE; cd fbgemm_gpu; prepare_fbgemm_gpu_build $BUILD_ENV 2025-05-07T19:50:13.1878405Z . 
$PRELUDE; cd fbgemm_gpu; prepare_fbgemm_gpu_build $BUILD_ENV 2025-05-07T19:50:13.1879069Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:50:13.1879446Z env: 2025-05-07T19:50:13.1879690Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:50:13.1880273Z BUILD_ENV: build_binary 2025-05-07T19:50:13.1880526Z BUILD_TARGET: genai 2025-05-07T19:50:13.1880787Z BUILD_VARIANT: cuda 2025-05-07T19:50:13.1881058Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:50:13.1881313Z ##[endgroup] 2025-05-07T19:50:13.6447393Z ################################################################################ 2025-05-07T19:50:13.6448521Z # Prepare FBGEMM-GPU Build 2025-05-07T19:50:13.6449242Z # 2025-05-07T19:50:13.6464287Z # [2025-05-07T19:50:13.645Z] + prepare_fbgemm_gpu_build build_binary 2025-05-07T19:50:13.6465305Z ################################################################################ 2025-05-07T19:50:13.6465640Z 2025-05-07T19:50:13.6478995Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:50:13.7329368Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:50:13.7346382Z [BUILD] Running git submodules update ... 
2025-05-07T19:50:13.7376572Z [EXEC] [ATTEMPT 0/3] + git submodule sync 2025-05-07T19:50:13.7706031Z Synchronizing submodule url for '../external/asmjit' 2025-05-07T19:50:13.7706656Z Synchronizing submodule url for '../external/composable_kernel' 2025-05-07T19:50:13.7707179Z Synchronizing submodule url for '../external/cpuinfo' 2025-05-07T19:50:13.7707612Z Synchronizing submodule url for '../external/cutlass' 2025-05-07T19:50:13.7708076Z Synchronizing submodule url for '../external/googletest' 2025-05-07T19:50:13.7708540Z Synchronizing submodule url for '../external/hipify_torch' 2025-05-07T19:50:13.7708997Z Synchronizing submodule url for '../external/json' 2025-05-07T19:50:13.7743904Z [EXEC] [ATTEMPT 0/3] + git submodule update --init --recursive 2025-05-07T19:50:13.8214126Z [BUILD] Installing other build dependencies ... 2025-05-07T19:50:13.8237208Z [EXEC] [ATTEMPT 0/3] + conda run --no-capture-output -n build_binary python -m pip install -r requirements.txt 2025-05-07T19:50:15.9444626Z Collecting backports.tarfile (from -r requirements.txt (line 13)) 2025-05-07T19:50:15.9670005Z Downloading backports.tarfile-1.2.0-py3-none-any.whl.metadata (2.0 kB) 2025-05-07T19:50:15.9775016Z Requirement already satisfied: build in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from -r requirements.txt (line 14)) (1.2.2.post1) 2025-05-07T19:50:16.0996167Z Collecting cmake (from -r requirements.txt (line 15)) 2025-05-07T19:50:16.1037466Z Downloading cmake-4.0.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.3 kB) 2025-05-07T19:50:16.1116264Z Requirement already satisfied: click in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from -r requirements.txt (line 16)) (8.1.8) 2025-05-07T19:50:16.1117675Z Requirement already satisfied: hypothesis in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from -r requirements.txt (line 17)) (6.131.14) 2025-05-07T19:50:16.1120105Z Requirement already 
satisfied: jinja2 in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from -r requirements.txt (line 18)) (3.1.6) 2025-05-07T19:50:16.1124154Z Requirement already satisfied: mpmath==1.3.0 in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from -r requirements.txt (line 19)) (1.3.0) 2025-05-07T19:50:16.1433217Z Collecting ninja (from -r requirements.txt (line 20)) 2025-05-07T19:50:16.1473872Z Downloading ninja-1.11.1.4-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (5.0 kB) 2025-05-07T19:50:16.1551484Z Requirement already satisfied: numpy>=2.0.2 in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from -r requirements.txt (line 21)) (2.2.5) 2025-05-07T19:50:16.1710183Z Collecting pyre-extensions (from -r requirements.txt (line 22)) 2025-05-07T19:50:16.1754389Z Downloading pyre_extensions-0.0.32-py3-none-any.whl.metadata (4.0 kB) 2025-05-07T19:50:16.1822672Z Requirement already satisfied: pyyaml in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from -r requirements.txt (line 23)) (6.0.2) 2025-05-07T19:50:16.1824065Z Requirement already satisfied: scikit-build in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from -r requirements.txt (line 24)) (0.18.1) 2025-05-07T19:50:16.1831552Z Requirement already satisfied: setuptools in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from -r requirements.txt (line 25)) (78.1.1) 2025-05-07T19:50:16.2037029Z Collecting setuptools_git_versioning (from -r requirements.txt (line 26)) 2025-05-07T19:50:16.2072984Z Downloading setuptools_git_versioning-2.1.0-py3-none-any.whl.metadata (6.1 kB) 2025-05-07T19:50:16.2259473Z Collecting tabulate (from -r requirements.txt (line 27)) 2025-05-07T19:50:16.2294730Z Downloading tabulate-0.9.0-py3-none-any.whl.metadata (34 kB) 2025-05-07T19:50:16.2580136Z Collecting patchelf (from -r requirements.txt (line 28)) 2025-05-07T19:50:16.2617429Z 
Downloading patchelf-0.17.2.2-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.musllinux_1_1_x86_64.whl.metadata (3.5 kB) 2025-05-07T19:50:16.2714748Z Requirement already satisfied: packaging>=19.1 in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from build->-r requirements.txt (line 14)) (25.0) 2025-05-07T19:50:16.2716197Z Requirement already satisfied: pyproject_hooks in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from build->-r requirements.txt (line 14)) (1.2.0) 2025-05-07T19:50:16.2759724Z Requirement already satisfied: attrs>=22.2.0 in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from hypothesis->-r requirements.txt (line 17)) (25.3.0) 2025-05-07T19:50:16.2762344Z Requirement already satisfied: sortedcontainers<3.0.0,>=2.1.0 in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from hypothesis->-r requirements.txt (line 17)) (2.4.0) 2025-05-07T19:50:16.2811358Z Requirement already satisfied: MarkupSafe>=2.0 in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from jinja2->-r requirements.txt (line 18)) (3.0.2) 2025-05-07T19:50:16.2943758Z Collecting typing-inspect (from pyre-extensions->-r requirements.txt (line 22)) 2025-05-07T19:50:16.2995229Z Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB) 2025-05-07T19:50:16.3065849Z Requirement already satisfied: typing-extensions in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from pyre-extensions->-r requirements.txt (line 22)) (4.13.2) 2025-05-07T19:50:16.3077027Z Requirement already satisfied: distro in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from scikit-build->-r requirements.txt (line 24)) (1.9.0) 2025-05-07T19:50:16.3088806Z Requirement already satisfied: wheel>=0.32.0 in /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages (from scikit-build->-r requirements.txt (line 24)) (0.45.1) 
2025-05-07T19:50:16.3358715Z Collecting mypy-extensions>=0.3.0 (from typing-inspect->pyre-extensions->-r requirements.txt (line 22)) 2025-05-07T19:50:16.3412762Z Downloading mypy_extensions-1.1.0-py3-none-any.whl.metadata (1.1 kB) 2025-05-07T19:50:16.3564381Z Downloading backports.tarfile-1.2.0-py3-none-any.whl (30 kB) 2025-05-07T19:50:16.3685229Z Downloading cmake-4.0.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.9 MB) 2025-05-07T19:50:16.4965672Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 27.9/27.9 MB 223.7 MB/s eta 0:00:00 2025-05-07T19:50:16.5005124Z Downloading ninja-1.11.1.4-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (422 kB) 2025-05-07T19:50:16.5092377Z Downloading pyre_extensions-0.0.32-py3-none-any.whl (12 kB) 2025-05-07T19:50:16.5164568Z Downloading setuptools_git_versioning-2.1.0-py3-none-any.whl (10 kB) 2025-05-07T19:50:16.5235495Z Downloading tabulate-0.9.0-py3-none-any.whl (35 kB) 2025-05-07T19:50:16.5295689Z Downloading patchelf-0.17.2.2-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.musllinux_1_1_x86_64.whl (466 kB) 2025-05-07T19:50:16.5385202Z Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB) 2025-05-07T19:50:16.5455798Z Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB) 2025-05-07T19:50:16.6889358Z Installing collected packages: tabulate, setuptools_git_versioning, patchelf, ninja, mypy-extensions, cmake, backports.tarfile, typing-inspect, pyre-extensions 2025-05-07T19:50:17.5243416Z 2025-05-07T19:50:17.5270490Z Successfully installed backports.tarfile-1.2.0 cmake-4.0.0 mypy-extensions-1.1.0 ninja-1.11.1.4 patchelf-0.17.2.2 pyre-extensions-0.0.32 setuptools_git_versioning-2.1.0 tabulate-0.9.0 typing-inspect-0.9.0 2025-05-07T19:50:17.5272859Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. 
Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:50:17.6835201Z ################################################################################ 2025-05-07T19:50:17.6835642Z # Install PyTorch (PyTorch PIP) 2025-05-07T19:50:17.6835959Z # 2025-05-07T19:50:17.6854635Z # [2025-05-07T19:50:17.684Z] + install_triton_pip build_binary 2025-05-07T19:50:17.6855876Z ################################################################################ 2025-05-07T19:50:17.6856525Z 2025-05-07T19:50:17.6856794Z [BUILD] Installing pytorch-triton nightly/3.2.0+git4b3bb1f8 from PIP ... 2025-05-07T19:50:17.6857260Z ################################################################################ 2025-05-07T19:50:17.6857678Z # Install Package From PyTorch PIP: pytorch-triton 2025-05-07T19:50:17.6858028Z # 2025-05-07T19:50:17.6871610Z # [2025-05-07T19:50:17.686Z] + install_from_pytorch_pip build_binary pytorch-triton nightly/3.2.0+git4b3bb1f8 2025-05-07T19:50:17.6873813Z ################################################################################ 2025-05-07T19:50:17.6874491Z 2025-05-07T19:50:17.6887451Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:50:17.7795519Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:50:17.7796075Z ################################################################################ 2025-05-07T19:50:17.7796491Z # Prepare PIP Arguments (PyTorch PIP) 2025-05-07T19:50:17.7796882Z # 2025-05-07T19:50:17.7818025Z # [2025-05-07T19:50:17.781Z] + __prepare_pip_arguments pytorch-triton nightly/3.2.0+git4b3bb1f8 2025-05-07T19:50:17.7818875Z ################################################################################ 2025-05-07T19:50:17.7819121Z 2025-05-07T19:50:17.7878504Z [INSTALL] Extracted package (channel, version): (nightly, 3.2.0+git4b3bb1f8) 2025-05-07T19:50:17.7893723Z [INSTALL] Using a non-RELEASE channel: nightly ... 
2025-05-07T19:50:17.7894365Z [INSTALL] Extracted the full PIP channel: https://download.pytorch.org/whl/nightly/ 2025-05-07T19:50:17.7901953Z [INSTALL] Extracted the full PIP package: --pre pytorch-triton==3.2.0+git4b3bb1f8 2025-05-07T19:50:17.7911294Z [INSTALL] Attempting to install [pytorch-triton, 3.2.0+git4b3bb1f8] from PyTorch PIP using channel https://download.pytorch.org/whl/nightly/ ... 2025-05-07T19:50:17.7940746Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install --pre pytorch-triton==3.2.0+git4b3bb1f8 --index-url https://download.pytorch.org/whl/nightly/ 2025-05-07T19:50:23.3025137Z ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. 2025-05-07T19:50:23.3028828Z torch 2.8.0.dev20250507+cu128 requires pytorch-triton==3.3.0+git96316ce5; platform_system == "Linux", but you have pytorch-triton 3.2.0+git4b3bb1f8 which is incompatible. 2025-05-07T19:50:23.3032493Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 
2025-05-07T19:50:23.3034011Z 2025-05-07T19:50:23.3034218Z Looking in indexes: https://download.pytorch.org/whl/nightly/ 2025-05-07T19:50:23.3034672Z Collecting pytorch-triton==3.2.0+git4b3bb1f8 2025-05-07T19:50:23.3035496Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.2.0%2Bgit4b3bb1f8-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.3 kB) 2025-05-07T19:50:23.3036794Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.2.0%2Bgit4b3bb1f8-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (166.5 MB) 2025-05-07T19:50:23.3038010Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 166.5/166.5 MB 180.2 MB/s eta 0:00:00 2025-05-07T19:50:23.3038437Z Installing collected packages: pytorch-triton 2025-05-07T19:50:23.3038819Z Attempting uninstall: pytorch-triton 2025-05-07T19:50:23.3039208Z Found existing installation: pytorch-triton 3.3.0+git96316ce5 2025-05-07T19:50:23.3039663Z Uninstalling pytorch-triton-3.3.0+git96316ce5: 2025-05-07T19:50:23.3040086Z Successfully uninstalled pytorch-triton-3.3.0+git96316ce5 2025-05-07T19:50:23.3040555Z Successfully installed pytorch-triton-3.2.0+git4b3bb1f8 2025-05-07T19:50:23.3040819Z 2025-05-07T19:50:25.4572671Z [CHECK] Python (sub-)package 'triton' found ... 2025-05-07T19:50:25.4573838Z [CHECK] Printing out the pytorch-triton version ... 2025-05-07T19:50:27.5187794Z ################################################################################ 2025-05-07T19:50:27.5188414Z [CHECK] The installed VERSION of pytorch-triton is: 3.2.0 2025-05-07T19:50:27.5188938Z ################################################################################ 2025-05-07T19:50:27.5189622Z 2025-05-07T19:50:29.5153648Z [CHECK] Python (sub-)package 'numpy' found ... 2025-05-07T19:50:31.5634323Z [CHECK] Python (sub-)package 'skbuild' found ... 2025-05-07T19:50:31.5635528Z [BUILD] Successfully ran git submodules update 2025-05-07T19:50:31.5711581Z ##[group]Run . 
$PRELUDE; cd fbgemm_gpu; build_fbgemm_gpu_package $BUILD_ENV nightly genai/cuda 2025-05-07T19:50:31.5712264Z . $PRELUDE; cd fbgemm_gpu; build_fbgemm_gpu_package $BUILD_ENV nightly genai/cuda 2025-05-07T19:50:31.5712838Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:50:31.5713154Z env: 2025-05-07T19:50:31.5713359Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:50:31.5713664Z BUILD_ENV: build_binary 2025-05-07T19:50:31.5713896Z BUILD_TARGET: genai 2025-05-07T19:50:31.5714121Z BUILD_VARIANT: cuda 2025-05-07T19:50:31.5714355Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:50:31.5714588Z ##[endgroup] 2025-05-07T19:50:32.0052339Z [BUILD] BUILD_TARGET_VARIANT: genai/cuda 2025-05-07T19:50:32.0053443Z [BUILD] Extracted build target: genai 2025-05-07T19:50:32.0054373Z [BUILD] Extracted build variant: cuda 2025-05-07T19:50:33.8463054Z /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:50:33.8463399Z 2025-05-07T19:50:33.9156575Z [CHECK] Binary cc found in PATH 2025-05-07T19:50:35.7401318Z /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:50:35.7401695Z 2025-05-07T19:50:35.8178515Z [CHECK] Binary gcc found in PATH 2025-05-07T19:50:37.6655346Z /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:50:37.6655688Z 2025-05-07T19:50:37.7386865Z [CHECK] Binary c++ found in PATH 2025-05-07T19:50:39.5801687Z /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:50:39.5802495Z 2025-05-07T19:50:39.6405915Z [CHECK] Binary g++ found in PATH 2025-05-07T19:50:41.5424203Z [BUILD] Extracted and set Python tag: py311 2025-05-07T19:50:41.5424764Z [BUILD] Extracted and set Python platform name: manylinux_2_28_x86_64 2025-05-07T19:50:41.5662649Z core = 24 2025-05-07T19:50:41.5875895Z sockets = 2 2025-05-07T19:50:41.5876798Z [BUILD] Set multicore run option for setup.py: -j 48 2025-05-07T19:50:41.5877522Z [CHECK] LD_LIBRARY_PATH = 2025-05-07T19:50:41.5877799Z [BUILD] Running pre-build cleanups ... 
2025-05-07T19:50:41.5878137Z + rm -rf dist 2025-05-07T19:50:41.5878277Z 2025-05-07T19:50:41.5889968Z 2025-05-07T19:50:41.5890881Z + conda run --no-capture-output -n build_binary python setup.py clean 2025-05-07T19:50:41.5891749Z 2025-05-07T19:50:44.6750416Z INFO:root:running clean 2025-05-07T19:50:44.6750888Z [SETUP.PY] ARGV: ['setup.py', 'clean'] 2025-05-07T19:50:44.6752110Z [SETUP.PY] Parsed setup.py arguments: Namespace(verbose=False, debug=False, dryrun=False, build_target='default', build_variant='cuda', package_channel='nightly', nvml_lib_path=None, nccl_lib_path=None, use_fb_only=False, cxxprefix=None) 2025-05-07T19:50:44.6753248Z [SETUP.PY] Other arguments: ['clean'] 2025-05-07T19:50:44.6753754Z [SETUP.PY] CUDA CUB directory environment variable not set. Using default CUB location. 2025-05-07T19:50:44.6754448Z [SETUP.PY] Using CUDA = /github/home/miniconda/envs/build_binary 2025-05-07T19:50:44.6755077Z [SETUP.PY] Generating version file at: /__w/FBGEMM/FBGEMM/fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:50:44.6755734Z [SETUP.PY] Setting the FBGEMM build target: default ... 2025-05-07T19:50:44.6756193Z [SETUP.PY] Setting the FBGEMM build variant: cuda ... 2025-05-07T19:50:44.6757481Z [SETUP.PY] Passing CMake arguments: ['-DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch', '-D_GLIBCXX_USE_CXX11_ABI=1', '-DFBGEMM_BUILD_TARGET=default', '-DFBGEMM_BUILD_VARIANT=cuda', "-DCMAKE_C_FLAGS=''", "-DCMAKE_CXX_FLAGS=''"] 2025-05-07T19:50:45.0165245Z 2025-05-07T19:50:45.0166196Z [BUILD] Printing git status ... 2025-05-07T19:50:45.0166631Z + git status 2025-05-07T19:50:45.0166779Z 2025-05-07T19:50:45.7447030Z HEAD detached at pull/4066/merge 2025-05-07T19:50:45.7447983Z Untracked files: 2025-05-07T19:50:45.7448869Z (use "git add ..." 
to include in what will be committed) 2025-05-07T19:50:45.7449931Z ../build_only/ 2025-05-07T19:50:45.7450177Z ../collect_env.py 2025-05-07T19:50:45.7450592Z fbgemm_gpu/docs/version.py 2025-05-07T19:50:45.7450784Z 2025-05-07T19:50:45.7451307Z nothing added to commit but untracked files present (use "git add" to track) 2025-05-07T19:50:45.7451663Z 2025-05-07T19:50:45.7451870Z + git diff 2025-05-07T19:50:45.7451991Z 2025-05-07T19:50:45.7726973Z 2025-05-07T19:50:45.7727637Z ################################################################################ 2025-05-07T19:50:45.7728703Z # Configure FBGEMM-GPU Build 2025-05-07T19:50:45.7729485Z # 2025-05-07T19:50:45.7747775Z # [2025-05-07T19:50:45.774Z] + __configure_fbgemm_gpu_build 2025-05-07T19:50:45.7748225Z ################################################################################ 2025-05-07T19:50:45.7748508Z 2025-05-07T19:50:45.7759062Z [BUILD] Setting the build target: genai ... 2025-05-07T19:50:45.7759545Z [BUILD] Configuring build as CUDA variant (this is the default behavior) ... 
2025-05-07T19:50:47.6661078Z /github/home/miniconda/envs/build_binary/bin/nvcc 2025-05-07T19:50:47.6661971Z 2025-05-07T19:50:47.7470226Z [CHECK] Binary nvcc found in PATH 2025-05-07T19:50:49.6011850Z /__w/FBGEMM/FBGEMM/build_only/cudnn/include 2025-05-07T19:50:49.6012700Z 2025-05-07T19:50:49.6781159Z [CHECK] Environment variable CUDNN_INCLUDE_DIR is defined in the Conda environment 2025-05-07T19:50:51.5303723Z /__w/FBGEMM/FBGEMM/build_only/cudnn/lib 2025-05-07T19:50:51.5304512Z 2025-05-07T19:50:51.6070496Z [CHECK] Environment variable CUDNN_LIBRARY is defined in the Conda environment 2025-05-07T19:50:53.4846882Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:50:53.4847362Z 2025-05-07T19:50:53.5613639Z [CHECK] Environment variable NVML_LIB_PATH is defined in the Conda environment 2025-05-07T19:50:55.4636725Z [BUILD] Using the default architectures for CUDA nvcc: NVIDIA (R) Cuda compiler driver 2025-05-07T19:50:55.4637376Z Copyright (c) 2005-2025 NVIDIA Corporation 2025-05-07T19:50:55.4637838Z Built on Wed_Jan_15_19:20:09_PST_2025 2025-05-07T19:50:55.4638196Z Cuda compilation tools, release 12.8, V12.8.61 2025-05-07T19:50:55.4638631Z Build cuda_12.8.r12.8/compiler.35404655_0 ... 2025-05-07T19:50:55.4639085Z [BUILD] Setting the following CUDA targets: 7.0;8.0;9.0;9.0a;10.0a;12.0a 2025-05-07T19:50:55.4639574Z [BUILD] Looking up NVML filepath ... 2025-05-07T19:50:57.3907712Z [BUILD] Looking up NCCL filepath ... 2025-05-07T19:51:01.2996290Z [BUILD] Setting NVCC verbose mode ... 2025-05-07T19:51:01.2997558Z + conda env config vars set -n build_binary NVCC_VERBOSE=1 2025-05-07T19:51:01.2998420Z 2025-05-07T19:51:01.7266149Z 2025-05-07T19:51:01.7267454Z [BUILD] Setting CUDA build args ... 2025-05-07T19:51:03.6303109Z [BUILD] Looking up CUDA version ... 
2025-05-07T19:51:07.4032965Z + conda run -n build_binary c++ --version | grep -i clang 2025-05-07T19:51:07.4033826Z 2025-05-07T19:51:09.3003148Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:51:09.3005888Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang++.cfg 2025-05-07T19:51:09.3007283Z 2025-05-07T19:51:09.3007609Z [BUILD] Setting NVCC flags ... 2025-05-07T19:51:09.3010363Z + conda env config vars set -n build_binary NVCC_PREPEND_FLAGS="-std=c++20 -Xcompiler -std=c++20 -Xcompiler -stdlib=libstdc++ -ccbin /github/home/miniconda/envs/build_binary/bin/c++ -allow-unsupported-compiler" 2025-05-07T19:51:09.3011224Z 2025-05-07T19:51:09.7117650Z 2025-05-07T19:51:09.7118543Z + conda run -n build_binary printenv NVCC_PREPEND_FLAGS 2025-05-07T19:51:09.7119412Z 2025-05-07T19:51:11.5400079Z -std=c++20 -Xcompiler -std=c++20 -Xcompiler -stdlib=libstdc++ -ccbin /github/home/miniconda/envs/build_binary/bin/c++ -allow-unsupported-compiler 2025-05-07T19:51:11.5400790Z 2025-05-07T19:51:11.6146117Z 2025-05-07T19:51:11.6146975Z [BUILD] Setting CUDA build args ... 
2025-05-07T19:51:11.6148784Z + conda run -n build_binary c++ --version 2025-05-07T19:51:11.6149480Z 2025-05-07T19:51:13.4708857Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:51:13.4711379Z Target: x86_64-conda-linux-gnu 2025-05-07T19:51:13.4711684Z Thread model: posix 2025-05-07T19:51:13.4712049Z InstalledDir: /github/home/miniconda/envs/build_binary/bin 2025-05-07T19:51:13.4712694Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang++.cfg 2025-05-07T19:51:13.4713198Z 2025-05-07T19:51:13.5285742Z 2025-05-07T19:51:13.5287119Z + conda run -n build_binary c++ --version | grep -i clang 2025-05-07T19:51:13.5287971Z 2025-05-07T19:51:15.4457353Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:51:15.4458303Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang++.cfg 2025-05-07T19:51:15.4461122Z 2025-05-07T19:51:15.4461685Z [BUILD] Clang is available; configuring for Clang-based build ... 2025-05-07T19:51:17.3509828Z .github/scripts/fbgemm_gpu_build.bash: line 370: [: : integer expression expected 2025-05-07T19:51:17.3510506Z [BUILD] Enabling debug features in the build ... 
2025-05-07T19:51:17.3513196Z [BUILD] FBGEMM_GPU build arguments have been set: --verbose --build-target=genai --build-variant=cuda --nvml_lib_path=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so --nccl_lib_path=/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2 -DTORCH_CUDA_ARCH_LIST='7.0;8.0;9.0;9.0a;10.0a;12.0a' -DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCMAKE_CXX_STANDARD=20 --cxxprefix=/github/home/miniconda/envs/build_binary --debug 2025-05-07T19:51:17.3516051Z ################################################################################ 2025-05-07T19:51:17.3516435Z # Build FBGEMM-GPU Package (Wheel) 2025-05-07T19:51:17.3516769Z # 2025-05-07T19:51:17.3526940Z # [2025-05-07T19:51:17.352Z] + build_fbgemm_gpu_package build_binary nightly genai/cuda 2025-05-07T19:51:17.3527483Z ################################################################################ 2025-05-07T19:51:17.3527720Z 2025-05-07T19:51:17.3531306Z [BUILD] Building FBGEMM wheel (TARGET=genai, VARIANT=cuda) ... 
2025-05-07T19:51:17.3536439Z + conda run --no-capture-output -n build_binary python -m build --wheel --no-isolation --config-setting=--build-option=--verbose --config-setting=--build-option=--build-target=genai --config-setting=--build-option=--build-variant=cuda --config-setting=--build-option=--nvml_lib_path=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so --config-setting=--build-option=--nccl_lib_path=/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2 --config-setting=--build-option=-DTORCH_CUDA_ARCH_LIST='7.0;8.0;9.0;9.0a;10.0a;12.0a' --config-setting=--build-option=-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux --config-setting=--build-option=-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux --config-setting=--build-option=-DCMAKE_CXX_STANDARD=20 --config-setting=--build-option=--cxxprefix=/github/home/miniconda/envs/build_binary --config-setting=--build-option=--debug --config-setting=--build-option=--package_channel=nightly --config-setting=--build-option=--python-tag=py311 --config-setting=--build-option=--plat-name=manylinux_2_28_x86_64 2025-05-07T19:51:17.3541523Z 2025-05-07T19:51:19.2299012Z * Getting build dependencies for wheel... 
2025-05-07T19:51:20.4716139Z INFO:root:running egg_info 2025-05-07T19:51:20.4739884Z INFO:root:creating fbgemm_gpu_nightly.egg-info 2025-05-07T19:51:20.4743792Z INFO:root:writing fbgemm_gpu_nightly.egg-info/PKG-INFO 2025-05-07T19:51:20.4744461Z INFO:root:writing dependency_links to fbgemm_gpu_nightly.egg-info/dependency_links.txt 2025-05-07T19:51:20.4745404Z INFO:root:writing requirements to fbgemm_gpu_nightly.egg-info/requires.txt 2025-05-07T19:51:20.4746039Z INFO:root:writing top-level names to fbgemm_gpu_nightly.egg-info/top_level.txt 2025-05-07T19:51:20.4746788Z INFO:root:writing manifest file 'fbgemm_gpu_nightly.egg-info/SOURCES.txt' 2025-05-07T19:51:20.4798624Z INFO:root:reading manifest file 'fbgemm_gpu_nightly.egg-info/SOURCES.txt' 2025-05-07T19:51:20.4807711Z INFO:root:writing manifest file 'fbgemm_gpu_nightly.egg-info/SOURCES.txt' 2025-05-07T19:51:20.4810295Z [SETUP.PY] ARGV: ['setup.py', 'egg_info'] 2025-05-07T19:51:20.4811409Z [SETUP.PY] Parsed setup.py arguments: Namespace(verbose=False, debug=False, dryrun=False, build_target='default', build_variant='cuda', package_channel='nightly', nvml_lib_path=None, nccl_lib_path=None, use_fb_only=False, cxxprefix=None) 2025-05-07T19:51:20.4812528Z [SETUP.PY] Other arguments: ['egg_info'] 2025-05-07T19:51:20.4813030Z [SETUP.PY] CUDA CUB directory environment variable not set. Using default CUB location. 2025-05-07T19:51:20.4813653Z [SETUP.PY] Using CUDA = /github/home/miniconda/envs/build_binary 2025-05-07T19:51:20.4814259Z [SETUP.PY] Generating version file at: /__w/FBGEMM/FBGEMM/fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:51:20.4814861Z [SETUP.PY] Setting the FBGEMM build target: default ... 2025-05-07T19:51:20.4815287Z [SETUP.PY] Setting the FBGEMM build variant: cuda ... 
2025-05-07T19:51:20.4816588Z [SETUP.PY] Passing CMake arguments: ['-DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch', '-D_GLIBCXX_USE_CXX11_ABI=1', '-DFBGEMM_BUILD_TARGET=default', '-DFBGEMM_BUILD_VARIANT=cuda', "-DCMAKE_C_FLAGS=''", "-DCMAKE_CXX_FLAGS=''"] 2025-05-07T19:51:20.7834893Z * Building wheel... 2025-05-07T19:51:22.0251451Z [SETUP.PY] ARGV: ['setup.py', 'bdist_wheel', '--dist-dir', '/__w/FBGEMM/FBGEMM/fbgemm_gpu/dist/.tmp-sjma5294', '--verbose', '--build-target=genai', '--build-variant=cuda', '--nvml_lib_path=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so', '--nccl_lib_path=/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2', '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a', '-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCMAKE_CXX_STANDARD=20', '--cxxprefix=/github/home/miniconda/envs/build_binary', '--debug', '--package_channel=nightly', '--python-tag=py311', '--plat-name=manylinux_2_28_x86_64'] 2025-05-07T19:51:22.0256349Z [SETUP.PY] Parsed setup.py arguments: Namespace(verbose=True, debug=True, dryrun=False, build_target='genai', build_variant='cuda', package_channel='nightly', nvml_lib_path='/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so', nccl_lib_path='/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2', use_fb_only=False, cxxprefix='/github/home/miniconda/envs/build_binary') 2025-05-07T19:51:22.0259821Z [SETUP.PY] Other arguments: ['bdist_wheel', '--dist-dir', '/__w/FBGEMM/FBGEMM/fbgemm_gpu/dist/.tmp-sjma5294', '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a', '-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', 
'-DCMAKE_CXX_STANDARD=20', '--python-tag=py311', '--plat-name=manylinux_2_28_x86_64'] 2025-05-07T19:51:22.0261705Z [SETUP.PY] CUDA CUB directory environment variable not set. Using default CUB location. 2025-05-07T19:51:22.0262257Z [SETUP.PY] Using CUDA = /github/home/miniconda/envs/build_binary 2025-05-07T19:51:22.0262842Z [SETUP.PY] Generating version file at: /__w/FBGEMM/FBGEMM/fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:51:22.0263374Z [SETUP.PY] Setting the FBGEMM build target: genai ... 2025-05-07T19:51:22.0263796Z [SETUP.PY] Setting the FBGEMM build variant: cuda ... 2025-05-07T19:51:22.0270448Z [SETUP.PY] Passing CMake arguments: ['-DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch', '-D_GLIBCXX_USE_CXX11_ABI=1', '-DCMAKE_VERBOSE_MAKEFILE=ON', '-DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE', '-DFBGEMM_BUILD_TARGET=genai', '-DFBGEMM_BUILD_VARIANT=cuda', '-DNVML_LIB_PATH=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so', '-DNCCL_INCLUDE_DIRS=/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include', '-DNCCL_LIBRARIES=/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2', '-DCMAKE_C_COMPILER=/github/home/miniconda/envs/build_binary/bin/cc', '-DCMAKE_CXX_COMPILER=/github/home/miniconda/envs/build_binary/bin/c++', "-DCMAKE_C_FLAGS='-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include'", "-DCMAKE_CXX_FLAGS='-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include'", '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a', '-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', 
'-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCMAKE_CXX_STANDARD=20'] 2025-05-07T19:51:22.0277008Z 2025-05-07T19:51:22.0277013Z 2025-05-07T19:51:22.0277198Z -------------------------------------------------------------------------------- 2025-05-07T19:51:22.0277624Z -- Trying 'Ninja' generator 2025-05-07T19:51:22.0277933Z -------------------------------- 2025-05-07T19:51:22.0278244Z --------------------------- 2025-05-07T19:51:22.0278502Z ---------------------- 2025-05-07T19:51:22.0278776Z ----------------- 2025-05-07T19:51:22.0279038Z ------------ 2025-05-07T19:51:22.0279256Z ------- 2025-05-07T19:51:22.0279489Z -- 2025-05-07T19:51:22.0736232Z CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required): 2025-05-07T19:51:22.0737829Z Not searching for unused variables given on the command line. 2025-05-07T19:51:22.0739383Z Compatibility with CMake < 3.10 will be removed from a future version of 2025-05-07T19:51:22.0740607Z CMake. 2025-05-07T19:51:22.0740934Z 2025-05-07T19:51:22.0741611Z Update the VERSION argument <min> value. Or, use the <min>...<max> syntax 2025-05-07T19:51:22.0742454Z to tell CMake that the project requires at least <min> but has been updated 2025-05-07T19:51:22.0742938Z to work with policies introduced by <max> or earlier.
2025-05-07T19:51:22.0743182Z 2025-05-07T19:51:22.0743186Z 2025-05-07T19:51:22.1609492Z -- The C compiler identification is Clang 16.0.6 2025-05-07T19:51:22.1697741Z -- Detecting C compiler ABI info 2025-05-07T19:51:22.3006979Z -- Detecting C compiler ABI info - done 2025-05-07T19:51:22.3136231Z -- Check for working C compiler: /github/home/miniconda/envs/build_binary/bin/cc - skipped 2025-05-07T19:51:22.3137913Z -- Detecting C compile features 2025-05-07T19:51:22.3141328Z -- Detecting C compile features - done 2025-05-07T19:51:22.4643232Z -- The CXX compiler identification is Clang 16.0.6 2025-05-07T19:51:22.4714017Z -- Detecting CXX compiler ABI info 2025-05-07T19:51:22.6083467Z -- Detecting CXX compiler ABI info - done 2025-05-07T19:51:22.6216142Z -- Check for working CXX compiler: /github/home/miniconda/envs/build_binary/bin/c++ - skipped 2025-05-07T19:51:22.6217505Z -- Detecting CXX compile features 2025-05-07T19:51:22.6227033Z -- Detecting CXX compile features - done 2025-05-07T19:51:22.6242960Z -- Configuring done (0.6s) 2025-05-07T19:51:22.6292548Z -- Generating done (0.0s) 2025-05-07T19:51:22.6315641Z -- Build files have been written to: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_cmake_test_compile/build 2025-05-07T19:51:22.6363768Z -- 2025-05-07T19:51:22.6364063Z ------- 2025-05-07T19:51:22.6364303Z ------------ 2025-05-07T19:51:22.6364555Z ----------------- 2025-05-07T19:51:22.6365109Z ---------------------- 2025-05-07T19:51:22.6365372Z --------------------------- 2025-05-07T19:51:22.6365678Z -------------------------------- 2025-05-07T19:51:22.6365977Z -- Trying 'Ninja' generator - success 2025-05-07T19:51:22.6366508Z -------------------------------------------------------------------------------- 2025-05-07T19:51:22.6366817Z 2025-05-07T19:51:22.6375905Z Configuring Project 2025-05-07T19:51:22.6376216Z Working directory: 2025-05-07T19:51:22.6376596Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-build 2025-05-07T19:51:22.6377039Z Command: 
2025-05-07T19:51:22.6397416Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/cmake/data/bin/cmake /__w/FBGEMM/FBGEMM/fbgemm_gpu -G Ninja -DCMAKE_MAKE_PROGRAM:FILEPATH=/github/home/miniconda/envs/build_binary/bin/ninja --no-warn-unused-cli -DCMAKE_INSTALL_PREFIX:PATH=/__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install -DPYTHON_VERSION_STRING:STRING=3.11.11 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/skbuild/resources/cmake -DPYTHON_EXECUTABLE:PATH=/github/home/miniconda/envs/build_binary/bin/python -DPYTHON_INCLUDE_DIR:PATH=/github/home/miniconda/envs/build_binary/include/python3.11 -DPYTHON_LIBRARY:PATH=/github/home/miniconda/envs/build_binary/lib/libpython3.11.so -DPython_EXECUTABLE:PATH=/github/home/miniconda/envs/build_binary/bin/python -DPython_ROOT_DIR:PATH=/github/home/miniconda/envs/build_binary -DPython_FIND_REGISTRY:STRING=NEVER -DPython_INCLUDE_DIR:PATH=/github/home/miniconda/envs/build_binary/include/python3.11 -DPython_NumPy_INCLUDE_DIRS:PATH=/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/numpy/_core/include -DPython3_EXECUTABLE:PATH=/github/home/miniconda/envs/build_binary/bin/python -DPython3_ROOT_DIR:PATH=/github/home/miniconda/envs/build_binary -DPython3_FIND_REGISTRY:STRING=NEVER -DPython3_INCLUDE_DIR:PATH=/github/home/miniconda/envs/build_binary/include/python3.11 -DPython3_NumPy_INCLUDE_DIRS:PATH=/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/numpy/_core/include -DCMAKE_MAKE_PROGRAM:FILEPATH=/github/home/miniconda/envs/build_binary/bin/ninja -DCMAKE_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ar -DCMAKE_CXX_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_C_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar 
-DCMAKE_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ranlib -DCMAKE_CXX_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_C_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_LINKER=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-strip -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch -D_GLIBCXX_USE_CXX11_ABI=1 -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DFBGEMM_BUILD_TARGET=genai -DFBGEMM_BUILD_VARIANT=cuda -DNVML_LIB_PATH=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -DNCCL_INCLUDE_DIRS=/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -DNCCL_LIBRARIES=/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2 -DCMAKE_C_COMPILER=/github/home/miniconda/envs/build_binary/bin/cc -DCMAKE_CXX_COMPILER=/github/home/miniconda/envs/build_binary/bin/c++ '-DCMAKE_C_FLAGS='"'"'-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include'"'"'' '-DCMAKE_CXX_FLAGS='"'"'-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include'"'"'' '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a' -DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCMAKE_CXX_STANDARD=20 '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a' 
-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCMAKE_CXX_STANDARD=20 -DCMAKE_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ar -DCMAKE_CXX_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_C_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ranlib -DCMAKE_CXX_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_C_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_LINKER=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-strip -DCMAKE_BUILD_TYPE=Release 2025-05-07T19:51:22.6417697Z 2025-05-07T19:51:22.6802134Z 2025-05-07T19:51:22.6803154Z Not searching for unused variables given on the command line. 
2025-05-07T19:51:22.6804137Z 2025-05-07T19:51:22.6804504Z ================================================================================ 2025-05-07T19:51:22.6805464Z Default C compiler flags 2025-05-07T19:51:22.6806529Z (values may be overridden by CMAKE_CXX_STANDARD and CXX_STANDARD): 2025-05-07T19:51:22.6807433Z 2025-05-07T19:51:22.6809984Z -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include 2025-05-07T19:51:22.6811641Z ================================================================================ 2025-05-07T19:51:22.6811867Z 2025-05-07T19:51:22.6811871Z 2025-05-07T19:51:22.6811880Z 2025-05-07T19:51:22.6812024Z ================================================================================ 2025-05-07T19:51:22.6812351Z Default C++ compiler flags 2025-05-07T19:51:22.6812730Z (values may be overridden by CMAKE_CXX_STANDARD and CXX_STANDARD): 2025-05-07T19:51:22.6813019Z 2025-05-07T19:51:22.6813801Z -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include 2025-05-07T19:51:22.6814809Z ================================================================================ 2025-05-07T19:51:22.6815030Z 2025-05-07T19:51:22.6815058Z 2025-05-07T19:51:22.6815061Z 2025-05-07T19:51:22.6815174Z ================================================================================ 2025-05-07T19:51:22.6815485Z AVX2_FLAGS: 2025-05-07T19:51:22.6815627Z 2025-05-07T19:51:22.6815709Z -mavx2 2025-05-07T19:51:22.6815933Z -mf16c 2025-05-07T19:51:22.6816129Z -mfma 2025-05-07T19:51:22.6816354Z -fopenmp 2025-05-07T19:51:22.6816581Z ================================================================================ 2025-05-07T19:51:22.6816803Z 2025-05-07T19:51:22.6816810Z 
2025-05-07T19:51:22.6816838Z 2025-05-07T19:51:22.6816955Z ================================================================================ 2025-05-07T19:51:22.6817263Z AVX512_FLAGS: 2025-05-07T19:51:22.6817419Z 2025-05-07T19:51:22.6817503Z -mavx2 2025-05-07T19:51:22.6817701Z -mf16c 2025-05-07T19:51:22.6817921Z -mfma 2025-05-07T19:51:22.6818145Z -mavx512f 2025-05-07T19:51:22.6818346Z -mavx512bw 2025-05-07T19:51:22.6818572Z -mavx512dq 2025-05-07T19:51:22.6818770Z -mavx512vl 2025-05-07T19:51:22.6819002Z -fopenmp 2025-05-07T19:51:22.6819231Z ================================================================================ 2025-05-07T19:51:22.6819711Z 2025-05-07T19:51:22.6819715Z 2025-05-07T19:51:22.6819718Z 2025-05-07T19:51:22.6819930Z ================================================================================ 2025-05-07T19:51:22.6820275Z The project is built using scikit-build 2025-05-07T19:51:22.6820628Z ================================================================================ 2025-05-07T19:51:22.6820855Z 2025-05-07T19:51:22.6820858Z 2025-05-07T19:51:22.6820862Z 2025-05-07T19:51:22.6821007Z ================================================================================ 2025-05-07T19:51:22.6821321Z Build Settings 2025-05-07T19:51:22.6821474Z 2025-05-07T19:51:22.6821579Z FBGEMM_BUILD_TARGET : genai 2025-05-07T19:51:22.6821860Z FBGEMM_BUILD_VARIANT : cuda 2025-05-07T19:51:22.6822065Z 2025-05-07T19:51:22.6822164Z NVCC_VERBOSE : 2025-05-07T19:51:22.6822444Z CUDNN_INCLUDE_DIR : 2025-05-07T19:51:22.6822694Z CUDNN_LIBRARY : 2025-05-07T19:51:22.6823135Z NVML_LIB_PATH : /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:22.6823595Z TORCH_CUDA_ARCH_LIST : 7.0 2025-05-07T19:51:22.6823869Z 8.0 2025-05-07T19:51:22.6824061Z 9.0 2025-05-07T19:51:22.6824267Z 9.0a 2025-05-07T19:51:22.6824452Z 10.0a 2025-05-07T19:51:22.6824669Z 12.0a 2025-05-07T19:51:22.6824781Z 2025-05-07T19:51:22.6824877Z HIP_ROOT_DIR : 2025-05-07T19:51:22.6825151Z 
HIPCC_VERBOSE : 2025-05-07T19:51:22.6825431Z AMDGPU_TARGETS : 2025-05-07T19:51:22.6825684Z PYTORCH_ROCM_ARCH : 2025-05-07T19:51:22.6825982Z ================================================================================ 2025-05-07T19:51:22.6826206Z 2025-05-07T19:51:22.8376011Z -- The CXX compiler identification is Clang 16.0.6 2025-05-07T19:51:22.9121512Z -- The C compiler identification is Clang 16.0.6 2025-05-07T19:51:23.9967270Z -- The CUDA compiler identification is NVIDIA 12.8.61 with host compiler Clang 16.0.6 2025-05-07T19:51:24.0080097Z -- Detecting CXX compiler ABI info 2025-05-07T19:51:24.1453366Z -- Detecting CXX compiler ABI info - done 2025-05-07T19:51:24.1591169Z -- Check for working CXX compiler: /github/home/miniconda/envs/build_binary/bin/c++ - skipped 2025-05-07T19:51:24.1592774Z -- Detecting CXX compile features 2025-05-07T19:51:24.1598690Z -- Detecting CXX compile features - done 2025-05-07T19:51:24.1671561Z -- Detecting C compiler ABI info 2025-05-07T19:51:24.2973101Z -- Detecting C compiler ABI info - done 2025-05-07T19:51:24.3104507Z -- Check for working C compiler: /github/home/miniconda/envs/build_binary/bin/cc - skipped 2025-05-07T19:51:24.3106071Z -- Detecting C compile features 2025-05-07T19:51:24.3108554Z -- Detecting C compile features - done 2025-05-07T19:51:24.3159497Z -- Detecting CUDA compiler ABI info 2025-05-07T19:51:25.3506116Z -- Detecting CUDA compiler ABI info - done 2025-05-07T19:51:25.4044610Z -- Check for working CUDA compiler: /github/home/miniconda/envs/build_binary/bin/nvcc - skipped 2025-05-07T19:51:25.4067705Z -- Detecting CUDA compile features 2025-05-07T19:51:25.4069576Z -- Detecting CUDA compile features - done 2025-05-07T19:51:25.4091898Z -- Performing Test C_HAS_AVX_1 2025-05-07T19:51:25.7021014Z -- Performing Test C_HAS_AVX_1 - Failed 2025-05-07T19:51:25.7022014Z -- Performing Test C_HAS_AVX_2 2025-05-07T19:51:26.0392672Z -- Performing Test C_HAS_AVX_2 - Success 2025-05-07T19:51:26.0393208Z -- Performing Test 
C_HAS_AVX2_1 2025-05-07T19:51:26.3296518Z -- Performing Test C_HAS_AVX2_1 - Failed 2025-05-07T19:51:26.3299138Z -- Performing Test C_HAS_AVX2_2 2025-05-07T19:51:26.6667757Z -- Performing Test C_HAS_AVX2_2 - Success 2025-05-07T19:51:26.6668172Z -- Performing Test C_HAS_AVX512_1 2025-05-07T19:51:26.9559406Z -- Performing Test C_HAS_AVX512_1 - Failed 2025-05-07T19:51:26.9560053Z -- Performing Test C_HAS_AVX512_2 2025-05-07T19:51:27.2943186Z -- Performing Test C_HAS_AVX512_2 - Success 2025-05-07T19:51:27.2944246Z -- Performing Test CXX_HAS_AVX_1 2025-05-07T19:51:27.5825142Z -- Performing Test CXX_HAS_AVX_1 - Failed 2025-05-07T19:51:27.5826605Z -- Performing Test CXX_HAS_AVX_2 2025-05-07T19:51:27.9189836Z -- Performing Test CXX_HAS_AVX_2 - Success 2025-05-07T19:51:27.9191672Z -- Performing Test CXX_HAS_AVX2_1 2025-05-07T19:51:28.2101889Z -- Performing Test CXX_HAS_AVX2_1 - Failed 2025-05-07T19:51:28.2102460Z -- Performing Test CXX_HAS_AVX2_2 2025-05-07T19:51:28.5466026Z -- Performing Test CXX_HAS_AVX2_2 - Success 2025-05-07T19:51:28.5467441Z -- Performing Test CXX_HAS_AVX512_1 2025-05-07T19:51:28.8365743Z -- Performing Test CXX_HAS_AVX512_1 - Failed 2025-05-07T19:51:28.8366801Z -- Performing Test CXX_HAS_AVX512_2 2025-05-07T19:51:29.1749028Z -- Performing Test CXX_HAS_AVX512_2 - Success 2025-05-07T19:51:29.1927270Z -- Found CUDA: /github/home/miniconda/envs/build_binary/targets/x86_64-linux (found version "12.8") 2025-05-07T19:51:29.1965417Z -- Found CUDAToolkit: /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include (found version "12.8.61") 2025-05-07T19:51:29.2044926Z -- Performing Test CMAKE_HAVE_LIBC_PTHREAD 2025-05-07T19:51:29.3361706Z -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success 2025-05-07T19:51:29.3369439Z -- Found Threads: TRUE 2025-05-07T19:51:29.5023180Z -- PyTorch: CUDA detected: 12.8 2025-05-07T19:51:29.5023886Z -- PyTorch: CUDA nvcc is: /github/home/miniconda/envs/build_binary/targets/x86_64-linux/bin/nvcc 
2025-05-07T19:51:29.5024659Z -- PyTorch: CUDA toolkit directory: /github/home/miniconda/envs/build_binary/targets/x86_64-linux 2025-05-07T19:51:29.6633986Z -- PyTorch: Header version is: 12.8 2025-05-07T19:51:29.7444566Z -- Found Python: /github/home/miniconda/envs/build_binary/bin/python (found version "3.11.11") found components: Interpreter 2025-05-07T19:51:29.7460153Z CMake Warning at /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:140 (message): 2025-05-07T19:51:29.7462667Z Failed to compute shorthash for libnvrtc.so 2025-05-07T19:51:29.7463005Z Call Stack (most recent call first): 2025-05-07T19:51:29.7463735Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include) 2025-05-07T19:51:29.7464867Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package) 2025-05-07T19:51:29.7465863Z /__w/FBGEMM/FBGEMM/cmake/modules/PyTorchSetup.cmake:14 (find_package) 2025-05-07T19:51:29.7466313Z CMakeLists.txt:112 (include) 2025-05-07T19:51:29.7466492Z 2025-05-07T19:51:29.7466497Z 2025-05-07T19:51:29.7466654Z -- USE_CUDNN is set to 0. Compiling without cuDNN support 2025-05-07T19:51:29.7467524Z -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support 2025-05-07T19:51:29.7468001Z -- USE_CUDSS is set to 0. Compiling without cuDSS support 2025-05-07T19:51:29.7468438Z -- USE_CUFILE is set to 0. 
Compiling without cuFile support 2025-05-07T19:51:29.7469578Z -- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_90a,code=sm_90a;-gencode;arch=compute_100a,code=sm_100a;-gencode;arch=compute_120a,code=sm_120a 2025-05-07T19:51:29.7816356Z CMake Warning at /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message): 2025-05-07T19:51:29.7818807Z static library kineto_LIBRARY-NOTFOUND not found. 2025-05-07T19:51:29.7819852Z Call Stack (most recent call first): 2025-05-07T19:51:29.7821284Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:125 (append_torchlib_if_found) 2025-05-07T19:51:29.7822290Z /__w/FBGEMM/FBGEMM/cmake/modules/PyTorchSetup.cmake:14 (find_package) 2025-05-07T19:51:29.7822754Z CMakeLists.txt:112 (include) 2025-05-07T19:51:29.7822940Z 2025-05-07T19:51:29.7822945Z 2025-05-07T19:51:29.7822969Z 2025-05-07T19:51:29.7822973Z 2025-05-07T19:51:29.7823105Z ================================================================================ 2025-05-07T19:51:29.7823700Z PyTorch Flags: 2025-05-07T19:51:29.7823955Z 2025-05-07T19:51:29.7824175Z TORCH_INCLUDE_DIRS: 2025-05-07T19:51:29.7824748Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include 2025-05-07T19:51:29.7825557Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:51:29.7826190Z 2025-05-07T19:51:29.7826404Z TORCH_LIBRARIES: 2025-05-07T19:51:29.7826663Z torch 2025-05-07T19:51:29.7826877Z torch_library 2025-05-07T19:51:29.7827351Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10.so 2025-05-07T19:51:29.7828040Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:51:29.7828767Z 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:51:29.7829329Z 2025-05-07T19:51:29.7829543Z TORCH_CUDA_OPTIONS: 2025-05-07T19:51:29.7830099Z -- Found Torch: /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch.so 2025-05-07T19:51:29.7830725Z --expt-relaxed-constexpr 2025-05-07T19:51:29.7831043Z -D__CUDA_NO_HALF_OPERATORS__ 2025-05-07T19:51:29.7831342Z -D__CUDA_NO_BFLOAT16_CONVERSIONS__ 2025-05-07T19:51:29.7831680Z -D__CUDA_NO_HALF2_OPERATORS__ 2025-05-07T19:51:29.7832005Z ================================================================================ 2025-05-07T19:51:29.7832245Z 2025-05-07T19:51:29.7832249Z 2025-05-07T19:51:29.7832253Z 2025-05-07T19:51:29.7832375Z ================================================================================ 2025-05-07T19:51:29.7832727Z NCCL Flags 2025-05-07T19:51:29.7832853Z 2025-05-07T19:51:29.7833241Z NCCL_INCLUDE_DIRS=/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include 2025-05-07T19:51:29.7834246Z NCCL_LIBRARIES=/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:51:29.7834867Z ================================================================================ 2025-05-07T19:51:29.7835089Z 2025-05-07T19:51:29.7835093Z 2025-05-07T19:51:29.7835096Z 2025-05-07T19:51:29.7835212Z ================================================================================ 2025-05-07T19:51:29.7835549Z CUDA Driver Path 2025-05-07T19:51:29.7835683Z 2025-05-07T19:51:29.7836022Z CUDA_DRIVER_LIBRARIES=/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:51:29.7836597Z ================================================================================ 2025-05-07T19:51:29.7836815Z 2025-05-07T19:51:29.7837118Z -- Found NVML_LIB_PATH: /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 
2025-05-07T19:51:29.7854277Z 2025-05-07T19:51:29.7854370Z 2025-05-07T19:51:29.7854892Z ================================================================================ 2025-05-07T19:51:29.7856071Z GPU CPP Library Target: asmjit (SHARED) 2025-05-07T19:51:29.7856947Z 2025-05-07T19:51:29.7857491Z CPU_SRCS: 2025-05-07T19:51:29.7857856Z 2025-05-07T19:51:29.7858071Z 2025-05-07T19:51:29.7858562Z GPU_SRCS: 2025-05-07T19:51:29.7858909Z 2025-05-07T19:51:29.7859119Z 2025-05-07T19:51:29.7859648Z CUDA_SPECIFIC_SRCS: 2025-05-07T19:51:29.7860092Z 2025-05-07T19:51:29.7860301Z 2025-05-07T19:51:29.7860849Z HIP_SPECIFIC_SRCS: 2025-05-07T19:51:29.7861250Z 2025-05-07T19:51:29.7861462Z 2025-05-07T19:51:29.7861984Z OTHER_SRCS: 2025-05-07T19:51:29.7862575Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64assembler.cpp 2025-05-07T19:51:29.7863235Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64builder.cpp 2025-05-07T19:51:29.7863850Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64compiler.cpp 2025-05-07T19:51:29.7864509Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64emithelper.cpp 2025-05-07T19:51:29.7865173Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64formatter.cpp 2025-05-07T19:51:29.7865963Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64func.cpp 2025-05-07T19:51:29.7866599Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64instapi.cpp 2025-05-07T19:51:29.7867542Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64instdb.cpp 2025-05-07T19:51:29.7868223Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64operand.cpp 2025-05-07T19:51:29.7868830Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64rapass.cpp 2025-05-07T19:51:29.7869472Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/armformatter.cpp 2025-05-07T19:51:29.7870117Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/archtraits.cpp 2025-05-07T19:51:29.7870739Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/assembler.cpp 2025-05-07T19:51:29.7871369Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/builder.cpp 2025-05-07T19:51:29.7871971Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/codeholder.cpp 2025-05-07T19:51:29.7872612Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/codewriter.cpp 2025-05-07T19:51:29.7873216Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/compiler.cpp 2025-05-07T19:51:29.7873856Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/constpool.cpp 2025-05-07T19:51:29.7874477Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/cpuinfo.cpp 2025-05-07T19:51:29.7875075Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/emithelper.cpp 2025-05-07T19:51:29.7875702Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/emitter.cpp 2025-05-07T19:51:29.7876310Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/emitterutils.cpp 2025-05-07T19:51:29.7876981Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/environment.cpp 2025-05-07T19:51:29.7877631Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/errorhandler.cpp 2025-05-07T19:51:29.7878256Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/formatter.cpp 2025-05-07T19:51:29.7878877Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/func.cpp 2025-05-07T19:51:29.7879502Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/funcargscontext.cpp 2025-05-07T19:51:29.7880161Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/globals.cpp 2025-05-07T19:51:29.7880832Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/inst.cpp 2025-05-07T19:51:29.7881442Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/instdb.cpp 2025-05-07T19:51:29.7882055Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/jitallocator.cpp 2025-05-07T19:51:29.7882822Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/jitruntime.cpp 2025-05-07T19:51:29.7883460Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/logger.cpp 2025-05-07T19:51:29.7884056Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/operand.cpp 2025-05-07T19:51:29.7884686Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/osutils.cpp 2025-05-07T19:51:29.7885273Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/ralocal.cpp 2025-05-07T19:51:29.7885898Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/rapass.cpp 2025-05-07T19:51:29.7886491Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/rastack.cpp 2025-05-07T19:51:29.7887115Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/string.cpp 2025-05-07T19:51:29.7887732Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/support.cpp 2025-05-07T19:51:29.7888315Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/target.cpp 2025-05-07T19:51:29.7888917Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/type.cpp 2025-05-07T19:51:29.7889488Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/virtmem.cpp 2025-05-07T19:51:29.7890085Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zone.cpp 2025-05-07T19:51:29.7890783Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonehash.cpp 2025-05-07T19:51:29.7891488Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonelist.cpp 2025-05-07T19:51:29.7892112Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonestack.cpp 2025-05-07T19:51:29.7892714Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonetree.cpp 
2025-05-07T19:51:29.7893347Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonevector.cpp 2025-05-07T19:51:29.7893964Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86assembler.cpp 2025-05-07T19:51:29.7894600Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86builder.cpp 2025-05-07T19:51:29.7895312Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86compiler.cpp 2025-05-07T19:51:29.7895884Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86emithelper.cpp 2025-05-07T19:51:29.7896498Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86formatter.cpp 2025-05-07T19:51:29.7897052Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86func.cpp 2025-05-07T19:51:29.7897631Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86instapi.cpp 2025-05-07T19:51:29.7898184Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86instdb.cpp 2025-05-07T19:51:29.7898771Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86operand.cpp 2025-05-07T19:51:29.7899352Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86rapass.cpp 2025-05-07T19:51:29.7899767Z 2025-05-07T19:51:29.7899996Z CC_FLAGS: 2025-05-07T19:51:29.7900115Z 2025-05-07T19:51:29.7900203Z 2025-05-07T19:51:29.7900424Z NVCC_FLAGS: 2025-05-07T19:51:29.7900543Z 2025-05-07T19:51:29.7900626Z 2025-05-07T19:51:29.7900845Z HIPCC_FLAGS: 2025-05-07T19:51:29.7900974Z 2025-05-07T19:51:29.7901058Z 2025-05-07T19:51:29.7901269Z INCLUDE_DIRS: 2025-05-07T19:51:29.7901511Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:29.7901849Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:51:29.7902156Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:51:29.7902462Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:29.7902964Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include 2025-05-07T19:51:29.7903713Z 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:51:29.7904355Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:51:29.7904759Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:51:29.7905205Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:51:29.7905689Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:51:29.7906190Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 2025-05-07T19:51:29.7906657Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:51:29.7907192Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include 2025-05-07T19:51:29.7907697Z 2025-05-07T19:51:29.7907897Z Selected Source Files: 2025-05-07T19:51:29.7908078Z 2025-05-07T19:51:29.7908166Z 2025-05-07T19:51:29.7908363Z HIPified Source Files: 2025-05-07T19:51:29.7908541Z 2025-05-07T19:51:29.7908622Z 2025-05-07T19:51:29.7908852Z Library Dependencies: 2025-05-07T19:51:29.7909080Z torch 2025-05-07T19:51:29.7909307Z torch_library 2025-05-07T19:51:29.7909729Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10.so 2025-05-07T19:51:29.7910401Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:51:29.7911066Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:51:29.7911842Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:51:29.7912677Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:51:29.7913248Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:29.7913667Z 2025-05-07T19:51:29.7913935Z Output Library: 2025-05-07T19:51:29.7914179Z asmjit 2025-05-07T19:51:29.7914365Z 2025-05-07T19:51:29.7914585Z 
Destination Directory: 2025-05-07T19:51:29.7914820Z fbgemm_gpu 2025-05-07T19:51:29.7915075Z ================================================================================ 2025-05-07T19:51:29.7915300Z 2025-05-07T19:51:29.7915304Z 2025-05-07T19:51:29.7915308Z 2025-05-07T19:51:29.7915450Z ================================================================================ 2025-05-07T19:51:29.7915782Z GPU CPP Library Target: fbgemm (SHARED) 2025-05-07T19:51:29.7916093Z 2025-05-07T19:51:29.7916281Z CPU_SRCS: 2025-05-07T19:51:29.7916418Z 2025-05-07T19:51:29.7916500Z 2025-05-07T19:51:29.7916688Z GPU_SRCS: 2025-05-07T19:51:29.7916824Z 2025-05-07T19:51:29.7916909Z 2025-05-07T19:51:29.7917103Z CUDA_SPECIFIC_SRCS: 2025-05-07T19:51:29.7917267Z 2025-05-07T19:51:29.7917353Z 2025-05-07T19:51:29.7917584Z HIP_SPECIFIC_SRCS: 2025-05-07T19:51:29.7917721Z 2025-05-07T19:51:29.7917809Z 2025-05-07T19:51:29.7918033Z OTHER_SRCS: 2025-05-07T19:51:29.7918305Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDM.cc 2025-05-07T19:51:29.7918767Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAutovec.cc 2025-05-07T19:51:29.7919215Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMNBit.cc 2025-05-07T19:51:29.7919631Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/QuantUtils.cc 2025-05-07T19:51:29.7920029Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/RefImplementations.cc 2025-05-07T19:51:29.7920515Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/RowWiseSparseAdagradFused.cc 2025-05-07T19:51:29.7920974Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/SparseAdagrad.cc 2025-05-07T19:51:29.7921343Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/Utils.cc 2025-05-07T19:51:29.7921746Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAvx2.cc 2025-05-07T19:51:29.7922160Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/QuantUtilsAvx2.cc 2025-05-07T19:51:29.7922683Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAvx2.cc 2025-05-07T19:51:29.7923303Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/QuantUtilsAvx2.cc 2025-05-07T19:51:29.7923813Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAvx512.cc 2025-05-07T19:51:29.7924200Z 2025-05-07T19:51:29.7924432Z CC_FLAGS: 2025-05-07T19:51:29.7924563Z 2025-05-07T19:51:29.7924682Z 2025-05-07T19:51:29.7924890Z NVCC_FLAGS: 2025-05-07T19:51:29.7925021Z 2025-05-07T19:51:29.7925137Z 2025-05-07T19:51:29.7925347Z HIPCC_FLAGS: 2025-05-07T19:51:29.7925512Z 2025-05-07T19:51:29.7925606Z 2025-05-07T19:51:29.7925817Z INCLUDE_DIRS: 2025-05-07T19:51:29.7926109Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:29.7926446Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:51:29.7926787Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:51:29.7927122Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:29.7927661Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include 2025-05-07T19:51:29.7928491Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:51:29.7929153Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:51:29.7929608Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:51:29.7930050Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:51:29.7930556Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:51:29.7931090Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 2025-05-07T19:51:29.7931592Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:51:29.7932187Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include 2025-05-07T19:51:29.7932700Z 2025-05-07T19:51:29.7932931Z Selected Source Files: 2025-05-07T19:51:29.7933090Z 2025-05-07T19:51:29.7933269Z 2025-05-07T19:51:29.7933513Z HIPified Source Files: 2025-05-07T19:51:29.7933674Z 2025-05-07T19:51:29.7933760Z 2025-05-07T19:51:29.7934006Z Library Dependencies: 2025-05-07T19:51:29.7934248Z torch 2025-05-07T19:51:29.7934552Z torch_library 
2025-05-07T19:51:29.7935027Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10.so 2025-05-07T19:51:29.7935800Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:51:29.7936480Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:51:29.7937232Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:51:29.7937950Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:51:29.7938396Z asmjit 2025-05-07T19:51:29.7938747Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:29.7939169Z 2025-05-07T19:51:29.7939376Z Output Library: 2025-05-07T19:51:29.7939624Z fbgemm 2025-05-07T19:51:29.7939811Z 2025-05-07T19:51:29.7940032Z Destination Directory: 2025-05-07T19:51:29.7940266Z fbgemm_gpu 2025-05-07T19:51:29.7940531Z ================================================================================ 2025-05-07T19:51:29.7940760Z 2025-05-07T19:51:29.7940764Z 2025-05-07T19:51:29.7940768Z 2025-05-07T19:51:29.7940886Z ================================================================================ 2025-05-07T19:51:29.7941239Z Running code generation script ... 
2025-05-07T19:51:29.7941980Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_backward_split.py --opensource 2025-05-07T19:51:29.7942710Z ================================================================================ 2025-05-07T19:51:29.7942957Z 2025-05-07T19:51:30.3277219Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:51:30.3279218Z [GENERAATE BACKWARD SPLIT]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_backward_split.py', '--opensource'] 2025-05-07T19:51:30.3279977Z Written: gen_embedding_backward_dense_split_weighted_vbe_cuda.cu 2025-05-07T19:51:30.3280506Z Written: gen_embedding_backward_dense_split_weighted_cuda.cu 2025-05-07T19:51:30.3281058Z Written: gen_embedding_backward_dense_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:30.3281591Z Written: gen_embedding_backward_dense_split_unweighted_vbe_cuda.cu 2025-05-07T19:51:30.3282124Z Written: gen_embedding_backward_dense_split_unweighted_cuda.cu 2025-05-07T19:51:30.3282765Z Written: gen_embedding_backward_dense_split_weighted_vbe_meta.cpp 2025-05-07T19:51:30.3283299Z Written: gen_embedding_backward_dense_split_weighted_meta.cpp 2025-05-07T19:51:30.3283839Z Written: gen_embedding_backward_dense_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:30.3284415Z Written: gen_embedding_backward_dense_split_unweighted_vbe_meta.cpp 2025-05-07T19:51:30.3284965Z Written: gen_embedding_backward_dense_split_unweighted_meta.cpp 2025-05-07T19:51:30.3285496Z Written: gen_embedding_backward_dense_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:30.3286065Z Written: gen_embedding_backward_dense_split_weighted_kernel_cta.cu 2025-05-07T19:51:30.3286620Z Written: gen_embedding_backward_dense_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:30.3287224Z Written: gen_embedding_backward_dense_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:30.3287777Z Written: 
gen_embedding_backward_dense_split_unweighted_kernel_cta.cu 2025-05-07T19:51:30.3288353Z Written: gen_embedding_backward_dense_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:30.3288924Z Written: gen_embedding_backward_dense_split_weighted_kernel_warp.cu 2025-05-07T19:51:30.3289590Z Written: gen_embedding_backward_dense_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:30.3290159Z Written: gen_embedding_backward_dense_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:30.3290688Z Written: gen_embedding_backward_dense_split_unweighted_kernel_warp.cu 2025-05-07T19:51:30.3291456Z Written: gen_embedding_optimizer_dense_split_device_kernel.cuh 2025-05-07T19:51:30.3291876Z Written: gen_embedding_backward_split_dense.cpp 2025-05-07T19:51:30.3292383Z Written: gen_embedding_backward_dense_split_cpu.cpp 2025-05-07T19:51:30.3292837Z Written: gen_embedding_backward_adagrad_split_weighted_cuda.cu 2025-05-07T19:51:30.3293326Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:30.3293854Z Written: gen_embedding_backward_adagrad_split_unweighted_cuda.cu 2025-05-07T19:51:30.3294328Z Written: gen_embedding_backward_adagrad_split_weighted_meta.cpp 2025-05-07T19:51:30.3294853Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:30.3295359Z Written: gen_embedding_backward_adagrad_split_unweighted_meta.cpp 2025-05-07T19:51:30.3295883Z Written: gen_embedding_backward_adagrad_split_weighted_kernel_cta.cu 2025-05-07T19:51:30.3296434Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:30.3296978Z Written: gen_embedding_backward_adagrad_split_unweighted_kernel_cta.cu 2025-05-07T19:51:30.3297506Z Written: gen_embedding_backward_adagrad_split_weighted_kernel_warp.cu 2025-05-07T19:51:30.3298041Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:30.3298611Z Written: 
gen_embedding_backward_adagrad_split_unweighted_kernel_warp.cu 2025-05-07T19:51:30.3299104Z Written: gen_embedding_optimizer_adagrad_split_device_kernel.cuh 2025-05-07T19:51:30.3299554Z Written: gen_embedding_backward_split_adagrad.cpp 2025-05-07T19:51:30.3299974Z Written: gen_embedding_split_adagrad_pt2_autograd.cpp 2025-05-07T19:51:30.3300414Z Written: gen_embedding_backward_split_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.3300845Z Written: lookup_adagrad.py 2025-05-07T19:51:30.3301190Z Written: gen_embedding_backward_adagrad_split_cpu.cpp 2025-05-07T19:51:30.3301594Z Written: gen_embedding_backward_split_adagrad_cpu.cpp 2025-05-07T19:51:30.3302071Z Written: gen_embedding_backward_split_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:51:30.3302551Z Written: gen_embedding_backward_adam_split_weighted_vbe_cuda.cu 2025-05-07T19:51:30.3303042Z Written: gen_embedding_backward_adam_split_weighted_cuda.cu 2025-05-07T19:51:30.3303514Z Written: gen_embedding_backward_adam_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:30.3304039Z Written: gen_embedding_backward_adam_split_unweighted_vbe_cuda.cu 2025-05-07T19:51:30.3304539Z Written: gen_embedding_backward_adam_split_unweighted_cuda.cu 2025-05-07T19:51:30.3305005Z Written: gen_embedding_backward_adam_split_weighted_vbe_meta.cpp 2025-05-07T19:51:30.3305495Z Written: gen_embedding_backward_adam_split_weighted_meta.cpp 2025-05-07T19:51:30.3305976Z Written: gen_embedding_backward_adam_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:30.3306509Z Written: gen_embedding_backward_adam_split_unweighted_vbe_meta.cpp 2025-05-07T19:51:30.3306993Z Written: gen_embedding_backward_adam_split_unweighted_meta.cpp 2025-05-07T19:51:30.3307522Z Written: gen_embedding_backward_adam_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:30.3308049Z Written: gen_embedding_backward_adam_split_weighted_kernel_cta.cu 2025-05-07T19:51:30.3308568Z Written: gen_embedding_backward_adam_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:30.3309139Z 
Written: gen_embedding_backward_adam_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:30.3309659Z Written: gen_embedding_backward_adam_split_unweighted_kernel_cta.cu 2025-05-07T19:51:30.3310199Z Written: gen_embedding_backward_adam_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:30.3310696Z Written: gen_embedding_backward_adam_split_weighted_kernel_warp.cu 2025-05-07T19:51:30.3311233Z Written: gen_embedding_backward_adam_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:30.3311795Z Written: gen_embedding_backward_adam_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:30.3312306Z Written: gen_embedding_backward_adam_split_unweighted_kernel_warp.cu 2025-05-07T19:51:30.3312805Z Written: gen_embedding_optimizer_adam_split_device_kernel.cuh 2025-05-07T19:51:30.3313902Z Written: gen_embedding_backward_split_adam.cpp 2025-05-07T19:51:30.3314271Z Written: gen_embedding_split_adam_pt2_autograd.cpp 2025-05-07T19:51:30.3314739Z Written: gen_embedding_backward_split_adam_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.3315161Z Written: lookup_adam.py 2025-05-07T19:51:30.3315450Z Written: gen_embedding_backward_split_adam_cpu.cpp 2025-05-07T19:51:30.3316080Z Written: gen_embedding_backward_split_adam_pt2_cpu_wrapper.cpp 2025-05-07T19:51:30.3316568Z Written: gen_embedding_backward_lamb_split_weighted_cuda.cu 2025-05-07T19:51:30.3317049Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:30.3317566Z Written: gen_embedding_backward_lamb_split_unweighted_cuda.cu 2025-05-07T19:51:30.3318030Z Written: gen_embedding_backward_lamb_split_weighted_meta.cpp 2025-05-07T19:51:30.3318552Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:30.3319059Z Written: gen_embedding_backward_lamb_split_unweighted_meta.cpp 2025-05-07T19:51:30.3319585Z Written: gen_embedding_backward_lamb_split_weighted_kernel_cta.cu 2025-05-07T19:51:30.3320155Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_kernel_cta.cu 
2025-05-07T19:51:30.3320706Z Written: gen_embedding_backward_lamb_split_unweighted_kernel_cta.cu 2025-05-07T19:51:30.3321239Z Written: gen_embedding_backward_lamb_split_weighted_kernel_warp.cu 2025-05-07T19:51:30.3321774Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:30.3322325Z Written: gen_embedding_backward_lamb_split_unweighted_kernel_warp.cu 2025-05-07T19:51:30.3323086Z Written: gen_embedding_optimizer_lamb_split_device_kernel.cuh 2025-05-07T19:51:30.3323551Z Written: gen_embedding_backward_split_lamb.cpp 2025-05-07T19:51:30.3324044Z Written: gen_embedding_split_lamb_pt2_autograd.cpp 2025-05-07T19:51:30.3324497Z Written: gen_embedding_backward_split_lamb_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.3324943Z Written: lookup_lamb.py 2025-05-07T19:51:30.3325259Z Written: gen_embedding_backward_split_lamb_cpu.cpp 2025-05-07T19:51:30.3325742Z Written: gen_embedding_backward_split_lamb_pt2_cpu_wrapper.cpp 2025-05-07T19:51:30.3326246Z Written: gen_embedding_backward_lars_sgd_split_weighted_cuda.cu 2025-05-07T19:51:30.3326803Z Written: gen_embedding_backward_lars_sgd_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:30.3327368Z Written: gen_embedding_backward_lars_sgd_split_unweighted_cuda.cu 2025-05-07T19:51:30.3327873Z Written: gen_embedding_backward_lars_sgd_split_weighted_meta.cpp 2025-05-07T19:51:30.3328433Z Written: gen_embedding_backward_lars_sgd_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:30.3328972Z Written: gen_embedding_backward_lars_sgd_split_unweighted_meta.cpp 2025-05-07T19:51:30.3329522Z Written: gen_embedding_backward_lars_sgd_split_weighted_kernel_cta.cu 2025-05-07T19:51:30.3330092Z Written: gen_embedding_backward_lars_sgd_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:30.3330701Z Written: gen_embedding_backward_lars_sgd_split_unweighted_kernel_cta.cu 2025-05-07T19:51:30.3331292Z Written: gen_embedding_backward_lars_sgd_split_weighted_kernel_warp.cu 2025-05-07T19:51:30.3331874Z Written: 
gen_embedding_backward_lars_sgd_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:30.3332494Z Written: gen_embedding_backward_lars_sgd_split_unweighted_kernel_warp.cu 2025-05-07T19:51:30.3333037Z Written: gen_embedding_optimizer_lars_sgd_split_device_kernel.cuh 2025-05-07T19:51:30.3333518Z Written: gen_embedding_backward_split_lars_sgd.cpp 2025-05-07T19:51:30.3333938Z Written: gen_embedding_split_lars_sgd_pt2_autograd.cpp 2025-05-07T19:51:30.3334440Z Written: gen_embedding_backward_split_lars_sgd_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.3334900Z Written: lookup_lars_sgd.py 2025-05-07T19:51:30.3335356Z Written: gen_embedding_backward_split_lars_sgd_cpu.cpp 2025-05-07T19:51:30.3335824Z Written: gen_embedding_backward_split_lars_sgd_pt2_cpu_wrapper.cpp 2025-05-07T19:51:30.3336338Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_cuda.cu 2025-05-07T19:51:30.3337029Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:30.3337689Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_cuda.cu 2025-05-07T19:51:30.3338274Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_meta.cpp 2025-05-07T19:51:30.3338883Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:30.3339474Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_meta.cpp 2025-05-07T19:51:30.3340077Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_kernel_cta.cu 2025-05-07T19:51:30.3340698Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:30.3341359Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_kernel_cta.cu 2025-05-07T19:51:30.3341959Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_kernel_warp.cu 2025-05-07T19:51:30.3342618Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_kernel_warp.cu 
2025-05-07T19:51:30.3343279Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_kernel_warp.cu 2025-05-07T19:51:30.4164854Z Written: gen_embedding_optimizer_partial_rowwise_adam_split_device_kernel.cuh 2025-05-07T19:51:30.4166618Z Written: gen_embedding_backward_split_partial_rowwise_adam.cpp 2025-05-07T19:51:30.4168439Z Written: gen_embedding_split_partial_rowwise_adam_pt2_autograd.cpp 2025-05-07T19:51:30.4169448Z Written: gen_embedding_backward_split_partial_rowwise_adam_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.4169977Z Written: lookup_partial_rowwise_adam.py 2025-05-07T19:51:30.4170419Z Written: gen_embedding_backward_split_partial_rowwise_adam_cpu.cpp 2025-05-07T19:51:30.4171019Z Written: gen_embedding_backward_split_partial_rowwise_adam_pt2_cpu_wrapper.cpp 2025-05-07T19:51:30.4171657Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_cuda.cu 2025-05-07T19:51:30.4172289Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:30.4172929Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_cuda.cu 2025-05-07T19:51:30.4173633Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_meta.cpp 2025-05-07T19:51:30.4174220Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:30.4174804Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_meta.cpp 2025-05-07T19:51:30.4175390Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_kernel_cta.cu 2025-05-07T19:51:30.4176021Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:30.4176637Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_kernel_cta.cu 2025-05-07T19:51:30.4177242Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_kernel_warp.cu 2025-05-07T19:51:30.4177865Z Written: 
gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:30.4178507Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_kernel_warp.cu 2025-05-07T19:51:30.4179105Z Written: gen_embedding_optimizer_partial_rowwise_lamb_split_device_kernel.cuh 2025-05-07T19:51:30.4179606Z Written: gen_embedding_backward_split_partial_rowwise_lamb.cpp 2025-05-07T19:51:30.4180081Z Written: gen_embedding_split_partial_rowwise_lamb_pt2_autograd.cpp 2025-05-07T19:51:30.4180611Z Written: gen_embedding_backward_split_partial_rowwise_lamb_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.4181082Z Written: lookup_partial_rowwise_lamb.py 2025-05-07T19:51:30.4181465Z Written: gen_embedding_backward_split_partial_rowwise_lamb_cpu.cpp 2025-05-07T19:51:30.4182000Z Written: gen_embedding_backward_split_partial_rowwise_lamb_pt2_cpu_wrapper.cpp 2025-05-07T19:51:30.4182557Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_cuda.cu 2025-05-07T19:51:30.4183312Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_cuda.cu 2025-05-07T19:51:30.4183843Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_cuda.cu 2025-05-07T19:51:30.4184494Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_cuda.cu 2025-05-07T19:51:30.4185042Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_cuda.cu 2025-05-07T19:51:30.4185589Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:30.4186155Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_cuda.cu 2025-05-07T19:51:30.4186712Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_cuda.cu 2025-05-07T19:51:30.4187235Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_cuda.cu 2025-05-07T19:51:30.4187765Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_cuda.cu 2025-05-07T19:51:30.4188288Z Written: 
gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_meta.cpp 2025-05-07T19:51:30.4188849Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_meta.cpp 2025-05-07T19:51:30.4189369Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_meta.cpp 2025-05-07T19:51:30.4189905Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_meta.cpp 2025-05-07T19:51:30.4190459Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_meta.cpp 2025-05-07T19:51:30.4191014Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:30.4191574Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_meta.cpp 2025-05-07T19:51:30.4192107Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_meta.cpp 2025-05-07T19:51:30.4192644Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_meta.cpp 2025-05-07T19:51:30.4193152Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_meta.cpp 2025-05-07T19:51:30.4193699Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:30.4194286Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:30.4194831Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_kernel_cta.cu 2025-05-07T19:51:30.4195387Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_kernel_cta.cu 2025-05-07T19:51:30.4195944Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:30.4196549Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:30.4197148Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:30.4197725Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:30.4198333Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_kernel_cta.cu 
2025-05-07T19:51:30.4198905Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_kernel_cta.cu 2025-05-07T19:51:30.4199513Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:30.4200103Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:30.4200709Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_kernel_warp.cu 2025-05-07T19:51:30.4201295Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_kernel_warp.cu 2025-05-07T19:51:30.4201870Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:30.4202607Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:30.4203428Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:30.4204099Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:30.4204769Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_kernel_warp.cu 2025-05-07T19:51:30.4205397Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_kernel_warp.cu 2025-05-07T19:51:30.4206164Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_gwd_kernel_cta.cu 2025-05-07T19:51:30.4206881Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_gwd_kernel_cta.cu 2025-05-07T19:51:30.4207568Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_gwd_kernel_cta.cu 2025-05-07T19:51:30.4208232Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_gwd_kernel_cta.cu 2025-05-07T19:51:30.4208918Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_gwd_kernel_warp.cu 2025-05-07T19:51:30.4209675Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_gwd_kernel_warp.cu 2025-05-07T19:51:30.4210292Z Written: 
gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_gwd_kernel_warp.cu 2025-05-07T19:51:30.4210948Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_gwd_kernel_warp.cu 2025-05-07T19:51:30.4211545Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_gwd_cuda.cu 2025-05-07T19:51:30.4212137Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_gwd_cuda.cu 2025-05-07T19:51:30.4212722Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_gwd_cuda.cu 2025-05-07T19:51:30.4213289Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_gwd_cuda.cu 2025-05-07T19:51:30.4213851Z Written: gen_embedding_optimizer_rowwise_adagrad_ssd_device_kernel.cuh 2025-05-07T19:51:30.4214374Z Written: gen_embedding_optimizer_rowwise_adagrad_split_device_kernel.cuh 2025-05-07T19:51:30.4214877Z Written: gen_embedding_backward_ssd_rowwise_adagrad.cpp 2025-05-07T19:51:30.4215301Z Written: gen_embedding_ssd_rowwise_adagrad_pt2_autograd.cpp 2025-05-07T19:51:30.4215807Z Written: gen_embedding_backward_ssd_rowwise_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.4216265Z Written: lookup_rowwise_adagrad_ssd.py 2025-05-07T19:51:30.4216627Z Written: gen_embedding_backward_split_rowwise_adagrad.cpp 2025-05-07T19:51:30.4217087Z Written: gen_embedding_split_rowwise_adagrad_pt2_autograd.cpp 2025-05-07T19:51:30.4217590Z Written: gen_embedding_backward_split_rowwise_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.4218045Z Written: lookup_rowwise_adagrad.py 2025-05-07T19:51:30.4218412Z Written: gen_embedding_backward_rowwise_adagrad_split_cpu.cpp 2025-05-07T19:51:30.4218884Z Written: gen_embedding_backward_split_rowwise_adagrad_cpu.cpp 2025-05-07T19:51:30.4219373Z Written: gen_embedding_backward_split_rowwise_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:51:30.4219956Z Written: gen_embedding_optimizer_approx_rowwise_adagrad_split_device_kernel.cuh 2025-05-07T19:51:30.4220510Z Written: gen_embedding_backward_split_approx_rowwise_adagrad.cpp 
2025-05-07T19:51:30.4220994Z Written: gen_embedding_split_approx_rowwise_adagrad_pt2_autograd.cpp 2025-05-07T19:51:30.4221571Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.4222117Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_cpu.cpp 2025-05-07T19:51:30.4222684Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:51:30.4223323Z Written: gen_embedding_optimizer_rowwise_adagrad_with_weight_decay_split_device_kernel.cuh 2025-05-07T19:51:30.4223930Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay.cpp 2025-05-07T19:51:30.4224518Z Written: gen_embedding_split_rowwise_adagrad_with_weight_decay_pt2_autograd.cpp 2025-05-07T19:51:30.4225141Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.4225797Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_cpu.cpp 2025-05-07T19:51:30.4226417Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_pt2_cpu_wrapper.cpp 2025-05-07T19:51:30.4227132Z Written: gen_embedding_optimizer_approx_rowwise_adagrad_with_weight_decay_split_device_kernel.cuh 2025-05-07T19:51:30.4227830Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay.cpp 2025-05-07T19:51:30.4228529Z Written: gen_embedding_split_approx_rowwise_adagrad_with_weight_decay_pt2_autograd.cpp 2025-05-07T19:51:30.4229298Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.4229980Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_cpu.cpp 2025-05-07T19:51:30.5215012Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_pt2_cpu_wrapper.cpp 2025-05-07T19:51:30.5217320Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_cuda.cu 2025-05-07T19:51:30.5219240Z Written: 
gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_cuda.cu 2025-05-07T19:51:30.5221219Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:30.5222185Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_cuda.cu 2025-05-07T19:51:30.5222828Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_cuda.cu 2025-05-07T19:51:30.5223511Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_meta.cpp 2025-05-07T19:51:30.5224150Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_meta.cpp 2025-05-07T19:51:30.5224817Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:30.5225521Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_meta.cpp 2025-05-07T19:51:30.5226172Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_meta.cpp 2025-05-07T19:51:30.5226867Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:30.5227551Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_kernel_cta.cu 2025-05-07T19:51:30.5228276Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:30.5229026Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:30.5229722Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_kernel_cta.cu 2025-05-07T19:51:30.5230450Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:30.5231135Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_kernel_warp.cu 2025-05-07T19:51:30.5231865Z Written: 
gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:30.5232588Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:30.5233318Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_kernel_warp.cu 2025-05-07T19:51:30.5234011Z Written: gen_embedding_optimizer_rowwise_adagrad_with_counter_split_device_kernel.cuh 2025-05-07T19:51:30.5234588Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter.cpp 2025-05-07T19:51:30.5235153Z Written: gen_embedding_split_rowwise_adagrad_with_counter_pt2_autograd.cpp 2025-05-07T19:51:30.5235754Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.5236305Z Written: lookup_rowwise_adagrad_with_counter.py 2025-05-07T19:51:30.5236777Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter_cpu.cpp 2025-05-07T19:51:30.5237357Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter_pt2_cpu_wrapper.cpp 2025-05-07T19:51:30.5238038Z Written: gen_embedding_optimizer_approx_rowwise_adagrad_with_counter_split_device_kernel.cuh 2025-05-07T19:51:30.5238658Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter.cpp 2025-05-07T19:51:30.5239256Z Written: gen_embedding_split_approx_rowwise_adagrad_with_counter_pt2_autograd.cpp 2025-05-07T19:51:30.5239891Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.5240798Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_cpu.cpp 2025-05-07T19:51:30.5241461Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_pt2_cpu_wrapper.cpp 2025-05-07T19:51:30.5242185Z Written: gen_embedding_optimizer_rowwise_weighted_adagrad_split_device_kernel.cuh 2025-05-07T19:51:30.5243066Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad.cpp 
2025-05-07T19:51:30.5243617Z Written: gen_embedding_split_rowwise_weighted_adagrad_pt2_autograd.cpp 2025-05-07T19:51:30.5244257Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.5244891Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad_cpu.cpp 2025-05-07T19:51:30.5245487Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:51:30.5246083Z Written: gen_embedding_backward_sgd_split_weighted_vbe_cuda.cu 2025-05-07T19:51:30.5246558Z Written: gen_embedding_backward_sgd_split_weighted_cuda.cu 2025-05-07T19:51:30.5247074Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:30.5247594Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_cuda.cu 2025-05-07T19:51:30.5248123Z Written: gen_embedding_backward_sgd_split_unweighted_cuda.cu 2025-05-07T19:51:30.5248640Z Written: gen_embedding_backward_sgd_split_weighted_vbe_meta.cpp 2025-05-07T19:51:30.5249131Z Written: gen_embedding_backward_sgd_split_weighted_meta.cpp 2025-05-07T19:51:30.5249653Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:30.5250171Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_meta.cpp 2025-05-07T19:51:30.5250695Z Written: gen_embedding_backward_sgd_split_unweighted_meta.cpp 2025-05-07T19:51:30.5251205Z Written: gen_embedding_backward_sgd_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:30.5251750Z Written: gen_embedding_backward_sgd_split_weighted_kernel_cta.cu 2025-05-07T19:51:30.5252308Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:30.5252878Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:30.5253442Z Written: gen_embedding_backward_sgd_split_unweighted_kernel_cta.cu 2025-05-07T19:51:30.5253977Z Written: gen_embedding_backward_sgd_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:30.5254527Z Written: 
gen_embedding_backward_sgd_split_weighted_kernel_warp.cu 2025-05-07T19:51:30.5255070Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:30.5255752Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:30.5256281Z Written: gen_embedding_backward_sgd_split_unweighted_kernel_warp.cu 2025-05-07T19:51:30.5256751Z Written: gen_embedding_optimizer_sgd_split_device_kernel.cuh 2025-05-07T19:51:30.5257175Z Written: gen_embedding_backward_split_sgd.cpp 2025-05-07T19:51:30.5257534Z Written: gen_embedding_split_sgd_pt2_autograd.cpp 2025-05-07T19:51:30.5257980Z Written: gen_embedding_backward_split_sgd_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.5258359Z Written: lookup_sgd.py 2025-05-07T19:51:30.5258668Z Written: gen_embedding_backward_sgd_split_cpu.cpp 2025-05-07T19:51:30.5259040Z Written: gen_embedding_backward_split_sgd_cpu.cpp 2025-05-07T19:51:30.5259478Z Written: gen_embedding_backward_split_sgd_pt2_cpu_wrapper.cpp 2025-05-07T19:51:30.5259973Z Written: gen_embedding_optimizer_approx_sgd_split_device_kernel.cuh 2025-05-07T19:51:30.5260416Z Written: gen_embedding_backward_split_approx_sgd.cpp 2025-05-07T19:51:30.5260849Z Written: gen_embedding_split_approx_sgd_pt2_autograd.cpp 2025-05-07T19:51:30.5261309Z Written: gen_embedding_backward_split_approx_sgd_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.5261807Z Written: gen_embedding_backward_split_approx_sgd_cpu.cpp 2025-05-07T19:51:30.5262267Z Written: gen_embedding_backward_split_approx_sgd_pt2_cpu_wrapper.cpp 2025-05-07T19:51:30.5262762Z Written: gen_embedding_backward_none_split_weighted_cuda.cu 2025-05-07T19:51:30.5263246Z Written: gen_embedding_backward_none_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:30.5263832Z Written: gen_embedding_backward_none_split_unweighted_cuda.cu 2025-05-07T19:51:30.5264310Z Written: gen_embedding_backward_none_split_weighted_meta.cpp 2025-05-07T19:51:30.5264840Z Written: 
gen_embedding_backward_none_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:30.5265341Z Written: gen_embedding_backward_none_split_unweighted_meta.cpp 2025-05-07T19:51:30.5265807Z Written: gen_embedding_backward_none_split_weighted_kernel_cta.cu 2025-05-07T19:51:30.5266337Z Written: gen_embedding_backward_none_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:30.5266875Z Written: gen_embedding_backward_none_split_unweighted_kernel_cta.cu 2025-05-07T19:51:30.5267770Z Written: gen_embedding_backward_none_split_weighted_kernel_warp.cu 2025-05-07T19:51:30.5268350Z Written: gen_embedding_backward_none_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:30.5268911Z Written: gen_embedding_backward_none_split_unweighted_kernel_warp.cu 2025-05-07T19:51:30.5269449Z Written: gen_embedding_optimizer_none_split_device_kernel.cuh 2025-05-07T19:51:30.5269884Z Written: gen_embedding_backward_split_none.cpp 2025-05-07T19:51:30.5270302Z Written: gen_embedding_split_none_pt2_autograd.cpp 2025-05-07T19:51:30.5270783Z Written: gen_embedding_backward_split_none_pt2_cuda_wrapper.cpp 2025-05-07T19:51:30.5271191Z Written: lookup_none.py 2025-05-07T19:51:30.5271530Z Written: gen_embedding_backward_split_none_cpu.cpp 2025-05-07T19:51:30.5271975Z Written: gen_embedding_backward_split_none_pt2_cpu_wrapper.cpp 2025-05-07T19:51:30.5272525Z Written: gen_embedding_backward_split_weighted_device_kernel_hip.hip 2025-05-07T19:51:30.5273093Z Written: gen_embedding_backward_split_unweighted_nobag_device_kernel_hip.hip 2025-05-07T19:51:30.5273707Z Written: gen_embedding_backward_split_unweighted_device_kernel_hip.hip 2025-05-07T19:51:30.5274251Z Written: gen_embedding_backward_ssd_weighted_vbe_device_kernel.cuh 2025-05-07T19:51:30.5274802Z Written: gen_embedding_backward_split_weighted_vbe_device_kernel.cuh 2025-05-07T19:51:30.5372770Z Written: gen_embedding_backward_ssd_weighted_device_kernel.cuh 2025-05-07T19:51:30.5373402Z Written: gen_embedding_backward_split_weighted_device_kernel.cuh 
2025-05-07T19:51:30.5374003Z Written: gen_embedding_backward_ssd_unweighted_nobag_device_kernel.cuh 2025-05-07T19:51:30.5374573Z Written: gen_embedding_backward_split_unweighted_nobag_device_kernel.cuh 2025-05-07T19:51:30.5375158Z Written: gen_embedding_backward_ssd_unweighted_vbe_device_kernel.cuh 2025-05-07T19:51:30.5375702Z Written: gen_embedding_backward_split_unweighted_vbe_device_kernel.cuh 2025-05-07T19:51:30.5376256Z Written: gen_embedding_backward_ssd_unweighted_device_kernel.cuh 2025-05-07T19:51:30.5376768Z Written: gen_embedding_backward_split_unweighted_device_kernel.cuh 2025-05-07T19:51:30.5377300Z Written: gen_embedding_backward_split_common_device_kernel.cuh 2025-05-07T19:51:30.5377799Z Written: gen_embedding_backward_split_grad_embedding_ops.cu 2025-05-07T19:51:30.5378312Z Written: gen_embedding_backward_dense_indice_weights_codegen_cuda.cu 2025-05-07T19:51:30.5378854Z Written: gen_embedding_backward_ssd_indice_weights_codegen_cuda.cu 2025-05-07T19:51:30.5379384Z Written: gen_embedding_backward_split_indice_weights_codegen_cuda.cu 2025-05-07T19:51:30.5379840Z Written: pt2_arg_utils.h 2025-05-07T19:51:30.5380109Z Written: __init__.py 2025-05-07T19:51:30.5380397Z Written: lookup_args_ssd.py 2025-05-07T19:51:30.5380674Z Written: lookup_args.py 2025-05-07T19:51:30.5380884Z 2025-05-07T19:51:30.5380889Z 2025-05-07T19:51:30.5381014Z ================================================================================ 2025-05-07T19:51:30.5381399Z Running code generation script ... 
2025-05-07T19:51:30.5382203Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_embedding_optimizer.py --opensource 2025-05-07T19:51:30.5383048Z ================================================================================ 2025-05-07T19:51:30.5383470Z 2025-05-07T19:51:30.6359797Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:51:30.6363002Z [GENERATE OPTIMIZERS]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_embedding_optimizer.py', '--opensource'] 2025-05-07T19:51:30.6365219Z Written: gen_embedding_optimizer_rowwise_adagrad_split_cuda.cu 2025-05-07T19:51:30.6366626Z Written: gen_embedding_optimizer_rowwise_adagrad_split_kernel.cu 2025-05-07T19:51:30.6368390Z Written: gen_embedding_optimizer_rowwise_adagrad_split.cpp 2025-05-07T19:51:30.6369874Z Written: gen_embedding_optimizer_rowwise_adagrad_split_device_kernel.cuh 2025-05-07T19:51:30.6371295Z Written: split_embedding_optimizer_rowwise_adagrad.py 2025-05-07T19:51:30.6372382Z Written: optimizer_args.py 2025-05-07T19:51:30.6444944Z 2025-05-07T19:51:30.6445046Z 2025-05-07T19:51:30.6445543Z ================================================================================ 2025-05-07T19:51:30.6446625Z Running code generation script ... 
2025-05-07T19:51:30.6448957Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_quantized.py --opensource 2025-05-07T19:51:30.6450817Z ================================================================================ 2025-05-07T19:51:30.6451194Z 2025-05-07T19:51:30.7589282Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:51:30.7590846Z [GENERATE FORWARD QUANTIZED]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_quantized.py', '--opensource'] 2025-05-07T19:51:30.7591747Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp32_codegen_cuda.cu 2025-05-07T19:51:30.7592446Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp16_codegen_cuda.cu 2025-05-07T19:51:30.7593150Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp8_codegen_cuda.cu 2025-05-07T19:51:30.7593827Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int8_codegen_cuda.cu 2025-05-07T19:51:30.7594545Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int4_codegen_cuda.cu 2025-05-07T19:51:30.7595251Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int2_codegen_cuda.cu 2025-05-07T19:51:30.7596179Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp32_codegen_cuda.cu 2025-05-07T19:51:30.7596894Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp16_codegen_cuda.cu 2025-05-07T19:51:30.7597583Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp8_codegen_cuda.cu 2025-05-07T19:51:30.7598300Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int8_codegen_cuda.cu 2025-05-07T19:51:30.7599012Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int4_codegen_cuda.cu 2025-05-07T19:51:30.7599701Z Written: 
gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int2_codegen_cuda.cu 2025-05-07T19:51:30.7600410Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp32_codegen_cuda.cu 2025-05-07T19:51:30.7601054Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp16_codegen_cuda.cu 2025-05-07T19:51:30.7601735Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp8_codegen_cuda.cu 2025-05-07T19:51:30.7602382Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int8_codegen_cuda.cu 2025-05-07T19:51:30.7603397Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int4_codegen_cuda.cu 2025-05-07T19:51:30.7604134Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int2_codegen_cuda.cu 2025-05-07T19:51:30.7604794Z Written: gen_embedding_forward_quantized_split_nbit_host_weighted_codegen_cuda.cu 2025-05-07T19:51:30.7605490Z Written: gen_embedding_forward_quantized_split_nbit_host_unweighted_nobag_codegen_cuda.cu 2025-05-07T19:51:30.7606162Z Written: gen_embedding_forward_quantized_split_nbit_host_unweighted_codegen_cuda.cu 2025-05-07T19:51:30.7607035Z Written: gen_embedding_forward_quantized_weighted_codegen_cpu.cpp 2025-05-07T19:51:30.7607588Z Written: gen_embedding_forward_quantized_unweighted_codegen_cpu.cpp 2025-05-07T19:51:30.7675587Z 2025-05-07T19:51:30.7675787Z 2025-05-07T19:51:30.7676607Z ================================================================================ 2025-05-07T19:51:30.7677696Z Running code generation script ... 
2025-05-07T19:51:30.7679075Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_split.py --opensource 2025-05-07T19:51:30.7679883Z ================================================================================ 2025-05-07T19:51:30.7680123Z 2025-05-07T19:51:31.1126198Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:51:31.1128739Z [GENERATE FORWARD SPLIT]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_split.py', '--opensource'] 2025-05-07T19:51:31.1130923Z Written: gen_embedding_forward_dense_weighted_vbe_codegen_cuda.cu 2025-05-07T19:51:31.1132315Z Written: gen_embedding_forward_dense_weighted_codegen_cuda.cu 2025-05-07T19:51:31.1133191Z Written: gen_embedding_forward_dense_unweighted_vbe_codegen_cuda.cu 2025-05-07T19:51:31.1133656Z Written: gen_embedding_forward_dense_unweighted_codegen_cuda.cu 2025-05-07T19:51:31.1134132Z Written: gen_embedding_forward_ssd_weighted_vbe_codegen_cuda.cu 2025-05-07T19:51:31.1134592Z Written: gen_embedding_forward_split_weighted_vbe_codegen_cuda.cu 2025-05-07T19:51:31.1135058Z Written: gen_embedding_forward_ssd_weighted_codegen_cuda.cu 2025-05-07T19:51:31.1135505Z Written: gen_embedding_forward_split_weighted_codegen_cuda.cu 2025-05-07T19:51:31.1135953Z Written: gen_embedding_forward_ssd_unweighted_vbe_codegen_cuda.cu 2025-05-07T19:51:31.1136471Z Written: gen_embedding_forward_split_unweighted_vbe_codegen_cuda.cu 2025-05-07T19:51:31.1136944Z Written: gen_embedding_forward_ssd_unweighted_codegen_cuda.cu 2025-05-07T19:51:31.1137426Z Written: gen_embedding_forward_split_unweighted_codegen_cuda.cu 2025-05-07T19:51:31.1137911Z Written: gen_embedding_forward_split_weighted_vbe_gwd_codegen_cuda.cu 2025-05-07T19:51:31.1138433Z Written: gen_embedding_forward_split_weighted_gwd_codegen_cuda.cu 2025-05-07T19:51:31.1138928Z Written: gen_embedding_forward_split_unweighted_vbe_gwd_codegen_cuda.cu 
2025-05-07T19:51:31.1139462Z Written: gen_embedding_forward_split_unweighted_gwd_codegen_cuda.cu 2025-05-07T19:51:31.1139946Z Written: gen_embedding_forward_dense_weighted_vbe_codegen_meta.cpp 2025-05-07T19:51:31.1140443Z Written: gen_embedding_forward_dense_weighted_codegen_meta.cpp 2025-05-07T19:51:31.1140951Z Written: gen_embedding_forward_dense_unweighted_vbe_codegen_meta.cpp 2025-05-07T19:51:31.1141432Z Written: gen_embedding_forward_dense_unweighted_codegen_meta.cpp 2025-05-07T19:51:31.1141921Z Written: gen_embedding_forward_ssd_weighted_vbe_codegen_meta.cpp 2025-05-07T19:51:31.1142395Z Written: gen_embedding_forward_split_weighted_vbe_codegen_meta.cpp 2025-05-07T19:51:31.1142882Z Written: gen_embedding_forward_ssd_weighted_codegen_meta.cpp 2025-05-07T19:51:31.1143328Z Written: gen_embedding_forward_split_weighted_codegen_meta.cpp 2025-05-07T19:51:31.1143819Z Written: gen_embedding_forward_ssd_unweighted_vbe_codegen_meta.cpp 2025-05-07T19:51:31.1144335Z Written: gen_embedding_forward_split_unweighted_vbe_codegen_meta.cpp 2025-05-07T19:51:31.1144809Z Written: gen_embedding_forward_ssd_unweighted_codegen_meta.cpp 2025-05-07T19:51:31.1145295Z Written: gen_embedding_forward_split_unweighted_codegen_meta.cpp 2025-05-07T19:51:31.1145738Z Written: gen_embedding_forward_dense_weighted_vbe_kernel.cu 2025-05-07T19:51:31.1146173Z Written: gen_embedding_forward_dense_weighted_kernel.cu 2025-05-07T19:51:31.1146595Z Written: gen_embedding_forward_dense_unweighted_nobag_kernel.cu 2025-05-07T19:51:31.1147053Z Written: gen_embedding_forward_dense_unweighted_vbe_kernel.cu 2025-05-07T19:51:31.1147493Z Written: gen_embedding_forward_dense_unweighted_kernel.cu 2025-05-07T19:51:31.1148158Z Written: gen_embedding_forward_ssd_weighted_vbe_kernel.cu 2025-05-07T19:51:31.1148595Z Written: gen_embedding_forward_split_weighted_vbe_kernel.cu 2025-05-07T19:51:31.1149100Z Written: gen_embedding_forward_ssd_weighted_kernel.cu 2025-05-07T19:51:31.1149521Z Written: 
gen_embedding_forward_split_weighted_kernel.cu 2025-05-07T19:51:31.1149932Z Written: gen_embedding_forward_ssd_unweighted_nobag_kernel.cu 2025-05-07T19:51:31.1150410Z Written: gen_embedding_forward_split_unweighted_nobag_kernel.cu 2025-05-07T19:51:31.1150875Z Written: gen_embedding_forward_ssd_unweighted_vbe_kernel.cu 2025-05-07T19:51:31.1151319Z Written: gen_embedding_forward_split_unweighted_vbe_kernel.cu 2025-05-07T19:51:31.1151764Z Written: gen_embedding_forward_ssd_unweighted_kernel.cu 2025-05-07T19:51:31.1152179Z Written: gen_embedding_forward_split_unweighted_kernel.cu 2025-05-07T19:51:31.1152646Z Written: gen_embedding_forward_split_weighted_vbe_gwd_kernel.cu 2025-05-07T19:51:31.1153095Z Written: gen_embedding_forward_split_weighted_gwd_kernel.cu 2025-05-07T19:51:31.1153565Z Written: gen_embedding_forward_split_unweighted_vbe_gwd_kernel.cu 2025-05-07T19:51:31.1154047Z Written: gen_embedding_forward_split_unweighted_gwd_kernel.cu 2025-05-07T19:51:31.1154483Z Written: gen_embedding_forward_split_weighted_v2_kernel.cu 2025-05-07T19:51:31.1154926Z Written: gen_embedding_forward_split_unweighted_v2_kernel.cu 2025-05-07T19:51:31.1155392Z Written: gen_embedding_forward_dense_unweighted_nobag_kernel_small.cu 2025-05-07T19:51:31.1155901Z Written: gen_embedding_forward_dense_unweighted_nobag_kernel_small.cu 2025-05-07T19:51:31.1156377Z Written: gen_embedding_forward_ssd_unweighted_nobag_kernel_small.cu 2025-05-07T19:51:31.1156867Z Written: gen_embedding_forward_split_unweighted_nobag_kernel_small.cu 2025-05-07T19:51:31.1157333Z Written: gen_embedding_forward_split_pt2_cuda_wrapper.cpp 2025-05-07T19:51:31.1157734Z Written: gen_embedding_forward_split_pt2_cpu_wrapper.cpp 2025-05-07T19:51:31.1158146Z Written: gen_embedding_forward_ssd_pt2_cuda_wrapper.cpp 2025-05-07T19:51:31.1243798Z 2025-05-07T19:51:31.1243894Z 2025-05-07T19:51:31.1244435Z ================================================================================ 2025-05-07T19:51:31.1245582Z Running code 
generation script ... 2025-05-07T19:51:31.1247787Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_index_select.py --opensource 2025-05-07T19:51:31.1250041Z ================================================================================ 2025-05-07T19:51:31.1250710Z 2025-05-07T19:51:31.3845630Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:51:31.3848112Z [INDEX SELECT GENERATOR]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_index_select.py', '--opensource'] 2025-05-07T19:51:31.3850146Z Written: gen_batch_index_select_dim0_forward_codegen_cuda.cu 2025-05-07T19:51:31.3851395Z Written: gen_batch_index_select_dim0_forward_kernel.cu 2025-05-07T19:51:31.3852633Z Written: gen_batch_index_select_dim0_forward_kernel_small.cu 2025-05-07T19:51:31.3853063Z Written: gen_batch_index_select_dim0_backward_codegen_cuda.cu 2025-05-07T19:51:31.3853517Z Written: gen_batch_index_select_dim0_backward_kernel_cta.cu 2025-05-07T19:51:31.3853943Z Written: gen_batch_index_select_dim0_backward_kernel_warp.cu 2025-05-07T19:51:31.3854440Z Written: gen_embedding_backward_split_batch_index_select_device_kernel.cuh 2025-05-07T19:51:31.3854936Z Written: gen_embedding_backward_split_grad_index_select.cu 2025-05-07T19:51:31.3855381Z Written: gen_embedding_backward_split_common_device_kernel.cuh 2025-05-07T19:51:31.4029070Z 2025-05-07T19:51:31.4029405Z 2025-05-07T19:51:31.4029736Z ================================================================================ 2025-05-07T19:51:31.4030388Z GPU CPP Library Target: fbgemm_gpu_experimental_gen_ai (SHARED) 2025-05-07T19:51:31.4030832Z 2025-05-07T19:51:31.4031045Z CPU_SRCS: 2025-05-07T19:51:31.4031678Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/attention.cpp 2025-05-07T19:51:31.4032319Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cpp 2025-05-07T19:51:31.4033036Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cpp 2025-05-07T19:51:31.4033605Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cpp 2025-05-07T19:51:31.4034234Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cpp 2025-05-07T19:51:31.4034914Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cpp 2025-05-07T19:51:31.4035528Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cpp 2025-05-07T19:51:31.4035963Z 2025-05-07T19:51:31.4036173Z GPU_SRCS: 2025-05-07T19:51:31.4036587Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.cu 2025-05-07T19:51:31.4037206Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cu 2025-05-07T19:51:31.4037816Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu 2025-05-07T19:51:31.4038352Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cu 2025-05-07T19:51:31.4038959Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu 2025-05-07T19:51:31.4039603Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cu 2025-05-07T19:51:31.4040206Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cu 2025-05-07T19:51:31.4040642Z 2025-05-07T19:51:31.4040849Z CUDA_SPECIFIC_SRCS: 2025-05-07T19:51:31.4041386Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu 2025-05-07T19:51:31.4042188Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.cu 2025-05-07T19:51:31.4043196Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu 2025-05-07T19:51:31.4044112Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu 2025-05-07T19:51:31.4044971Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.cu 2025-05-07T19:51:31.4045837Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:51:31.4046785Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:51:31.4047750Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T19:51:31.4048706Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T19:51:31.4049643Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T19:51:31.4050572Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T19:51:31.4051516Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T19:51:31.4052474Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T19:51:31.4053428Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T19:51:31.4054360Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T19:51:31.4055317Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T19:51:31.4056258Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T19:51:31.4057426Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T19:51:31.4058452Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T19:51:31.4059325Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T19:51:31.4060218Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T19:51:31.4061089Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:51:31.4061974Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:51:31.4062859Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T19:51:31.4063730Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T19:51:31.4064630Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T19:51:31.4065495Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T19:51:31.4066390Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T19:51:31.4067858Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T19:51:31.4068696Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.cu 2025-05-07T19:51:31.4069502Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu 2025-05-07T19:51:31.4070348Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu 2025-05-07T19:51:31.4071141Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.cu 2025-05-07T19:51:31.4071957Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu 2025-05-07T19:51:31.4072934Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu 2025-05-07T19:51:31.4074096Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu 2025-05-07T19:51:31.4075252Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu 2025-05-07T19:51:31.4076386Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu 2025-05-07T19:51:31.4077634Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu 2025-05-07T19:51:31.4079019Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu 2025-05-07T19:51:31.4080325Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu 2025-05-07T19:51:31.4081464Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu 2025-05-07T19:51:31.4082665Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu 2025-05-07T19:51:31.4083981Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu 2025-05-07T19:51:31.4085529Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu 2025-05-07T19:51:31.4086867Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu 2025-05-07T19:51:31.4088011Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu 2025-05-07T19:51:31.4089254Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu 2025-05-07T19:51:31.4090332Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu 2025-05-07T19:51:31.4091154Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu 2025-05-07T19:51:31.4091918Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu 2025-05-07T19:51:31.4092692Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu 2025-05-07T19:51:31.4093488Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu 2025-05-07T19:51:31.4094266Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.cu 2025-05-07T19:51:31.4094988Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu 2025-05-07T19:51:31.4095736Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.cu 2025-05-07T19:51:31.4096454Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.cu 2025-05-07T19:51:31.4097131Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu 2025-05-07T19:51:31.4097856Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu 2025-05-07T19:51:31.4098572Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cu 2025-05-07T19:51:31.4099251Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cuh 2025-05-07T19:51:31.4099943Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/utility.cuh 2025-05-07T19:51:31.4100408Z 2025-05-07T19:51:31.4100605Z HIP_SPECIFIC_SRCS: 2025-05-07T19:51:31.4100965Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gemm/ck_extensions.hip 2025-05-07T19:51:31.4101620Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gemm/gemm.cpp 2025-05-07T19:51:31.4102306Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/bf16_grouped_gemm.hip 2025-05-07T19:51:31.4103411Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x32x128_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4104819Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x32x64_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4106202Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x32x64_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4107569Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x64x128_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v1.hip 2025-05-07T19:51:31.4108951Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x64x128_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v2.hip 2025-05-07T19:51:31.4110331Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x64x128_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:31.4111817Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:31.4113204Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4114592Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4115963Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4117333Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x64_16x16_1x3_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4118714Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x16x64_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4120074Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x64x128_32x32_1x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4121449Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x64x128_32x32_1x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4123100Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x96x128_16x16_2x3_16x8x1_16x8x1_1x32x1x4_8x8x1_2x1_intrawave_v2.hip 2025-05-07T19:51:31.4124607Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x64x128x64_32x32_2x2_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4126101Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x64x96x64_16x16_4x3_8x16x1_8x16x1_1x32x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4127583Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x128x128_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4129095Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x128x64_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:31.4130605Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x128x64_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4132092Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x224x64_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4133590Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x256x64_32x32_4x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4135093Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x96x64_16x16_4x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4136597Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x16x128x128_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:31.4138003Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x16x128x128_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:31.4139529Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x16x64x128_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4140910Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x224x256x32_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4142301Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x128x32_32x32_4x2_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:31.4143686Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x160x64_16x16_8x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4145067Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x192x64_32x32_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4146465Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x224x64_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4147845Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x256x64_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4149245Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x128x128_16x16_1x4_16x16x1_16x16x1_1x32x1x8_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:31.4150624Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x224x64_16x16_1x7_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4152005Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x96x64_16x16_1x3_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:31.4153380Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x96x64_16x16_1x3_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4154764Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x64x128x128_32x32_2x1_16x16x1_16x16x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4156160Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x64x192x128_16x16_4x3_16x16x1_16x16x1_1x32x1x8_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4157544Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x64x96x64_16x16_2x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4158918Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x16x128_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4160293Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x16x128_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4161656Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x16x64_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4163269Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x32x128_16x16_1x2_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:31.4164759Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x48x128_16x16_1x3_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4166416Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x64x128_16x16_1x4_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:31.4167705Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/ck_utility.hip 2025-05-07T19:51:31.4168470Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_blockwise_gemm.hip 2025-05-07T19:51:31.4169302Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/fp8_rowwise_gemm.hip 2025-05-07T19:51:31.4170694Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x128x16x128_16x16_4x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4172167Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x128x32x128_32x32_2x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4173626Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4175116Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2_4_split_k.hip 2025-05-07T19:51:31.4176664Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2_8_split_k.hip 2025-05-07T19:51:31.4178152Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4179668Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4181065Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2_8_split_k.hip 2025-05-07T19:51:31.4182446Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x256_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4183798Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4185188Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_interwave_v2_2_split_k.hip 2025-05-07T19:51:31.4186547Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4187940Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2_2_split_k.hip 2025-05-07T19:51:31.4189312Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4190657Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x128x128_32x32_1x2_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4192005Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v1.hip 2025-05-07T19:51:31.4193472Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4194874Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x256_16x16_1x1_16x8x1_16x8x1_1x32x1x4_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:31.4196218Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x512_16x16_1x1_32x4x1_32x4x1_1x32x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4197567Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x512_16x16_1x1_32x4x1_32x4x1_1x32x1x4_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4198905Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4200263Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4201618Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x64x32x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4203219Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x64x32x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4204694Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_16x16_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4206173Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:31.4207656Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4209138Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v5.hip 2025-05-07T19:51:31.4210619Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x256_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4212100Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x64_32x32_2x2_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_intrawave_v4.hip 2025-05-07T19:51:31.4213593Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x160x128_16x16_4x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4215076Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x160x128_32x32_1x5_8x32x1_8x32x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4216578Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x192x128_32x32_2x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4217942Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x256x128_32x32_2x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4219304Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x64x128_32x32_2x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4220682Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x64x256_32x32_2x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4222175Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x96x128_16x16_4x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4223529Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x96x256_32x32_1x3_16x16x1_16x16x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4224907Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x160x128x128_16x16_5x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4226285Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x160x256x128_16x16_5x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4227655Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x160x96x128_16x16_5x3_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4229062Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x16x64x128_16x16_1x1_16x16x1_8x32x1_1x16x1x16_4x4x1_1x1_intrawave_v2_8_split_k.hip 2025-05-07T19:51:31.4230469Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4231821Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4233190Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x128x128_16x16_6x4_8x32x1_8x32x1_1x32x1x8_8x8x1_2x2_intrawave_v3.hip 2025-05-07T19:51:31.4234576Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x192x128_16x16_6x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4235934Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x224x128_16x16_6x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4237307Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x256x128_16x16_6x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4238672Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x256x128_16x16_6x8_8x32x1_8x32x1_1x32x1x8_8x8x1_2x2_intrawave_v3.hip 2025-05-07T19:51:31.4240028Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x224x160x128_16x16_7x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4241409Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x224x192x128_16x16_7x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4242823Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4244471Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x128x128_16x16_8x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4245944Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x128x128_32x32_4x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4247417Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x160x128_16x16_8x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4249052Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x192x128_16x16_8x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4250526Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x192x128_32x32_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4252005Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4253482Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x256x128_16x16_8x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4254978Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x256x64_16x16_8x8_4x64x1_4x64x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4256493Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x256x64_32x32_4x4_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_intrawave_v4.hip 2025-05-07T19:51:31.4257848Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x96x128_16x16_8x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4259206Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x96x128_32x32_2x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4260572Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x32x128x256_32x32_1x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4261948Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x32x64x512_16x16_1x2_32x8x1_32x8x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4263323Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x128x128_32x32_1x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4264686Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4266059Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x16x512_16x16_1x1_32x8x1_32x8x1_1x64x1x4_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4267917Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x192x128_32x32_1x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4269404Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x192x256_32x32_1x3_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4271070Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x256x128_32x32_1x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4272563Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x64x128_32x32_1x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4274024Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x64x512_32x32_1x1_32x8x1_32x8x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4275513Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x96x256_16x16_2x3_16x16x1_16x16x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4277184Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x80x128x256_16x16_5x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4278665Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x96x128x128_16x16_3x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4280304Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:31.4281658Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4283238Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4284706Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x4x1x16_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4286167Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x512_16x16_1x1_32x2x1_32x2x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4287619Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x512_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4289085Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x64_16x16_1x1_4x16x1_4x16x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4290338Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/fp8_rowwise_batched_gemm.hip 2025-05-07T19:51:31.4291656Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:31.4293261Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:31.4294880Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4296491Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4297983Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4299474Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4300935Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x512_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4302418Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x512_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4303896Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x32x128x128_32x32_1x2_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4305925Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4307402Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4308898Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v4.hip 2025-05-07T19:51:31.4310389Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v5.hip 2025-05-07T19:51:31.4311899Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x256_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4313400Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x160x128_32x32_1x5_8x32x1_8x32x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4314891Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x192x128_32x32_2x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4316377Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x256x128_32x32_2x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4317875Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x64x128_32x32_2x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4319359Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x96x256_32x32_1x3_16x16x1_16x16x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4320849Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4322344Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4324160Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x128x128_16x16_8x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4325776Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x160x128_16x16_8x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4327396Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x192x128_16x16_8x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4329003Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4330606Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x256x128_16x16_8x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4332216Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x32x128x256_32x32_1x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4333944Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x32x64x512_16x16_1x2_32x8x1_32x8x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4335636Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4337131Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x192x256_32x32_1x3_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4338638Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x64x128_32x32_1x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4340105Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x64x512_32x32_1x1_32x8x1_32x8x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4341567Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_32x2x1_32x2x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4343021Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_32x2x1_32x2x1_1x16x1x4_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4344461Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4_1x1_interwave_v1.hip 2025-05-07T19:51:31.4345886Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4_1x1_interwave_v2.hip 2025-05-07T19:51:31.4347070Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/fp8_rowwise_grouped_gemm.hip 2025-05-07T19:51:31.4348261Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4349731Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4351194Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x256_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4352665Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4354127Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4355601Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4357087Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x64x256_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v1.hip 2025-05-07T19:51:31.4358554Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x64x256_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v2.hip 2025-05-07T19:51:31.4360121Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x64x256_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:31.4361610Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x96x256_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4363331Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x32x16x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4364908Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x32x64x256_16x16_1x4_16x8x1_16x8x1_1x32x1x4_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:31.4366513Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x32x64x256_32x32_1x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:31.4368245Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x64x64x256_32x32_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4370013Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x64x64x256_32x32_2x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4371623Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:31.4373240Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4374867Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x128x256_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4376516Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x224x128_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4378007Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x256x128_32x32_4x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4379504Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x96x128_16x16_4x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4380979Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x128x256_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:31.4382496Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x128x256_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:31.4383997Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x128x256_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4385482Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x256_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4386975Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x256_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4388647Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x256_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4390133Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4391616Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x192x96x128_16x16_6x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4393104Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:31.4394599Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x128x64_32x32_4x2_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:31.4396085Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4397583Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x192x128_32x32_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4399066Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4400556Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4402055Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_8x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4403807Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x128x128_16x16_1x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_interwave_v2.hip 2025-05-07T19:51:31.4405418Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x160x128_16x16_1x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4407026Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x160x128_16x16_1x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4408627Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x256x128_16x16_1x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:31.4410219Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x32x512_16x16_1x1_32x8x1_32x8x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4411818Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x32x512_16x16_1x1_32x8x1_32x8x1_1x32x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:31.4413404Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x64x512_16x16_2x1_32x8x1_32x8x1_1x32x1x8_8x8x1_2x1_intrawave_v2.hip 2025-05-07T19:51:31.4414991Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4416714Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x128x256_32x32_2x1_16x16x1_16x16x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:31.4418191Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x160x128_16x16_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4419673Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x192x128_16x16_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:31.4421137Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4422584Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:31.4424058Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:31.4425513Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x32x256_16x16_1x2_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:31.4426964Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x64x256_16x16_1x4_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_interwave_v1.hip 2025-05-07T19:51:31.4428435Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x64x256_16x16_1x4_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:31.4429541Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_tensorwise_gemm.hip 2025-05-07T19:51:31.4430037Z 2025-05-07T19:51:31.4430210Z OTHER_SRCS: 2025-05-07T19:51:31.4430314Z 2025-05-07T19:51:31.4430381Z 2025-05-07T19:51:31.4430551Z CC_FLAGS: 2025-05-07T19:51:31.4430657Z 2025-05-07T19:51:31.4430722Z 2025-05-07T19:51:31.4430898Z NVCC_FLAGS: 2025-05-07T19:51:31.4431015Z 2025-05-07T19:51:31.4431096Z 2025-05-07T19:51:31.4431307Z HIPCC_FLAGS: 2025-05-07T19:51:31.4431427Z 2025-05-07T19:51:31.4431529Z 2025-05-07T19:51:31.4431711Z INCLUDE_DIRS: 2025-05-07T19:51:31.4431967Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:31.4432280Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:51:31.4432584Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:51:31.4432888Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:31.4433401Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include 2025-05-07T19:51:31.4434150Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:51:31.4434802Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:51:31.4435226Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:51:31.4435643Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:51:31.4436130Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:51:31.4436633Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 
2025-05-07T19:51:31.4437110Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:51:31.4437648Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include 2025-05-07T19:51:31.4438259Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize 2025-05-07T19:51:31.4438686Z 2025-05-07T19:51:31.4438914Z Selected Source Files: 2025-05-07T19:51:31.4439328Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/attention.cpp 2025-05-07T19:51:31.4439954Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cpp 2025-05-07T19:51:31.4440543Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cpp 2025-05-07T19:51:31.4441068Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cpp 2025-05-07T19:51:31.4441655Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cpp 2025-05-07T19:51:31.4442263Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cpp 2025-05-07T19:51:31.4443090Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cpp 2025-05-07T19:51:31.4443749Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.cu 2025-05-07T19:51:31.4444370Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cu 2025-05-07T19:51:31.4444988Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu 2025-05-07T19:51:31.4445540Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cu 2025-05-07T19:51:31.4446162Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu 2025-05-07T19:51:31.4446827Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cu 2025-05-07T19:51:31.4447416Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cu 2025-05-07T19:51:31.4448156Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu 
2025-05-07T19:51:31.4448967Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.cu 2025-05-07T19:51:31.4449822Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu 2025-05-07T19:51:31.4450736Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu 2025-05-07T19:51:31.4451590Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.cu 2025-05-07T19:51:31.4452467Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:51:31.4453422Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:51:31.4454393Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T19:51:31.4455471Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T19:51:31.4456351Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T19:51:31.4457264Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T19:51:31.4458155Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T19:51:31.4459072Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T19:51:31.4459981Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T19:51:31.4460864Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T19:51:31.4461766Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T19:51:31.4462652Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T19:51:31.4463551Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T19:51:31.4464521Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T19:51:31.4465457Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T19:51:31.4466379Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T19:51:31.4467573Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:51:31.4468538Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:51:31.4469505Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T19:51:31.4470592Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T19:51:31.4471557Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T19:51:31.4472485Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T19:51:31.4473442Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T19:51:31.4474397Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T19:51:31.4475223Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.cu 2025-05-07T19:51:31.4476011Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu 2025-05-07T19:51:31.4476813Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu 2025-05-07T19:51:31.4477615Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.cu 2025-05-07T19:51:31.4478417Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu 2025-05-07T19:51:31.4479378Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu 2025-05-07T19:51:31.4480579Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu 2025-05-07T19:51:31.4481661Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu 2025-05-07T19:51:31.4482769Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu 2025-05-07T19:51:31.4484100Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu 2025-05-07T19:51:31.4485247Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu 2025-05-07T19:51:31.4486403Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu 2025-05-07T19:51:31.4487570Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu 2025-05-07T19:51:31.4488704Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu 2025-05-07T19:51:31.4490036Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu 2025-05-07T19:51:31.4491478Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu 2025-05-07T19:51:31.4492886Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu 2025-05-07T19:51:31.4494049Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu 2025-05-07T19:51:31.4495282Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu 2025-05-07T19:51:31.4496231Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu 2025-05-07T19:51:31.4497058Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu 2025-05-07T19:51:31.4497824Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu 2025-05-07T19:51:31.4498609Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu 2025-05-07T19:51:31.4499404Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu 2025-05-07T19:51:31.4500194Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.cu 2025-05-07T19:51:31.4500940Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu 2025-05-07T19:51:31.4501702Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.cu 2025-05-07T19:51:31.4502638Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.cu 2025-05-07T19:51:31.4503361Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu 2025-05-07T19:51:31.4504143Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu 2025-05-07T19:51:31.4505088Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cu 2025-05-07T19:51:31.4505843Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cuh 2025-05-07T19:51:31.4506621Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/utility.cuh 2025-05-07T19:51:31.4507128Z 2025-05-07T19:51:31.4507362Z HIPified Source Files: 2025-05-07T19:51:31.4507527Z 2025-05-07T19:51:31.4507612Z 2025-05-07T19:51:31.4507839Z Library Dependencies: 2025-05-07T19:51:31.4508101Z torch 2025-05-07T19:51:31.4508306Z torch_library 2025-05-07T19:51:31.4508774Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10.so 2025-05-07T19:51:31.4509457Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:51:31.4510151Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:51:31.4510933Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2 
2025-05-07T19:51:31.4511668Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:51:31.4512274Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:31.4512667Z 2025-05-07T19:51:31.4512862Z Output Library: 2025-05-07T19:51:31.4513097Z fbgemm_gpu_experimental_gen_ai 2025-05-07T19:51:31.4513359Z 2025-05-07T19:51:31.4513545Z Destination Directory: 2025-05-07T19:51:31.4513710Z 2025-05-07T19:51:31.4513826Z ================================================================================ 2025-05-07T19:51:31.4514057Z 2025-05-07T19:51:31.4514061Z 2025-05-07T19:51:31.4514065Z 2025-05-07T19:51:31.4514187Z ================================================================================ 2025-05-07T19:51:31.4514552Z Adding to Package: fbgemm_gpu/experimental/gen_ai 2025-05-07T19:51:31.4514984Z 2025-05-07T19:51:31.4515232Z TARGETS: 2025-05-07T19:51:31.4515447Z fbgemm_gpu_experimental_gen_ai 2025-05-07T19:51:31.4515698Z 2025-05-07T19:51:31.4515879Z FILES: 2025-05-07T19:51:31.4515984Z 2025-05-07T19:51:31.4516143Z ================================================================================ 2025-05-07T19:51:31.4516379Z 2025-05-07T19:51:31.4516383Z 2025-05-07T19:51:31.4516387Z 2025-05-07T19:51:31.4516494Z ================================================================================ 2025-05-07T19:51:31.4516911Z GPU CPP Library Target: fbgemm_gpu_experimental_example_py (SHARED) 2025-05-07T19:51:31.4517384Z 2025-05-07T19:51:31.4517559Z CPU_SRCS: 2025-05-07T19:51:31.4517662Z 2025-05-07T19:51:31.4517730Z 2025-05-07T19:51:31.4517903Z GPU_SRCS: 2025-05-07T19:51:31.4518203Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_nccl.cpp 2025-05-07T19:51:31.4518715Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_ops.cpp 2025-05-07T19:51:31.4519228Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.cu 2025-05-07T19:51:31.4519606Z 2025-05-07T19:51:31.4519773Z 
CUDA_SPECIFIC_SRCS: 2025-05-07T19:51:31.4519894Z 2025-05-07T19:51:31.4519959Z 2025-05-07T19:51:31.4520133Z HIP_SPECIFIC_SRCS: 2025-05-07T19:51:31.4520295Z 2025-05-07T19:51:31.4520360Z 2025-05-07T19:51:31.4520522Z OTHER_SRCS: 2025-05-07T19:51:31.4520630Z 2025-05-07T19:51:31.4520864Z 2025-05-07T19:51:31.4521039Z CC_FLAGS: 2025-05-07T19:51:31.4521144Z 2025-05-07T19:51:31.4521215Z 2025-05-07T19:51:31.4521392Z NVCC_FLAGS: 2025-05-07T19:51:31.4521500Z 2025-05-07T19:51:31.4521586Z 2025-05-07T19:51:31.4521747Z HIPCC_FLAGS: 2025-05-07T19:51:31.4521862Z 2025-05-07T19:51:31.4521944Z 2025-05-07T19:51:31.4522106Z INCLUDE_DIRS: 2025-05-07T19:51:31.4522328Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:31.4522691Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:51:31.4523147Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:51:31.4523434Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:31.4523915Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include 2025-05-07T19:51:31.4524688Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:51:31.4525319Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:51:31.4525730Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:51:31.4526140Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:51:31.4526604Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:51:31.4527101Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 2025-05-07T19:51:31.4527558Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:51:31.4528109Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include 2025-05-07T19:51:31.4528590Z 2025-05-07T19:51:31.4528782Z Selected Source Files: 2025-05-07T19:51:31.4529147Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_nccl.cpp 
2025-05-07T19:51:31.4529690Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_ops.cpp 2025-05-07T19:51:31.4530232Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.cu 2025-05-07T19:51:31.4530651Z 2025-05-07T19:51:31.4530829Z HIPified Source Files: 2025-05-07T19:51:31.4530983Z 2025-05-07T19:51:31.4531054Z 2025-05-07T19:51:31.4531242Z Library Dependencies: 2025-05-07T19:51:31.4531454Z torch 2025-05-07T19:51:31.4531634Z torch_library 2025-05-07T19:51:31.4532049Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10.so 2025-05-07T19:51:31.4532713Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:51:31.4533387Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:51:31.4534166Z /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:51:31.4534883Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:51:31.4535751Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:31.4536113Z 2025-05-07T19:51:31.4536332Z Output Library: 2025-05-07T19:51:31.4536552Z fbgemm_gpu_experimental_example_py 2025-05-07T19:51:31.4536788Z 2025-05-07T19:51:31.4536963Z Destination Directory: 2025-05-07T19:51:31.4537101Z 2025-05-07T19:51:31.4537200Z ================================================================================ 2025-05-07T19:51:31.4537412Z 2025-05-07T19:51:31.4537416Z 2025-05-07T19:51:31.4537420Z 2025-05-07T19:51:31.4537519Z ================================================================================ 2025-05-07T19:51:31.4537855Z Adding to Package: fbgemm_gpu/experimental/example 2025-05-07T19:51:31.4538141Z 2025-05-07T19:51:31.4538300Z TARGETS: 2025-05-07T19:51:31.4538485Z fbgemm_gpu_experimental_example_py 2025-05-07T19:51:31.4538721Z 2025-05-07T19:51:31.4538866Z FILES: 
2025-05-07T19:51:31.4539157Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/example/__init__.py 2025-05-07T19:51:31.4539640Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/example/utils.py 2025-05-07T19:51:31.4540036Z ================================================================================ 2025-05-07T19:51:31.4540244Z 2025-05-07T19:51:31.4540248Z 2025-05-07T19:51:31.4540251Z 2025-05-07T19:51:31.4540355Z ================================================================================ 2025-05-07T19:51:31.4540714Z Adding to Package: fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T19:51:31.4541037Z 2025-05-07T19:51:31.4541197Z TARGETS: 2025-05-07T19:51:31.4541302Z 2025-05-07T19:51:31.4541368Z 2025-05-07T19:51:31.4541515Z FILES: 2025-05-07T19:51:31.4541816Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py 2025-05-07T19:51:31.4542318Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py 2025-05-07T19:51:31.4542827Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py 2025-05-07T19:51:31.4543374Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py 2025-05-07T19:51:31.4543890Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py 2025-05-07T19:51:31.4544277Z ================================================================================ 2025-05-07T19:51:31.4544481Z 2025-05-07T19:51:31.4544571Z -- Configuring done (8.8s) 2025-05-07T19:51:31.4544828Z -- Generating done (0.0s) 2025-05-07T19:51:31.4545273Z -- Build files have been written to: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-build 2025-05-07T19:51:31.4652375Z Change Dir: '/__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-build' 2025-05-07T19:51:31.4653499Z 2025-05-07T19:51:31.4653795Z Run Build Command(s): /github/home/miniconda/envs/build_binary/bin/ninja -v -j 48 install 2025-05-07T19:51:31.5746871Z [1/156] /github/home/miniconda/envs/build_binary/bin/c++ 
-DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp 2025-05-07T19:51:31.5759288Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.5983517Z [2/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp 2025-05-07T19:51:31.5995038Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.6007854Z [3/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o -c 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp 2025-05-07T19:51:31.6019316Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.6064606Z [4/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp 2025-05-07T19:51:31.6076911Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.6114549Z [5/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp 2025-05-07T19:51:31.6126450Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.6225575Z [6/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion 
-Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp 2025-05-07T19:51:31.6237384Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.6252997Z [7/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp 2025-05-07T19:51:31.6264054Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' 
[-Wunused-command-line-argument] 2025-05-07T19:51:31.6327608Z [8/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp 2025-05-07T19:51:31.6339037Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.6350186Z [9/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp 2025-05-07T19:51:31.6361183Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.6518962Z [10/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o -MF 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp 2025-05-07T19:51:31.6531449Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.6623531Z [11/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp 2025-05-07T19:51:31.6634966Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.6816179Z [12/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL 
-DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp 2025-05-07T19:51:31.6827587Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.6909325Z [13/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib 
-fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp 2025-05-07T19:51:31.6921348Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.7045613Z [14/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp 2025-05-07T19:51:31.7057425Z clang-16: 
warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.7068777Z [15/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp 2025-05-07T19:51:31.7080511Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.7096033Z [16/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp 2025-05-07T19:51:31.7107335Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.7169096Z [17/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp 2025-05-07T19:51:31.7180883Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.7251733Z [18/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp 2025-05-07T19:51:31.7263511Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.7338316Z [19/156] 
/github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp 2025-05-07T19:51:31.7350025Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.7645207Z [20/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp 2025-05-07T19:51:31.7657147Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.7667970Z [21/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o -c 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp 2025-05-07T19:51:31.7679050Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.7988388Z [22/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp 2025-05-07T19:51:31.8000560Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.8012499Z [23/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp 2025-05-07T19:51:31.8024836Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.8094612Z [24/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC 
-Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp 2025-05-07T19:51:31.8107112Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.8118301Z [25/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp 2025-05-07T19:51:31.8129969Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' 
[-Wunused-command-line-argument] 2025-05-07T19:51:31.8272916Z [26/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp 2025-05-07T19:51:31.8284643Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.8497755Z [27/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp 2025-05-07T19:51:31.8510009Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.8562981Z [28/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o -MF 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp 2025-05-07T19:51:31.8574755Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.8671038Z [29/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp 2025-05-07T19:51:31.8682805Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.8722811Z [30/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO 
-DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp 2025-05-07T19:51:31.8734479Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.8774141Z [31/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp 2025-05-07T19:51:31.8785535Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.8841904Z [32/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o -c 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp 2025-05-07T19:51:31.8853588Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.8954943Z [33/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp 2025-05-07T19:51:31.8967472Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.9121949Z [34/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp 2025-05-07T19:51:31.9133892Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.9214443Z [35/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion 
-Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp 2025-05-07T19:51:31.9226360Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.9306326Z [36/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp 2025-05-07T19:51:31.9318636Z clang-16: warning: argument unused during compilation: 
'-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.9381488Z [37/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp 2025-05-07T19:51:31.9387960Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.9521056Z [38/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp 2025-05-07T19:51:31.9531956Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.9559324Z [39/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp 2025-05-07T19:51:31.9571431Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.9645543Z [40/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp 2025-05-07T19:51:31.9656072Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 
2025-05-07T19:51:31.9858171Z [41/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp 2025-05-07T19:51:31.9870736Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.0045412Z [42/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp 2025-05-07T19:51:32.0058143Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.0070340Z [43/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o.d 
-o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp 2025-05-07T19:51:32.0082983Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.0336504Z [44/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp 2025-05-07T19:51:32.0349377Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.1045109Z [45/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS 
-I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp 2025-05-07T19:51:32.1057690Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.1399246Z [46/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ 
-I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp 2025-05-07T19:51:32.1412242Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.1658375Z [47/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp 2025-05-07T19:51:32.1671793Z clang-16: warning: argument 
unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.1962954Z [48/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp 2025-05-07T19:51:32.1975808Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.2241661Z [49/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp 2025-05-07T19:51:32.2254318Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.2266825Z [50/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp 2025-05-07T19:51:32.2279896Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.3277573Z [51/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp 2025-05-07T19:51:32.3290187Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 
2025-05-07T19:51:32.3669106Z [52/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp 2025-05-07T19:51:32.3681380Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.3702510Z [53/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include 
-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp 2025-05-07T19:51:32.3715323Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.5104008Z [54/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o.d -o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp 2025-05-07T19:51:32.5116190Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.5952556Z [55/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp 2025-05-07T19:51:32.5964802Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.6369628Z [56/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS 
-I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp 2025-05-07T19:51:32.6376147Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.6892732Z [57/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -mavx512f -mavx512bw -mavx512dq -mavx512vl -fopenmp -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc 2025-05-07T19:51:32.6911750Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.6930842Z [58/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc 2025-05-07T19:51:32.6949763Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.8736311Z [59/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS 
-I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp 2025-05-07T19:51:32.8748944Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.9189547Z [60/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ 
-I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp 2025-05-07T19:51:32.9202912Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:33.0301418Z [61/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o -c /__w/FBGEMM/FBGEMM/src/QuantUtils.cc 2025-05-07T19:51:33.0320004Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:33.2538009Z [62/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o -c 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp 2025-05-07T19:51:33.2549364Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:33.8937954Z [63/156] : && /github/home/miniconda/envs/build_binary/bin/c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,asmjit.so -o asmjit.so CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib:/github/home/miniconda/envs/build_binary/lib/stubs: /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch.so /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch.so" -Wl,--as-needed && : 2025-05-07T19:51:33.9006532Z [64/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-build && bash 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-build/asmjit.so 2025-05-07T19:51:33.9008423Z ################################################################################ 2025-05-07T19:51:33.9009050Z [CMAKE] Running post-build script ... 2025-05-07T19:51:33.9009972Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-build/asmjit.so 2025-05-07T19:51:33.9010924Z Removing all RPATHs ... 2025-05-07T19:51:33.9011427Z ################################################################################ 2025-05-07T19:51:34.1278018Z [65/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA 
-DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o -c /__w/FBGEMM/FBGEMM/src/Utils.cc 2025-05-07T19:51:34.1295652Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:34.5448339Z [66/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o -c /__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc 2025-05-07T19:51:34.5463820Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:37.8417479Z [67/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o -c /__w/FBGEMM/FBGEMM/src/RefImplementations.cc 2025-05-07T19:51:37.8436538Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:38.1729730Z [68/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o -c /__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc 2025-05-07T19:51:38.1749044Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:39.9671015Z [69/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc 2025-05-07T19:51:39.9687185Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:40.5482983Z [70/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cpp 2025-05-07T19:51:40.5502662Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:40.5908087Z [71/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE 
-Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/attention.cpp 2025-05-07T19:51:40.5926909Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:40.6115983Z [72/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC 
-Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cpp 2025-05-07T19:51:40.6135876Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:40.6521914Z [73/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cpp 2025-05-07T19:51:40.6540766Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:42.2614985Z [74/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cpp 2025-05-07T19:51:42.2633427Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:42.8369539Z [75/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cpp 2025-05-07T19:51:42.8387394Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:42.9498281Z [76/156] /github/home/miniconda/envs/build_binary/bin/c++ 
-DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cpp 2025-05-07T19:51:42.9517686Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:43.2774783Z [77/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT 
CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o -c /__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc 2025-05-07T19:51:43.2792459Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:57.1374954Z [78/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ 
-I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc 2025-05-07T19:51:57.1390786Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:57:03.1818659Z [79/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o 2025-05-07T19:57:03.2176114Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:05.1251747Z [80/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o 2025-05-07T19:57:05.1263824Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:05.1876177Z [81/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o 2025-05-07T19:57:05.1900281Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:06.0841385Z [82/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o 2025-05-07T19:57:06.0863200Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:08.3902966Z [83/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o 2025-05-07T19:57:08.3917158Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:13.2171946Z [84/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o 2025-05-07T19:57:13.2183822Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:15.7667984Z [85/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc 2025-05-07T19:57:15.7677635Z clang-16: warning: argument unused during 
compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:57:19.6155979Z [86/156] : && /github/home/miniconda/envs/build_binary/bin/c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,fbgemm.so -o fbgemm.so CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,"\$ORIGIN" /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so asmjit.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so && : 2025-05-07T19:57:20.0910987Z [87/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-build && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-build/fbgemm.so 1 2025-05-07T19:57:20.0912247Z ################################################################################ 2025-05-07T19:57:20.0912657Z [CMAKE] Running post-build script ... 2025-05-07T19:57:20.0913202Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-build/fbgemm.so 2025-05-07T19:57:20.0913770Z Resetting RPATH to $ORIGIN ... 
2025-05-07T19:57:20.0914189Z 0x000000000000000f (RPATH) Library rpath: [$ORIGIN] 2025-05-07T19:57:20.0914624Z ################################################################################ 2025-05-07T19:57:23.7868834Z [88/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o 2025-05-07T19:57:23.7882196Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:23.7883910Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7884895Z static auto dtype() { 2025-05-07T19:57:23.7885201Z ^ 2025-05-07T19:57:23.7885581Z 2025-05-07T19:57:23.7885840Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:23.7886242Z 2025-05-07T19:57:23.7887143Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7888169Z static auto dtype() { 2025-05-07T19:57:23.7888437Z ^ 2025-05-07T19:57:23.7888619Z 2025-05-07T19:57:23.7889441Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7890503Z static auto dtype() { 2025-05-07T19:57:23.7890774Z ^ 2025-05-07T19:57:23.7890919Z 2025-05-07T19:57:23.7891721Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7892701Z static auto dtype() { 2025-05-07T19:57:23.7892999Z ^ 2025-05-07T19:57:23.7893140Z 2025-05-07T19:57:23.7893400Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:23.7893795Z 2025-05-07T19:57:23.7894581Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7895632Z static auto dtype() { 2025-05-07T19:57:23.7895898Z ^ 2025-05-07T19:57:23.7896065Z 2025-05-07T19:57:23.7896882Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7897932Z static auto dtype() { 2025-05-07T19:57:23.7898202Z ^ 2025-05-07T19:57:23.7898341Z 2025-05-07T19:57:23.7899132Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7900111Z static auto dtype() { 2025-05-07T19:57:23.7900416Z ^ 2025-05-07T19:57:23.7900565Z 2025-05-07T19:57:23.7900817Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:23.7901218Z 2025-05-07T19:57:23.7901992Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7903025Z static auto dtype() { 2025-05-07T19:57:23.7903290Z ^ 2025-05-07T19:57:23.7903466Z 2025-05-07T19:57:23.7904281Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7905356Z static auto dtype() { 2025-05-07T19:57:23.7905620Z ^ 2025-05-07T19:57:23.7905791Z 2025-05-07T19:57:23.7906559Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7907539Z static auto dtype() { 2025-05-07T19:57:23.7907841Z ^ 2025-05-07T19:57:23.7907979Z 2025-05-07T19:57:23.7908262Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:23.7908632Z 2025-05-07T19:57:23.7909406Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning 
#177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7910491Z static auto dtype() { 2025-05-07T19:57:23.7910766Z ^ 2025-05-07T19:57:23.7910945Z 2025-05-07T19:57:23.7911820Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7912880Z static auto dtype() { 2025-05-07T19:57:23.7913144Z ^ 2025-05-07T19:57:23.7913317Z 2025-05-07T19:57:23.7914085Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7915112Z static auto dtype() { 2025-05-07T19:57:23.7915407Z ^ 2025-05-07T19:57:23.7915551Z 2025-05-07T19:57:23.7915826Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:23.7916187Z 2025-05-07T19:57:23.7916945Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7917927Z static auto dtype() { 2025-05-07T19:57:23.7918169Z ^ 2025-05-07T19:57:23.7918321Z 2025-05-07T19:57:23.7919212Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7975322Z static auto dtype() { 2025-05-07T19:57:23.7975865Z ^ 2025-05-07T19:57:23.7976078Z 2025-05-07T19:57:23.7977468Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7978720Z static auto dtype() { 2025-05-07T19:57:23.7978987Z ^ 2025-05-07T19:57:23.7979118Z 
2025-05-07T19:57:23.7979403Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:23.7979760Z 2025-05-07T19:57:23.7980530Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7981507Z static auto dtype() { 2025-05-07T19:57:23.7981753Z ^ 2025-05-07T19:57:23.7981906Z 2025-05-07T19:57:23.7982719Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:23.7983747Z static auto dtype() { 2025-05-07T19:57:23.7983990Z ^ 2025-05-07T19:57:23.7984147Z 2025-05-07T19:57:56.1393826Z [89/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o 2025-05-07T19:57:56.1406642Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:56.1408218Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:56.1409408Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:56.1409873Z ^ 2025-05-07T19:57:56.1410087Z 2025-05-07T19:57:56.1410353Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:56.1410720Z 2025-05-07T19:57:56.1411566Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:56.1412729Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:56.1413206Z ^ 2025-05-07T19:57:56.1413384Z 2025-05-07T19:57:56.7538017Z [90/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o 2025-05-07T19:57:56.7557170Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:59.7562018Z [91/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o 2025-05-07T19:57:59.7582140Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:59.7584595Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:59.7586346Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:59.7587026Z ^ 2025-05-07T19:57:59.7587288Z 2025-05-07T19:57:59.7587685Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:59.7588214Z 2025-05-07T19:57:59.7589462Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:59.7591286Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:59.7592005Z ^ 2025-05-07T19:57:59.7592314Z 2025-05-07T19:58:18.0872352Z [92/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o 2025-05-07T19:58:18.0892436Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:18.8855028Z [93/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o 2025-05-07T19:58:18.8879696Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:18.8882370Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:18.8884996Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:18.8886114Z ^ 2025-05-07T19:58:18.8886376Z 2025-05-07T19:58:18.8886845Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:18.8887480Z 2025-05-07T19:58:18.8889022Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:18.8891642Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:18.8892791Z ^ 2025-05-07T19:58:18.8893145Z 2025-05-07T19:58:18.8894600Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:18.8897494Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:18.8898841Z ^ 2025-05-07T19:58:18.8899090Z 2025-05-07T19:58:18.8899504Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:18.8900121Z 2025-05-07T19:58:18.8901686Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:18.8904333Z 
__attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:18.8905437Z ^ 2025-05-07T19:58:18.8905791Z 2025-05-07T19:58:18.8906945Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced 2025-05-07T19:58:18.8908628Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8; 2025-05-07T19:58:18.8909231Z ^ 2025-05-07T19:58:18.8909444Z 2025-05-07T19:58:18.8910523Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced 2025-05-07T19:58:18.8912021Z constexpr int CVT_FP4_SF_VEC_SIZE = 16; 2025-05-07T19:58:18.8912597Z ^ 2025-05-07T19:58:18.8912823Z 2025-05-07T19:58:18.8914323Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:18.8917002Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:18.8918073Z ^ 2025-05-07T19:58:18.8918361Z 2025-05-07T19:58:18.8918800Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:18.8919373Z 2025-05-07T19:58:18.8920921Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:18.8923564Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:18.8924656Z ^ 2025-05-07T19:58:18.8925007Z 2025-05-07T19:58:18.8926201Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced 2025-05-07T19:58:18.8927726Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8; 2025-05-07T19:58:18.8928256Z ^ 2025-05-07T19:58:18.8928458Z 2025-05-07T19:58:18.8929417Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced 2025-05-07T19:58:18.8930906Z constexpr int CVT_FP4_SF_VEC_SIZE = 16; 2025-05-07T19:58:18.8931411Z ^ 2025-05-07T19:58:18.8931641Z 2025-05-07T19:58:18.8933093Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:18.8935542Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:18.8936659Z ^ 2025-05-07T19:58:18.8936939Z 2025-05-07T19:58:18.8937607Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:18.8938239Z 2025-05-07T19:58:18.8939996Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:18.8942542Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:18.8943675Z ^ 2025-05-07T19:58:18.8944034Z 2025-05-07T19:58:18.8945228Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced 2025-05-07T19:58:18.8946727Z constexpr int 
CVT_FP4_ELTS_PER_THREAD = 8; 2025-05-07T19:58:18.8947268Z ^ 2025-05-07T19:58:18.8947483Z 2025-05-07T19:58:18.8948549Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced 2025-05-07T19:58:18.8950077Z constexpr int CVT_FP4_SF_VEC_SIZE = 16; 2025-05-07T19:58:18.8950617Z ^ 2025-05-07T19:58:18.8950852Z 2025-05-07T19:58:18.8952368Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:18.8954916Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:18.8956005Z ^ 2025-05-07T19:58:18.8956288Z 2025-05-07T19:58:18.8956682Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:18.8957163Z 2025-05-07T19:58:18.8958782Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:18.8961373Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:18.8962466Z ^ 2025-05-07T19:58:18.8962932Z 2025-05-07T19:58:18.8963986Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced 2025-05-07T19:58:18.8965484Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8; 2025-05-07T19:58:18.8966064Z ^ 2025-05-07T19:58:18.8966291Z 2025-05-07T19:58:18.8967606Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable 
"fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced 2025-05-07T19:58:18.8969123Z constexpr int CVT_FP4_SF_VEC_SIZE = 16; 2025-05-07T19:58:18.8969667Z ^ 2025-05-07T19:58:18.8969948Z 2025-05-07T19:58:18.8971587Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:18.8974042Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:18.8975143Z ^ 2025-05-07T19:58:18.8975424Z 2025-05-07T19:58:18.8975858Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:18.8976491Z 2025-05-07T19:58:18.8978041Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:18.8981002Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:18.8982390Z ^ 2025-05-07T19:58:18.8982754Z 2025-05-07T19:58:18.8984286Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:18.8986748Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:18.8987868Z ^ 2025-05-07T19:58:18.8988126Z 2025-05-07T19:58:18.8988549Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:18.8989192Z 2025-05-07T19:58:18.8990812Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:18.8993397Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:18.8994543Z ^ 2025-05-07T19:58:18.8994946Z 2025-05-07T19:58:18.8995966Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I13__nv_bfloat16Lb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-05-07T19:58:18.8998152Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I13__nv_bfloat16Lb1EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-05-07T19:58:18.9000167Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I6__halfLb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-05-07T19:58:18.9002356Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I6__halfLb1EEEviiPKT_PKfPjS7_ is out of range. 
.minnctapersm will be ignored 2025-05-07T19:58:23.6758026Z [94/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o 2025-05-07T19:58:23.6784308Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:23.6787519Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:23.6789834Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:23.6790698Z ^ 2025-05-07T19:58:23.6791050Z 2025-05-07T19:58:23.6791516Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:23.6792184Z 2025-05-07T19:58:23.6793838Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:23.6796185Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:23.6797101Z ^ 2025-05-07T19:58:23.6797421Z 2025-05-07T19:58:30.6338880Z [95/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o 2025-05-07T19:58:30.6361210Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use 
-Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:30.6364618Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:30.6366527Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:30.6367479Z ^ 2025-05-07T19:58:30.6367731Z 2025-05-07T19:58:30.6368207Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:30.6368823Z 2025-05-07T19:58:30.6370403Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:30.6372510Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:30.6373176Z ^ 2025-05-07T19:58:30.6373405Z 2025-05-07T19:58:31.5507755Z [96/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o 2025-05-07T19:58:31.5528624Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:33.0275146Z [97/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o 2025-05-07T19:58:33.0298217Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:33.0301028Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:33.0303492Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:33.0304256Z ^ 2025-05-07T19:58:33.0304566Z 2025-05-07T19:58:33.0305177Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:33.0305802Z 2025-05-07T19:58:33.0307182Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:33.0309154Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:33.0309948Z ^ 2025-05-07T19:58:33.0310238Z 2025-05-07T19:59:01.7112178Z [98/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o 2025-05-07T19:59:01.7134899Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress 
warning). 2025-05-07T19:59:01.7137619Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:01.7139592Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:01.7140858Z ^ 2025-05-07T19:59:01.7141149Z 2025-05-07T19:59:01.7141608Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:01.7142214Z 2025-05-07T19:59:01.7143900Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:01.7145937Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:01.7146626Z ^ 2025-05-07T19:59:01.7146886Z 2025-05-07T19:59:02.0832760Z [99/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o 2025-05-07T19:59:02.0855330Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:59:02.0858089Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:02.0860090Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:02.0860864Z ^ 2025-05-07T19:59:02.0861553Z 2025-05-07T19:59:02.0862082Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:02.0862704Z 2025-05-07T19:59:02.0864322Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:02.0866419Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:02.0867438Z ^ 2025-05-07T19:59:02.0867728Z 2025-05-07T19:59:02.1693319Z [100/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o 2025-05-07T19:59:02.1715682Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets 
to suppress warning). 2025-05-07T19:59:02.1718458Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:02.1720496Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:02.1721249Z ^ 2025-05-07T19:59:02.1721598Z 2025-05-07T19:59:02.1722028Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:02.1723200Z 2025-05-07T19:59:02.1724835Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:02.1726816Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:02.1727608Z ^ 2025-05-07T19:59:02.1727892Z 2025-05-07T19:59:02.1830575Z [101/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o 2025-05-07T19:59:02.1854830Z nvcc warning : Support for offline compilation for architectures prior 
to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:02.1857603Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:02.1859429Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:02.1860123Z ^ 2025-05-07T19:59:02.1860407Z 2025-05-07T19:59:02.1860848Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:02.1861832Z 2025-05-07T19:59:02.1863282Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:02.1865124Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:02.1865824Z ^ 2025-05-07T19:59:02.1866161Z 2025-05-07T19:59:03.0624476Z [102/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o 
2025-05-07T19:59:03.0647979Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:03.0650651Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.0652618Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.0653374Z ^ 2025-05-07T19:59:03.0654116Z 2025-05-07T19:59:03.0654519Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:03.0655167Z 2025-05-07T19:59:03.0656766Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.0658712Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:03.0659437Z ^ 2025-05-07T19:59:03.0659721Z 2025-05-07T19:59:03.0675467Z [103/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_example_py_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o -MF experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o.d -o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_nccl.cpp 2025-05-07T19:59:03.0692935Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:59:06.3333094Z [104/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o 2025-05-07T19:59:06.3355208Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:06.3357890Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:06.3359881Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:06.3360647Z ^ 2025-05-07T19:59:06.3360980Z 2025-05-07T19:59:06.3361402Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:06.3362021Z 2025-05-07T19:59:06.3363601Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:06.3365655Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:06.3366456Z ^ 2025-05-07T19:59:06.3366755Z 2025-05-07T19:59:11.6686284Z [105/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu -o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o 2025-05-07T19:59:11.6710452Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:11.6713430Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:11.6715635Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:11.6716496Z ^ 2025-05-07T19:59:11.6716807Z 2025-05-07T19:59:11.6717260Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:11.6718002Z 2025-05-07T19:59:11.6719613Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:11.6721822Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:11.6722798Z ^ 2025-05-07T19:59:11.6723125Z 2025-05-07T19:59:12.0295990Z [106/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_example_py_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o -MF experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o.d -o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_ops.cpp 2025-05-07T19:59:12.0315239Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:59:12.6406473Z [107/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o -MF 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o 2025-05-07T19:59:12.6419320Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:12.6420941Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:12.6422106Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:12.6422920Z ^ 2025-05-07T19:59:12.6423115Z 2025-05-07T19:59:12.6423471Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:12.6423831Z 2025-05-07T19:59:12.6424694Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:12.6425861Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:12.6426353Z ^ 2025-05-07T19:59:12.6426533Z 2025-05-07T19:59:12.6433089Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_10multipliesES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1P_INS1Q_IS1R_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEES11_NS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S25_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES29_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:12.6446590Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1G_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1N_NS1G_6fusion15Sm90TreeVisitorINS1P_11Sm90ComputeINS_10multipliesES1O_fLNS_15FloatRoundStyleE2EvEEJNS1P_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1Q_INS1R_IS1S_ffLS1T_2EvEEJNS1P_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1P_12Sm90AccFetchEEEEEEES12_NS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S26_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2A_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:12.6460110Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1P_INS1Q_INS_10multipliesEffLS1S_2EvEEJS1W_NS1P_IS1Y_JNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEES11_NS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S27_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2B_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:12.6473794Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1G_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1N_NS1G_6fusion15Sm90TreeVisitorINS1P_11Sm90ComputeINS_4plusES1O_fLNS_15FloatRoundStyleE2EvEEJNS1P_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1Q_INS1R_INS_10multipliesEffLS1T_2EvEEJS1X_NS1Q_IS1Z_JNS1P_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1P_12Sm90AccFetchEEEEEEEEEES12_NS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S28_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2C_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:12.6487791Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_S1N_LNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_S1N_S1N_NS5_IJSC_SV_SV_EEELi8ELb1EEENS1P_INS1Q_INS_10multipliesES1N_fLS1S_2EvEEJNS1U_ILi0ESI_ffS1V_Li4ELb1EEENS1P_INS1Q_IS1X_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEES11_NS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S29_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2D_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:12.6502424Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1G_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1N_NS1G_6fusion15Sm90TreeVisitorINS1P_11Sm90ComputeINS_4plusES1O_S1O_LNS_15FloatRoundStyleE2EvEEJNS1P_16Sm90ColBroadcastILi0ESI_S1O_S1O_NS5_IJSC_SW_SW_EEELi8ELb1EEENS1Q_INS1R_INS_10multipliesES1O_fLS1T_2EvEEJNS1V_ILi0ESI_ffS1W_Li4ELb1EEENS1Q_INS1R_IS1Y_ffLS1T_2EvEEJNS1P_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1P_12Sm90AccFetchEEEEEEEEEES12_NS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S2A_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2E_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:13.9717738Z [108/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o 2025-05-07T19:59:13.9731031Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:13.9732624Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:13.9733808Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:13.9734296Z ^ 2025-05-07T19:59:13.9734474Z 2025-05-07T19:59:13.9734728Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:13.9735120Z 2025-05-07T19:59:13.9735941Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:13.9737137Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:13.9737593Z ^ 2025-05-07T19:59:13.9737769Z 2025-05-07T19:59:13.9744243Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES11_S1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1E_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1L_NS1E_6fusion15Sm90TreeVisitorINS1N_11Sm90ComputeINS_10multipliesES1M_fLNS_15FloatRoundStyleE2EvEEJNS1N_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1O_INS1P_IS1Q_ffLS1R_2EvEEJNS1N_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1N_12Sm90AccFetchEEEEEEENS4_13SM90_TMA_LOADENS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S25_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES29_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:13.9757520Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES12_S1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_10multipliesES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1P_INS1Q_IS1R_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEENS4_13SM90_TMA_LOADENS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S26_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2A_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:13.9771267Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES11_S1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1E_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1L_NS1E_6fusion15Sm90TreeVisitorINS1N_11Sm90ComputeINS_4plusES1M_fLNS_15FloatRoundStyleE2EvEEJNS1N_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1O_INS1P_INS_10multipliesEffLS1R_2EvEEJS1V_NS1O_IS1X_JNS1N_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1N_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S27_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2B_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:13.9784985Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES12_S1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1P_INS1Q_INS_10multipliesEffLS1S_2EvEEJS1W_NS1P_IS1Y_JNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S28_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2C_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:13.9798786Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES11_S1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1E_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1L_NS1E_6fusion15Sm90TreeVisitorINS1N_11Sm90ComputeINS_4plusES1M_S1M_LNS_15FloatRoundStyleE2EvEEJNS1N_16Sm90ColBroadcastILi0ESI_S1M_S1M_NS5_IJSC_SV_SV_EEELi8ELb1EEENS1O_INS1P_INS_10multipliesES1M_fLS1R_2EvEEJNS1T_ILi0ESI_ffS1U_Li4ELb1EEENS1O_INS1P_IS1W_ffLS1R_2EvEEJNS1N_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1N_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S29_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2D_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:13.9813374Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES12_S1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_S1N_LNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_S1N_S1N_NS5_IJSC_SW_SW_EEELi8ELb1EEENS1P_INS1Q_INS_10multipliesES1N_fLS1S_2EvEEJNS1U_ILi0ESI_ffS1V_Li4ELb1EEENS1P_INS1Q_IS1X_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S2A_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2E_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:20.5473800Z [109/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cu -o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o 2025-05-07T19:59:20.5494903Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:43.4598696Z [110/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a 
-gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o 2025-05-07T19:59:43.4620589Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:06.2874852Z [111/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o 2025-05-07T20:00:06.3234699Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:06.3238178Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:06.3240496Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:06.3456291Z ^ 2025-05-07T20:00:06.3457071Z 2025-05-07T20:00:06.3457567Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:06.3458201Z 2025-05-07T20:00:06.3459628Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:06.3461723Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:06.3462353Z ^ 2025-05-07T20:00:06.3462704Z 2025-05-07T20:00:07.1633935Z [112/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o 2025-05-07T20:00:07.1655777Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:07.1658583Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:07.1661200Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:07.1662021Z ^ 2025-05-07T20:00:07.1662344Z 2025-05-07T20:00:07.1663038Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:07.1663660Z 2025-05-07T20:00:07.1665114Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:07.1667364Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:07.1668140Z ^ 2025-05-07T20:00:07.1668436Z 2025-05-07T20:00:10.8004865Z [113/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o 2025-05-07T20:00:10.8017105Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:10.9203637Z [114/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o 2025-05-07T20:00:10.9216632Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:10.9218243Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:10.9219396Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:10.9219886Z ^ 2025-05-07T20:00:10.9220066Z 2025-05-07T20:00:10.9220352Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:10.9220721Z 2025-05-07T20:00:10.9221552Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:10.9222734Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:10.9223184Z ^ 2025-05-07T20:00:10.9223389Z 2025-05-07T20:00:14.1529584Z [115/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o 2025-05-07T20:00:14.1542194Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:14.1543843Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:14.1545034Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:14.1545498Z ^ 2025-05-07T20:00:14.1545711Z 2025-05-07T20:00:14.1545965Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:14.1546329Z 2025-05-07T20:00:14.1547178Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:14.1548356Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:14.1548853Z ^ 2025-05-07T20:00:14.1549031Z 2025-05-07T20:00:26.0989067Z [116/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o 2025-05-07T20:00:26.1002000Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets 
to suppress warning). 2025-05-07T20:00:26.1003713Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:26.1004894Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:26.1005356Z ^ 2025-05-07T20:00:26.1005568Z 2025-05-07T20:00:26.1005821Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:26.1006195Z 2025-05-07T20:00:26.1007074Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:26.1008253Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:26.1008740Z ^ 2025-05-07T20:00:26.1008921Z 2025-05-07T20:00:30.1657348Z [117/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o 2025-05-07T20:00:30.1670061Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:30.8360348Z [118/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_example_py_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda 
-O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o -MF experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.cu -o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o 2025-05-07T20:00:30.8372554Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:31.5138886Z [119/156] : && /github/home/miniconda/envs/build_binary/bin/c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,fbgemm_gpu_experimental_example_py.so -o experimental/example/fbgemm_gpu_experimental_example_py.so experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 
-Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib:/github/home/miniconda/envs/build_binary/lib/stubs: /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch.so /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch.so" -Wl,--as-needed -lcudadevrt -lcudart_static -ldl && : 2025-05-07T20:00:31.5339197Z [120/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-build/experimental/example && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:00:31.5340779Z ################################################################################ 2025-05-07T20:00:31.5341160Z [CMAKE] Running post-build 
script ... 2025-05-07T20:00:31.5341942Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:00:31.5343054Z Removing all RPATHs ... 2025-05-07T20:00:31.5343360Z ################################################################################ 2025-05-07T20:00:37.2518118Z [121/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode 
arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o 2025-05-07T20:00:37.2530485Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:37.2532157Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:37.2533464Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:37.2533938Z ^ 2025-05-07T20:00:37.2534149Z 2025-05-07T20:00:37.2534408Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:37.2534771Z 2025-05-07T20:00:37.2535622Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:37.2536800Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:37.2537300Z ^ 2025-05-07T20:00:37.2537481Z 2025-05-07T20:00:42.8331032Z [122/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o 2025-05-07T20:00:42.8343615Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:42.8345361Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:42.8346711Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:42.8347253Z ^ 2025-05-07T20:00:42.8347487Z 2025-05-07T20:00:42.8347742Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:42.8348110Z 2025-05-07T20:00:42.8349069Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:42.8350437Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:42.8350938Z ^ 2025-05-07T20:00:42.8351215Z 2025-05-07T20:00:42.8352154Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:42.8353744Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:42.8354244Z ^ 2025-05-07T20:00:42.8354475Z 2025-05-07T20:00:42.8354763Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:42.8355125Z 2025-05-07T20:00:42.8356062Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:42.8426379Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 
2025-05-07T20:00:42.8489511Z ^ 2025-05-07T20:00:42.8490177Z 2025-05-07T20:00:42.8491242Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:42.8521899Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:42.8588808Z ^ 2025-05-07T20:00:42.8589444Z 2025-05-07T20:00:42.8589751Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:42.8590159Z 2025-05-07T20:00:42.8591117Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:42.8592550Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:42.8593089Z ^ 2025-05-07T20:00:42.8593345Z 2025-05-07T20:00:42.8594318Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:42.8646697Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:42.8647488Z ^ 2025-05-07T20:00:42.8647729Z 2025-05-07T20:00:42.8648020Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:42.8648395Z 2025-05-07T20:00:42.8649346Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:42.8650765Z __inline__ 
__attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:42.8651303Z ^ 2025-05-07T20:00:42.8651552Z 2025-05-07T20:00:42.8652485Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:42.8823721Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:42.8824724Z ^ 2025-05-07T20:00:42.8825007Z 2025-05-07T20:00:42.8825773Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:42.8826151Z 2025-05-07T20:00:42.8827281Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:42.8828608Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:42.8902528Z ^ 2025-05-07T20:00:42.8902966Z 2025-05-07T20:00:42.8903984Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:42.8905336Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:42.8965965Z ^ 2025-05-07T20:00:42.8966633Z 2025-05-07T20:00:42.8989729Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:42.8990242Z 2025-05-07T20:00:42.8991212Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first 
declaration 2025-05-07T20:00:42.8992590Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:42.8993101Z ^ 2025-05-07T20:00:42.8993384Z 2025-05-07T20:00:42.8994336Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:42.9052779Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:42.9083968Z ^ 2025-05-07T20:00:42.9084535Z 2025-05-07T20:00:42.9084898Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:42.9085309Z 2025-05-07T20:00:42.9146491Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:42.9173180Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:42.9173787Z ^ 2025-05-07T20:00:42.9174083Z 2025-05-07T20:01:08.8715804Z [123/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o 
2025-05-07T20:01:08.8738464Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:08.8741208Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:08.8743253Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:08.8744062Z ^ 2025-05-07T20:01:08.8744445Z 2025-05-07T20:01:08.8744898Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:08.8745587Z 2025-05-07T20:01:08.8747120Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:08.8749300Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:08.8927751Z ^ 2025-05-07T20:01:08.8928177Z 2025-05-07T20:01:08.8929773Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:01:08.8932310Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:01:08.8933177Z ^ 2025-05-07T20:01:08.8933542Z 2025-05-07T20:01:08.8935120Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:01:08.8937583Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:01:08.8938430Z ^ 
2025-05-07T20:01:08.8938840Z 2025-05-07T20:01:08.8940546Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:01:08.8943297Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:01:08.8944334Z ^ 2025-05-07T20:01:08.8944769Z 2025-05-07T20:01:08.8945175Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:08.8945771Z 2025-05-07T20:01:08.8947531Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:01:08.8949919Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:01:08.8950858Z ^ 2025-05-07T20:01:08.8951327Z 2025-05-07T20:01:08.8953044Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:01:08.8955295Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:01:08.8956179Z ^ 2025-05-07T20:01:08.8956531Z 2025-05-07T20:01:08.8956985Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:08.8957649Z 2025-05-07T20:01:08.8959413Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:01:08.8961658Z __inline__ __attribute__((always_inline)) __attribute__((device)) 
__attribute__((host)) 2025-05-07T20:01:08.8962637Z ^ 2025-05-07T20:01:08.8963103Z 2025-05-07T20:01:08.8964774Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:01:08.8967371Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:01:08.8968214Z ^ 2025-05-07T20:01:08.8968595Z 2025-05-07T20:01:08.8969011Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:08.8969632Z 2025-05-07T20:01:08.8971323Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:01:08.8973669Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:01:08.8974643Z ^ 2025-05-07T20:01:08.8975039Z 2025-05-07T20:01:08.8976853Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:01:08.8979212Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:01:08.8980047Z ^ 2025-05-07T20:01:08.8980417Z 2025-05-07T20:01:08.8980844Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:08.8981505Z 2025-05-07T20:01:08.8983653Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:01:08.8986096Z __inline__ 
__attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:01:08.8986916Z ^ 2025-05-07T20:01:08.8987372Z 2025-05-07T20:01:08.8989071Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:01:08.8991433Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:01:08.8992293Z ^ 2025-05-07T20:01:08.8992733Z 2025-05-07T20:01:08.8993147Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:08.8993733Z 2025-05-07T20:01:08.8995488Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:01:08.8997905Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:01:08.8998842Z ^ 2025-05-07T20:01:08.8999276Z 2025-05-07T20:01:08.9000909Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:01:08.9003411Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:01:08.9004319Z ^ 2025-05-07T20:01:08.9004735Z 2025-05-07T20:01:08.9005158Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:08.9005825Z 2025-05-07T20:01:08.9007601Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first 
declaration 2025-05-07T20:01:08.9010014Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:01:08.9010909Z ^ 2025-05-07T20:01:08.9011379Z 2025-05-07T20:01:29.6710224Z [124/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o 2025-05-07T20:01:29.6733099Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:29.6735804Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:29.6737906Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:29.6738766Z ^ 2025-05-07T20:01:29.6739063Z 2025-05-07T20:01:29.6739487Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:29.6740164Z 2025-05-07T20:01:29.6741629Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:29.6743739Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:29.6744507Z ^ 2025-05-07T20:01:29.6744832Z 2025-05-07T20:01:35.3586485Z [125/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o 2025-05-07T20:01:35.3609078Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:35.3611971Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:35.3614054Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:35.3614877Z ^ 2025-05-07T20:01:35.3615180Z 2025-05-07T20:01:35.3615687Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:35.3616327Z 2025-05-07T20:01:35.3617738Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:35.3619794Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:35.3620540Z ^ 2025-05-07T20:01:35.3620830Z 2025-05-07T20:01:35.3622160Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:35.3624151Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:35.3624911Z ^ 2025-05-07T20:01:35.3625380Z detected during: 2025-05-07T20:01:35.3650951Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:35.3700172Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:35.3750442Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:35.3779116Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:35.3781226Z 2025-05-07T20:01:35.3781676Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:35.3782374Z 2025-05-07T20:01:35.3783816Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:35.3785819Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:35.3786514Z ^ 2025-05-07T20:01:35.3786938Z detected during: 2025-05-07T20:01:35.3811865Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:35.3862247Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:35.3916943Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:35.3966971Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:35.3995651Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:35.3997779Z 2025-05-07T20:01:35.3999156Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:35.4001189Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:35.4001938Z ^ 2025-05-07T20:01:35.4002413Z detected during: 2025-05-07T20:01:35.4029067Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:35.4074773Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:35.4119833Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:35.4149034Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:35.4151198Z 2025-05-07T20:01:35.4151625Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:35.4152248Z 2025-05-07T20:01:35.4153581Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:35.4155495Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:35.4156229Z ^ 2025-05-07T20:01:35.4156611Z detected during: 2025-05-07T20:01:35.4181415Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, 
TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:35.4230757Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:35.4281392Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:35.4332738Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:35.4361962Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:35.4364442Z 2025-05-07T20:01:35.4366092Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:35.4368306Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:35.4369133Z ^ 2025-05-07T20:01:35.4369603Z detected during: 2025-05-07T20:01:35.4396037Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:35.4446636Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:35.4497189Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:35.4526122Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:35.4528076Z 2025-05-07T20:01:35.4528524Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:35.4529125Z 2025-05-07T20:01:35.4530499Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:35.4532448Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:35.4533124Z ^ 2025-05-07T20:01:35.4533517Z detected during: 2025-05-07T20:01:35.4558335Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:35.4609666Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:35.4658919Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:35.4708674Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:35.4737750Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, 
TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:35.4739810Z 2025-05-07T20:01:35.4742007Z ptxas /tmp/tmpxft_00008c9c_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 925; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:01:35.4746583Z ptxas /tmp/tmpxft_00008c9c_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 937; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:01:35.4751033Z ptxas /tmp/tmpxft_00008c9c_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 1076; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:01:35.4755509Z ptxas /tmp/tmpxft_00008c9c_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 1088; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:01:35.4759166Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:35.4761154Z 
Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:35.4761929Z ^ 2025-05-07T20:01:35.4762365Z detected during: 2025-05-07T20:01:35.4789124Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:35.4837830Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:35.4888263Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:35.4916930Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:35.4919036Z 2025-05-07T20:01:35.4919456Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:35.4920117Z 2025-05-07T20:01:35.4921510Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:35.4923618Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:35.4924362Z ^ 2025-05-07T20:01:35.4924756Z detected during: 2025-05-07T20:01:35.4949813Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:35.4999241Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:01:35.5049893Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 
of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:35.5087171Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:35.5103520Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:35.5104711Z 2025-05-07T20:01:35.5105507Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:35.5106684Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:35.5107140Z ^ 2025-05-07T20:01:35.5107441Z detected during: 2025-05-07T20:01:35.5122204Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:35.5150394Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:35.5179059Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:35.5195101Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, 
USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:35.5196253Z 2025-05-07T20:01:35.5196500Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:35.5196859Z 2025-05-07T20:01:35.5197678Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:35.5198783Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:35.5199215Z ^ 2025-05-07T20:01:35.5199443Z detected during: 2025-05-07T20:01:35.5213309Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:35.5241522Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:35.5269740Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:35.5298227Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, 
cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:35.5314371Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:35.5315508Z 2025-05-07T20:01:35.5316322Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning 
#2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:35.5317653Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:35.5318129Z ^ 2025-05-07T20:01:35.5318395Z detected during: 2025-05-07T20:01:35.5333146Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:35.5361041Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:35.5389658Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, 
cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:35.5405870Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:35.5407012Z 2025-05-07T20:01:35.5407262Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:35.5407645Z 2025-05-07T20:01:35.5408438Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:35.5409553Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:35.5409940Z ^ 2025-05-07T20:01:35.5410163Z detected during: 2025-05-07T20:01:35.5423886Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, 
SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:35.5452155Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:35.5480082Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:35.5508579Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:35.5524698Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:35.5525847Z 2025-05-07T20:01:38.9959123Z [126/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o 2025-05-07T20:01:38.9981772Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:38.9984458Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.9986529Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:38.9987310Z ^ 2025-05-07T20:01:38.9987982Z 2025-05-07T20:01:38.9988396Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:38.9989033Z 2025-05-07T20:01:38.9990656Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.9992685Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:38.9993443Z ^ 2025-05-07T20:01:38.9993726Z 2025-05-07T20:01:38.9995152Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.9997095Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:38.9997859Z ^ 2025-05-07T20:01:38.9998307Z detected during: 2025-05-07T20:01:39.0024733Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.0074845Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.0124739Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.0146816Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:39.0148423Z 2025-05-07T20:01:39.0148759Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.0149238Z 2025-05-07T20:01:39.0150351Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.0151892Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.0152450Z ^ 2025-05-07T20:01:39.0152750Z detected during: 2025-05-07T20:01:39.0172838Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.0213543Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.0255597Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.0299020Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.0323014Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 
2025-05-07T20:01:39.0324794Z 2025-05-07T20:01:39.0326017Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.0327744Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.0328381Z ^ 2025-05-07T20:01:39.0328778Z detected during: 2025-05-07T20:01:39.0350872Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.0393192Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.0435579Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.0459452Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:39.0461184Z 2025-05-07T20:01:39.0461544Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.0462058Z 2025-05-07T20:01:39.0463230Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.0464893Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.0465495Z ^ 2025-05-07T20:01:39.0465810Z detected during: 2025-05-07T20:01:39.0487065Z instantiation of 
"auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, 
ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.0529351Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.0571709Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.0614302Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.0638408Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:39.0640115Z 2025-05-07T20:01:39.0641335Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.0643086Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.0643783Z ^ 2025-05-07T20:01:39.0644145Z detected during: 2025-05-07T20:01:39.0666098Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, 
cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.0708158Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.0750645Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.0775288Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:39.0777109Z 2025-05-07T20:01:39.0777519Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.0778335Z 2025-05-07T20:01:39.0779581Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.0781459Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.0782070Z ^ 2025-05-07T20:01:39.0782414Z detected during: 2025-05-07T20:01:39.0804494Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.0849070Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.0893293Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.0937477Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.0962803Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:39.0964638Z 2025-05-07T20:01:39.0965899Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.0967870Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.0968582Z ^ 2025-05-07T20:01:39.0968965Z detected during: 2025-05-07T20:01:39.0991669Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.1035804Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.1079575Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.1104651Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:39.1106463Z 2025-05-07T20:01:39.1106839Z Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-05-07T20:01:39.1107410Z 2025-05-07T20:01:39.1108640Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is
deprecated 2025-05-07T20:01:39.1110334Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.1110968Z ^ 2025-05-07T20:01:39.1111341Z detected during: 2025-05-07T20:01:39.1132945Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.1177529Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.1222030Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.1268895Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.1293628Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:39.1295434Z 2025-05-07T20:01:39.1296681Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.1298477Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.1299164Z ^ 2025-05-07T20:01:39.1299583Z detected during: 2025-05-07T20:01:39.1322152Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:01:39.1365382Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 
340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.1409398Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.1434293Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:39.1436157Z 2025-05-07T20:01:39.1436542Z Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-05-07T20:01:39.1437129Z 2025-05-07T20:01:39.1438420Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.1440136Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.1440772Z ^ 2025-05-07T20:01:39.1441114Z detected during: 2025-05-07T20:01:39.1463238Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6,
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.1508005Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.1551631Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.1586330Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.1602626Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:39.1603783Z 2025-05-07T20:01:39.1604616Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.1605765Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.1606261Z ^ 2025-05-07T20:01:39.1606545Z detected during: 2025-05-07T20:01:39.1621200Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.1649046Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.1678354Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.1694573Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:39.1695722Z 2025-05-07T20:01:39.1696010Z Remark: The warnings can be 
suppressed with "-diag-suppress " 2025-05-07T20:01:39.1696381Z 2025-05-07T20:01:39.1697192Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.1698344Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.1698770Z ^ 2025-05-07T20:01:39.1699056Z detected during: 2025-05-07T20:01:39.1712860Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.1741287Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.1769258Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.1797618Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.1813739Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:39.1814889Z 2025-05-07T20:01:41.1030616Z [127/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o 
-MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o 2025-05-07T20:01:41.1043635Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:41.1045223Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.1046362Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.1046835Z ^ 2025-05-07T20:01:41.1047010Z 2025-05-07T20:01:41.1047284Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:41.1047646Z 2025-05-07T20:01:41.1048457Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.1049630Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:41.1050079Z ^ 2025-05-07T20:01:41.1050272Z 2025-05-07T20:01:41.1051068Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.1052228Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.1052767Z ^ 2025-05-07T20:01:41.1053059Z detected during: 2025-05-07T20:01:41.1068084Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.1095895Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.1124079Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.1140031Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:41.1141183Z 2025-05-07T20:01:41.1141431Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:41.1141819Z 2025-05-07T20:01:41.1142626Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.1143755Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.1144162Z ^ 2025-05-07T20:01:41.1144416Z detected during: 2025-05-07T20:01:41.1158174Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, 
SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:41.1186625Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.1214323Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.1242351Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.1258478Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:41.1259731Z 2025-05-07T20:01:41.1260583Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.1261743Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.1262189Z ^ 2025-05-07T20:01:41.1262480Z detected during: 2025-05-07T20:01:41.1277243Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.1305201Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.1333159Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.1349168Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, 
USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:41.1350332Z 2025-05-07T20:01:41.1350584Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:41.1350944Z 2025-05-07T20:01:41.1351769Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.1352873Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.1353302Z ^ 2025-05-07T20:01:41.1353535Z detected during: 2025-05-07T20:01:41.1367716Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:41.1395991Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.1423670Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.1452098Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.1468100Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:41.1469230Z 2025-05-07T20:01:41.1470048Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.1471168Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.1471613Z ^ 2025-05-07T20:01:41.1471862Z detected during: 2025-05-07T20:01:41.1486511Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.1514116Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.1542171Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.1558111Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:41.1559222Z 2025-05-07T20:01:41.1559479Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:41.1559833Z 2025-05-07T20:01:41.1560623Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.1561735Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.1562122Z ^ 2025-05-07T20:01:41.1562354Z detected during: 2025-05-07T20:01:41.1576439Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:41.1604666Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.1632187Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.1660382Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.1676440Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:41.1677575Z 2025-05-07T20:01:41.1678371Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.1679501Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.1679947Z ^ 2025-05-07T20:01:41.1680197Z detected during: 2025-05-07T20:01:41.1694848Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.1722456Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.1750572Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, 
void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.1766675Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:41.1767955Z 2025-05-07T20:01:41.1768207Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:41.1768591Z 2025-05-07T20:01:41.1769392Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.1770510Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.1770919Z ^ 2025-05-07T20:01:41.1771170Z detected during: 2025-05-07T20:01:41.1785206Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:41.1813601Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.1841409Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.1869767Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.1885812Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:41.1886964Z 2025-05-07T20:01:41.1887763Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.1888922Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.1889368Z ^ 2025-05-07T20:01:41.1889658Z detected during: 2025-05-07T20:01:41.1904227Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.1931923Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, 
void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.1960077Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.1976248Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:41.1977402Z 2025-05-07T20:01:41.1977651Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:41.1978012Z 2025-05-07T20:01:41.1978829Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.1980137Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 
2025-05-07T20:01:41.1980576Z ^ 2025-05-07T20:01:41.1980804Z detected during: 2025-05-07T20:01:41.1994635Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:41.2023055Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.2050686Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.2078949Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.2095039Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:41.2096175Z 2025-05-07T20:01:41.2096994Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.2098140Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.2098635Z ^ 2025-05-07T20:01:41.2098930Z detected during: 2025-05-07T20:01:41.2113426Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.2141226Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.2169675Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.2185639Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:41.2186925Z 2025-05-07T20:01:41.2187174Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:41.2187533Z 2025-05-07T20:01:41.2188365Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.2189472Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.2189912Z ^ 2025-05-07T20:01:41.2190149Z detected during: 2025-05-07T20:01:41.2203931Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, 
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:41.2232285Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.2260073Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.2288529Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.2304657Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:41.2305793Z 2025-05-07T20:01:50.9792669Z [128/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o 2025-05-07T20:01:50.9805592Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:50.9807184Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.9808331Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.9808806Z ^ 2025-05-07T20:01:50.9808985Z 2025-05-07T20:01:50.9809235Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:50.9809615Z 2025-05-07T20:01:50.9810427Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.9811606Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:50.9812045Z ^ 2025-05-07T20:01:50.9812238Z 2025-05-07T20:01:50.9813027Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.9814180Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.9814619Z ^ 2025-05-07T20:01:50.9814902Z detected during: 2025-05-07T20:01:50.9829559Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.9857410Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.9885940Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.9902065Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:50.9903190Z 2025-05-07T20:01:50.9903461Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:50.9903823Z 2025-05-07T20:01:50.9904621Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.9918562Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.9919189Z ^ 2025-05-07T20:01:50.9919446Z detected during: 2025-05-07T20:01:50.9933613Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:50.9961777Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.9989675Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.0017957Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.0033950Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:51.0035108Z 2025-05-07T20:01:51.0035909Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.0037085Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.0037537Z ^ 2025-05-07T20:01:51.0037828Z detected during: 2025-05-07T20:01:51.0052559Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.0080671Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.0109068Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.0126279Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:51.0127451Z 2025-05-07T20:01:51.0127700Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:51.0128068Z 2025-05-07T20:01:51.0128870Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.0130007Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.0150287Z ^ 2025-05-07T20:01:51.0150552Z detected during: 2025-05-07T20:01:51.0164789Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:01:51.0193385Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.0221208Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.0249395Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.0265365Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:51.0266535Z 2025-05-07T20:01:51.0267520Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.0268694Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.0269144Z ^ 2025-05-07T20:01:51.0269442Z detected during: 2025-05-07T20:01:51.0284289Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.0312092Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.0340054Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.0356046Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:51.0357233Z 2025-05-07T20:01:51.0357480Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:51.0357861Z 2025-05-07T20:01:51.0358666Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.0359772Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.0360207Z ^ 2025-05-07T20:01:51.0360473Z detected during: 2025-05-07T20:01:51.0374654Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:51.0402929Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.0430689Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.0458753Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.0474966Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:51.0476115Z 2025-05-07T20:01:51.0476912Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.0479528Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.0480004Z ^ 2025-05-07T20:01:51.0480276Z detected during: 2025-05-07T20:01:51.0494966Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.0522727Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.0550827Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.0567205Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:51.0568340Z 2025-05-07T20:01:51.0568592Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:51.0568986Z 2025-05-07T20:01:51.0569795Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.0570936Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.0571355Z ^ 2025-05-07T20:01:51.0571624Z detected during: 2025-05-07T20:01:51.0585609Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:01:51.0613924Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.0641578Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.0670113Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.0686272Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:51.0687433Z 2025-05-07T20:01:51.0688231Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.0689390Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.0689840Z ^ 2025-05-07T20:01:51.0690137Z detected during: 2025-05-07T20:01:51.0704603Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.0732380Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.0760557Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.0776694Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:51.0777852Z 2025-05-07T20:01:51.0778104Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:51.0778467Z 2025-05-07T20:01:51.0779292Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.0780400Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.0780832Z ^ 2025-05-07T20:01:51.0781062Z detected during: 2025-05-07T20:01:51.0795023Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:51.0824606Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.0852306Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.0880643Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.0896761Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:51.0897889Z 2025-05-07T20:01:51.0898712Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.0899855Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.0900329Z ^ 2025-05-07T20:01:51.0900619Z detected during: 2025-05-07T20:01:51.0915050Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.0942760Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.0971466Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.0987533Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:51.0988663Z 2025-05-07T20:01:51.0988939Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:51.0989297Z 2025-05-07T20:01:51.0990096Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.0991222Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.0991650Z ^ 2025-05-07T20:01:51.0991881Z detected during: 2025-05-07T20:01:51.1005924Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:01:51.1034274Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.1062004Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.1090384Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.1106409Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:51.1107544Z 2025-05-07T20:01:53.4277650Z [129/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o 2025-05-07T20:01:53.4290659Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:53.4292214Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.4293356Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.4293789Z ^ 2025-05-07T20:01:53.4293971Z 2025-05-07T20:01:53.4294213Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:53.4294564Z 2025-05-07T20:01:53.4295384Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.4296525Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:53.4296961Z ^ 2025-05-07T20:01:53.4297124Z 2025-05-07T20:01:53.4297922Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.4299075Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.4299517Z ^ 2025-05-07T20:01:53.4299767Z detected during: 2025-05-07T20:01:53.4314498Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.4342412Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.4370981Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.4387065Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:53.4388191Z 2025-05-07T20:01:53.4388427Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:53.4388790Z 2025-05-07T20:01:53.4389576Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.4390715Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.4391103Z ^ 2025-05-07T20:01:53.4391382Z detected during: 2025-05-07T20:01:53.4405149Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:53.4433473Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.4463015Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.4491866Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.4507924Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:53.4509086Z 2025-05-07T20:01:53.4509885Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.4511038Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.4511481Z ^ 2025-05-07T20:01:53.4511774Z detected during: 2025-05-07T20:01:53.4526521Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.4554540Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.4583195Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.4599393Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:53.4600560Z 2025-05-07T20:01:53.4600809Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:53.4601174Z 2025-05-07T20:01:53.4601990Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.4603148Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.4603575Z ^ 2025-05-07T20:01:53.4603806Z detected during: 2025-05-07T20:01:53.4617658Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, 
TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:53.4646004Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.4674188Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.4702704Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.4718784Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:53.4719940Z 2025-05-07T20:01:53.4720737Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.4721877Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.4722345Z ^ 2025-05-07T20:01:53.4722680Z detected during: 2025-05-07T20:01:53.4737315Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.4765305Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.4795291Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.4811622Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:53.4812892Z 2025-05-07T20:01:53.4813167Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:53.4813530Z 2025-05-07T20:01:53.4814335Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.4815525Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.4815941Z ^ 2025-05-07T20:01:53.4816212Z detected during: 2025-05-07T20:01:53.4830093Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:53.4858688Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.4887077Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.4915609Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.4931995Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, 
TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:53.4933116Z 2025-05-07T20:01:53.4934349Z ptxas /tmp/tmpxft_00008c9d_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 925; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:01:53.4937019Z ptxas /tmp/tmpxft_00008c9d_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 937; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:01:53.4939567Z ptxas /tmp/tmpxft_00008c9d_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 1076; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:01:53.4942101Z ptxas /tmp/tmpxft_00008c9d_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 1088; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:01:53.4944225Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.4945358Z 
Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.4945832Z ^ 2025-05-07T20:01:53.4946130Z detected during: 2025-05-07T20:01:53.4961021Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.4989122Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.5017471Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.5033626Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:53.5034758Z 2025-05-07T20:01:53.5035006Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:53.5035389Z 2025-05-07T20:01:53.5036186Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.5037314Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.5037756Z ^ 2025-05-07T20:01:53.5038005Z detected during: 2025-05-07T20:01:53.5051781Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:53.5080261Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:01:53.5109285Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 
of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.5137749Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.5153865Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:53.5155054Z 2025-05-07T20:01:53.5155849Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.5157004Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.5157446Z ^ 2025-05-07T20:01:53.5157729Z detected during: 2025-05-07T20:01:53.5172639Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.5200485Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.5228704Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.5244914Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, 
USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:53.5246092Z 2025-05-07T20:01:53.5246343Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:53.5246707Z 2025-05-07T20:01:53.5247536Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.5248652Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.5249078Z ^ 2025-05-07T20:01:53.5249315Z detected during: 2025-05-07T20:01:53.5263106Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:53.5291591Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.5319502Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.5347946Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, 
cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.5364097Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:53.5365249Z 2025-05-07T20:01:53.5366054Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning 
#2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.5367348Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.5367826Z ^ 2025-05-07T20:01:53.5368128Z detected during: 2025-05-07T20:01:53.5382816Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.5410586Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.5440121Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, 
cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.5456387Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:53.5457527Z 2025-05-07T20:01:53.5457798Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:53.5458159Z 2025-05-07T20:01:53.5458959Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.5460094Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.5460526Z ^ 2025-05-07T20:01:53.5460794Z detected during: 2025-05-07T20:01:53.5474828Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, 
SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:53.5503214Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.5531089Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.5559440Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.5575797Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:53.5576931Z 2025-05-07T20:01:54.4710434Z [130/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o 2025-05-07T20:01:54.4723763Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:54.4725329Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:54.4726503Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:54.4726973Z ^ 2025-05-07T20:01:54.4727150Z 2025-05-07T20:01:54.4727401Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:54.4727766Z 2025-05-07T20:01:54.4728610Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:54.4729766Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:54.4730233Z ^ 2025-05-07T20:01:54.4730405Z 2025-05-07T20:01:55.2186912Z [131/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o 2025-05-07T20:01:55.2199545Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:55.2201132Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.2202303Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.2202847Z ^ 2025-05-07T20:01:55.2203054Z 2025-05-07T20:01:55.2203306Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.2203668Z 2025-05-07T20:01:55.2204503Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.2205665Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:55.2206131Z ^ 2025-05-07T20:01:55.2206306Z 2025-05-07T20:01:55.2207096Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.2208244Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.2208709Z ^ 2025-05-07T20:01:55.2208976Z detected during: 2025-05-07T20:01:55.2223917Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.2252276Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.2281096Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.2297484Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:55.2298650Z 2025-05-07T20:01:55.2298919Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.2299278Z 2025-05-07T20:01:55.2300076Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.2301201Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.2301635Z ^ 2025-05-07T20:01:55.2301866Z detected during: 2025-05-07T20:01:55.2315761Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:55.2346183Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.2374752Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.2403498Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.2419840Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:55.2421005Z 2025-05-07T20:01:55.2421802Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.2422968Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.2423444Z ^ 2025-05-07T20:01:55.2423721Z detected during: 2025-05-07T20:01:55.2438549Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.2466670Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.2495546Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.2511830Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:55.2512965Z 2025-05-07T20:01:55.2513213Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.2513590Z 2025-05-07T20:01:55.2514389Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.2515517Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.2515927Z ^ 2025-05-07T20:01:55.2516177Z detected during: 2025-05-07T20:01:55.2530087Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, 
TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:55.2558521Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.2586857Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.2615589Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.2631907Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:55.2633065Z 2025-05-07T20:01:55.2633855Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.2635041Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.2635487Z ^ 2025-05-07T20:01:55.2635771Z detected during: 2025-05-07T20:01:55.2650613Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.2679930Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.2708482Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.2724716Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:55.2725862Z 2025-05-07T20:01:55.2726112Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.2726471Z 2025-05-07T20:01:55.2727292Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.2728510Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.2728944Z ^ 2025-05-07T20:01:55.2729181Z detected during: 2025-05-07T20:01:55.2743129Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:55.2771810Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.2800951Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.2829704Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.2846020Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, 
TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:55.2847157Z 2025-05-07T20:01:55.2847962Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.2849079Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.2849516Z ^ 2025-05-07T20:01:55.2849776Z detected during: 2025-05-07T20:01:55.2864630Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.2892983Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.2921522Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.2937775Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:55.2938897Z 2025-05-07T20:01:55.2939147Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.2939498Z 2025-05-07T20:01:55.2940281Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.2941389Z return 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.2941793Z ^ 2025-05-07T20:01:55.2942009Z detected during: 2025-05-07T20:01:55.2955941Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:55.2985538Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.3013633Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.3042219Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, 
void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.3058415Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:55.3059529Z 2025-05-07T20:01:55.3060312Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.3061437Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.3061876Z ^ 2025-05-07T20:01:55.3062125Z detected during: 2025-05-07T20:01:55.3077178Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.3105343Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.3133967Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.3150027Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:55.3151189Z 2025-05-07T20:01:55.3151439Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.3151816Z 2025-05-07T20:01:55.3152610Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.3153737Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.3154145Z ^ 2025-05-07T20:01:55.3154391Z detected during: 2025-05-07T20:01:55.3168606Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, 
AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:55.3197136Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.3225406Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.3254049Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void 
*, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.3270521Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:55.3271674Z 2025-05-07T20:01:55.3272467Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.3273626Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.3274080Z ^ 2025-05-07T20:01:55.3274381Z detected during: 2025-05-07T20:01:55.3289108Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.3318561Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.3347328Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.3363611Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:55.3364765Z 2025-05-07T20:01:55.3365012Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.3365376Z 2025-05-07T20:01:55.3366193Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.3367408Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.3367846Z ^ 2025-05-07T20:01:55.3368077Z detected during: 2025-05-07T20:01:55.3382015Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:55.3410430Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.3438634Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.3467440Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.3483743Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:55.3484901Z 2025-05-07T20:01:55.3496203Z [132/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o 2025-05-07T20:01:55.3508375Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:55.3509943Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.3511089Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.3511557Z ^ 2025-05-07T20:01:55.3511727Z 2025-05-07T20:01:55.3511992Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.3512346Z 2025-05-07T20:01:55.3513161Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.3514340Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:55.3514776Z ^ 2025-05-07T20:01:55.3514967Z 2025-05-07T20:01:56.9020151Z [133/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu -o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o 2025-05-07T20:01:56.9032792Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:56.9034597Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.9035772Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:56.9036223Z ^ 2025-05-07T20:01:56.9036428Z 2025-05-07T20:01:56.9036675Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:56.9037033Z 2025-05-07T20:01:56.9037865Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.9038993Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:56.9039443Z ^ 2025-05-07T20:01:56.9039614Z 2025-05-07T20:01:56.9040403Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.9041548Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:56.9042010Z ^ 2025-05-07T20:01:56.9042262Z detected during: 2025-05-07T20:01:56.9057074Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:56.9085070Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:56.9113266Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:56.9129288Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:56.9130425Z 2025-05-07T20:01:56.9130670Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:56.9131046Z 2025-05-07T20:01:56.9131841Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.9132960Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:56.9133368Z ^ 2025-05-07T20:01:56.9133618Z detected during: 2025-05-07T20:01:56.9147539Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, 
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:56.9176075Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:56.9203823Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:56.9232077Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:56.9248010Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:56.9249168Z 2025-05-07T20:01:56.9249956Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.9251161Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:56.9251609Z ^ 2025-05-07T20:01:56.9251905Z detected during: 2025-05-07T20:01:56.9266486Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:56.9307517Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:56.9335835Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:56.9351834Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:56.9352997Z 2025-05-07T20:01:56.9353250Z Remark: The warnings can be 
suppressed with "-diag-suppress " 2025-05-07T20:01:56.9353616Z 2025-05-07T20:01:56.9354436Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.9355585Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:56.9356073Z ^ 2025-05-07T20:01:56.9356319Z detected during: 2025-05-07T20:01:56.9370457Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:56.9398751Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:56.9426364Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:56.9454434Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:56.9470506Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:56.9471640Z 2025-05-07T20:01:56.9472469Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.9473610Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 
2025-05-07T20:01:56.9474086Z ^ 2025-05-07T20:01:56.9474361Z detected during: 2025-05-07T20:01:56.9489031Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:56.9516584Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:56.9544786Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:56.9560741Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:56.9561913Z 2025-05-07T20:01:56.9562238Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:56.9562653Z 2025-05-07T20:01:56.9563449Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.9564590Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:56.9564999Z ^ 2025-05-07T20:01:56.9565261Z detected during: 2025-05-07T20:01:56.9579270Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, 
StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:56.9607681Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:01:56.9635309Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 
340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:56.9663485Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:56.9679647Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:56.9680789Z 2025-05-07T20:01:56.9681587Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.9682836Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:56.9683323Z ^ 2025-05-07T20:01:56.9683622Z detected during: 2025-05-07T20:01:56.9698258Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:56.9725959Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:56.9754205Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:56.9770402Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, 
USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:56.9771544Z 2025-05-07T20:01:56.9771794Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:56.9772178Z 2025-05-07T20:01:56.9772983Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.9774125Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:56.9774543Z ^ 2025-05-07T20:01:56.9774811Z detected during: 2025-05-07T20:01:56.9788464Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:56.9816596Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:56.9844296Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:56.9872462Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:56.9888503Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:56.9889643Z 2025-05-07T20:01:56.9890441Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.9891608Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:56.9892085Z ^ 2025-05-07T20:01:56.9892359Z detected during: 2025-05-07T20:01:56.9906947Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:56.9934597Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:56.9962725Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:56.9978791Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:56.9979933Z 2025-05-07T20:01:56.9980186Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:56.9980576Z 2025-05-07T20:01:56.9981378Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.9982520Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:56.9982989Z ^ 2025-05-07T20:01:56.9983245Z detected during: 2025-05-07T20:01:56.9997128Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:57.0025322Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:57.0052955Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:57.0081195Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:57.0097267Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:57.0098424Z 2025-05-07T20:01:57.0099217Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:57.0100372Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:57.0100816Z ^ 2025-05-07T20:01:57.0101110Z detected during: 2025-05-07T20:01:57.0115790Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:57.0143418Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:57.0171663Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, 
void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:57.0187813Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:57.0189019Z 2025-05-07T20:01:57.0189271Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:57.0189630Z 2025-05-07T20:01:57.0190444Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:57.0191550Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:57.0191992Z ^ 2025-05-07T20:01:57.0192231Z detected during: 2025-05-07T20:01:57.0206172Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:57.0234330Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:57.0261941Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:57.0290235Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:57.0306337Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:57.0307471Z 2025-05-07T20:01:59.6803707Z [134/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o 2025-05-07T20:01:59.6816415Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:59.6818005Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.6819151Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.6819628Z ^ 2025-05-07T20:01:59.6819813Z 2025-05-07T20:01:59.6820064Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.6820449Z 2025-05-07T20:01:59.6821265Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.6822444Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:59.6822890Z ^ 2025-05-07T20:01:59.6823085Z 2025-05-07T20:01:59.6823880Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.6825068Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.6825514Z ^ 2025-05-07T20:01:59.6825861Z detected during: 2025-05-07T20:01:59.6840672Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.6869195Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.6897876Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.6914039Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) 
[with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:59.6915205Z 2025-05-07T20:01:59.6915453Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.6915837Z 2025-05-07T20:01:59.6916633Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.6917771Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.6918178Z ^ 2025-05-07T20:01:59.6918437Z detected during: 2025-05-07T20:01:59.6932362Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.6960709Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.6990808Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7019580Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, 
cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7035936Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:59.7037105Z 2025-05-07T20:01:59.7037906Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy 
capture of "this" is deprecated 2025-05-07T20:01:59.7039078Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7039526Z ^ 2025-05-07T20:01:59.7039830Z detected during: 2025-05-07T20:01:59.7054711Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7083207Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7111825Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7128182Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:59.7129346Z 2025-05-07T20:01:59.7129596Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.7129954Z 2025-05-07T20:01:59.7130776Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7131879Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7132311Z ^ 2025-05-07T20:01:59.7132542Z detected during: 2025-05-07T20:01:59.7146482Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.7175172Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7203506Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7232163Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7248480Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:59.7249673Z 2025-05-07T20:01:59.7250487Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7251641Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7252104Z ^ 2025-05-07T20:01:59.7252373Z detected during: 2025-05-07T20:01:59.7267410Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7295710Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7325422Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7341623Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:59.7342842Z 2025-05-07T20:01:59.7343114Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.7343475Z 2025-05-07T20:01:59.7344267Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7345400Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7345807Z ^ 2025-05-07T20:01:59.7346055Z detected during: 2025-05-07T20:01:59.7359914Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, 
GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.7388522Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7416669Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7445306Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7461506Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:59.7462634Z 2025-05-07T20:01:59.7463423Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7464556Z Tensor mSFB_tmp = 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7464998Z ^ 2025-05-07T20:01:59.7465253Z detected during: 2025-05-07T20:01:59.7480139Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7508304Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7537003Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7553177Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:59.7554314Z 2025-05-07T20:01:59.7554569Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.7554953Z 2025-05-07T20:01:59.7555750Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7556914Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7557324Z ^ 2025-05-07T20:01:59.7557583Z detected during: 2025-05-07T20:01:59.7571699Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.7600187Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7629428Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:01:59.7658179Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7679794Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:59.7681001Z 2025-05-07T20:01:59.7681842Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7683185Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7683662Z ^ 2025-05-07T20:01:59.7684021Z detected during: 2025-05-07T20:01:59.7698918Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7727195Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7755876Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7772391Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:59.7773549Z 2025-05-07T20:01:59.7773801Z Remark: The warnings 
can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.7774186Z 2025-05-07T20:01:59.7774987Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7776114Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7776520Z ^ 2025-05-07T20:01:59.7776776Z detected during: 2025-05-07T20:01:59.7790744Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.7819272Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, 
cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7847490Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7876335Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7892644Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:59.7893811Z 2025-05-07T20:01:59.7894608Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7895768Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7896214Z ^ 2025-05-07T20:01:59.7896507Z detected during: 2025-05-07T20:01:59.7911437Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, 
TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7939717Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7969725Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7985856Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:59.7987017Z 2025-05-07T20:01:59.7987268Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.7987629Z 2025-05-07T20:01:59.7988450Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7989552Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7989986Z ^ 2025-05-07T20:01:59.7990284Z detected during: 2025-05-07T20:01:59.8004243Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, 
SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.8032582Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.8060724Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.8089612Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, 
void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.8105794Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:59.8106986Z 2025-05-07T20:02:04.5257059Z [135/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode 
arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o 2025-05-07T20:02:04.5279178Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:04.5281910Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:04.5283993Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:04.5284761Z ^ 2025-05-07T20:02:04.5285047Z 2025-05-07T20:02:04.5285435Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:04.5286056Z 2025-05-07T20:02:04.5287475Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:04.5289430Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:04.5290159Z ^ 2025-05-07T20:02:04.5290682Z 2025-05-07T20:02:04.5292047Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:04.5293908Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:04.5294644Z ^ 2025-05-07T20:02:04.5295070Z detected during: 2025-05-07T20:02:04.5320190Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:04.5368089Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:04.5418737Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:04.5447071Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:04.5449053Z 2025-05-07T20:02:04.5449509Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:04.5450111Z 2025-05-07T20:02:04.5453365Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:04.5455402Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:04.5456054Z ^ 2025-05-07T20:02:04.5456426Z detected during: 2025-05-07T20:02:04.5481388Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:04.5532019Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:04.5580062Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:04.5628479Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:04.5655445Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 
2025-05-07T20:02:04.5657397Z 2025-05-07T20:02:04.5658777Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:04.5660710Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:04.5661451Z ^ 2025-05-07T20:02:04.5661861Z detected during: 2025-05-07T20:02:04.5687013Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:04.5734114Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:04.5782329Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:04.5809928Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:04.5811870Z 2025-05-07T20:02:04.5812299Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:04.5812902Z 2025-05-07T20:02:04.5814193Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:04.5816120Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:04.5816782Z ^ 2025-05-07T20:02:04.5817168Z detected during: 2025-05-07T20:02:04.5840652Z instantiation of "auto 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, 
ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:04.5890527Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:04.5937937Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:04.5986882Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:04.6015654Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:04.6017671Z 2025-05-07T20:02:04.6019045Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:04.6021051Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:04.6021806Z ^ 2025-05-07T20:02:04.6022265Z detected during: 2025-05-07T20:02:04.6047343Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, 
cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:04.6095575Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:04.6143917Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:04.6171358Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:04.6173331Z 2025-05-07T20:02:04.6173773Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:04.6174377Z 2025-05-07T20:02:04.6175728Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:04.6177611Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:04.6178288Z ^ 2025-05-07T20:02:04.6178661Z detected during: 2025-05-07T20:02:04.6202705Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:04.6251682Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:04.6299747Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:04.6348520Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:04.6376330Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:04.6378319Z 2025-05-07T20:02:04.6379722Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:04.6381681Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:04.6382445Z ^ 2025-05-07T20:02:04.6382862Z detected during: 2025-05-07T20:02:04.6408096Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:04.6455617Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:04.6504209Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:04.6531641Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:04.6533669Z 2025-05-07T20:02:04.6534070Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:04.6534844Z 2025-05-07T20:02:04.6536104Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 
2025-05-07T20:02:04.6537978Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:04.6538658Z ^ 2025-05-07T20:02:04.6539049Z detected during: 2025-05-07T20:02:04.6564156Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:04.6613095Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:04.6660796Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:04.6710092Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:04.6737945Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:04.6739979Z 2025-05-07T20:02:04.6741366Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:04.6743466Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:04.6744216Z ^ 2025-05-07T20:02:04.6744765Z detected during: 2025-05-07T20:02:04.6769524Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, 
cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:04.6797353Z instantiation of "void 
cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:04.6825632Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:04.6841536Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:04.6842747Z 2025-05-07T20:02:04.6842997Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:04.6843361Z 2025-05-07T20:02:04.6844186Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:04.6845295Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:04.6845719Z ^ 2025-05-07T20:02:04.6845950Z detected during: 2025-05-07T20:02:04.6859959Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, 
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:04.6888504Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:04.6916137Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:04.6944491Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:04.6960552Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:04.6961711Z 2025-05-07T20:02:04.6962522Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:04.6963720Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:04.6964188Z ^ 2025-05-07T20:02:04.6964455Z detected during: 2025-05-07T20:02:04.6980253Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:04.7008063Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:04.7036259Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:04.7052375Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:04.7053500Z 2025-05-07T20:02:04.7053772Z Remark: The warnings can be 
suppressed with "-diag-suppress " 2025-05-07T20:02:04.7054131Z 2025-05-07T20:02:04.7054927Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:04.7056055Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:04.7056462Z ^ 2025-05-07T20:02:04.7056706Z detected during: 2025-05-07T20:02:04.7070656Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:04.7098907Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:04.7126758Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:04.7154814Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:04.7170966Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:04.7172091Z 2025-05-07T20:02:06.8856023Z [136/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o 
-MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o 2025-05-07T20:02:06.8868879Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:06.8870441Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.8871635Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.8872082Z ^ 2025-05-07T20:02:06.8872248Z 2025-05-07T20:02:06.8872570Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:06.8872932Z 2025-05-07T20:02:06.8873758Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.8874932Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:06.8875402Z ^ 2025-05-07T20:02:06.8875574Z 2025-05-07T20:02:06.8876363Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.8877508Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.8877943Z ^ 2025-05-07T20:02:06.8878222Z detected during: 2025-05-07T20:02:06.8892825Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, 
TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.8920672Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.8948710Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, 
cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.8964635Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:06.8965798Z 2025-05-07T20:02:06.8966043Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:06.8966407Z 2025-05-07T20:02:06.8967400Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.8968507Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.8968936Z ^ 2025-05-07T20:02:06.8969169Z detected during: 2025-05-07T20:02:06.8984587Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, 
SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:06.9012811Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.9040283Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.9068638Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, 
void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.9084577Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:06.9085719Z 2025-05-07T20:02:06.9086537Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.9087667Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.9088130Z ^ 2025-05-07T20:02:06.9088420Z detected during: 2025-05-07T20:02:06.9102924Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.9130550Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.9158522Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.9174513Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:06.9175659Z 2025-05-07T20:02:06.9175926Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:06.9176287Z 2025-05-07T20:02:06.9177091Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.9178217Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.9178720Z ^ 2025-05-07T20:02:06.9178953Z detected during: 2025-05-07T20:02:06.9192867Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:06.9221338Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.9248880Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.9276790Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.9292675Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:06.9293817Z 2025-05-07T20:02:06.9294605Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.9296586Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.9297059Z ^ 2025-05-07T20:02:06.9297327Z detected during: 2025-05-07T20:02:06.9311896Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.9339414Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.9367757Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.9383696Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:06.9384834Z 2025-05-07T20:02:06.9385082Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:06.9385466Z 2025-05-07T20:02:06.9386293Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.9387444Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.9387855Z ^ 2025-05-07T20:02:06.9388114Z detected during: 2025-05-07T20:02:06.9401938Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) 
const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:06.9430026Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.9457383Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.9485641Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, 
void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.9501521Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:06.9502706Z 2025-05-07T20:02:06.9503502Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.9504654Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.9505100Z ^ 2025-05-07T20:02:06.9505391Z detected during: 2025-05-07T20:02:06.9519682Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.9547132Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.9575231Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.9591137Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:06.9592293Z 2025-05-07T20:02:06.9592566Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:06.9592924Z 2025-05-07T20:02:06.9593787Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.9594888Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.9595313Z ^ 2025-05-07T20:02:06.9595545Z detected during: 2025-05-07T20:02:06.9609450Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:06.9638611Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.9665997Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.9694164Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.9710055Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:06.9711199Z 2025-05-07T20:02:06.9712017Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.9713149Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.9713613Z ^ 2025-05-07T20:02:06.9713884Z detected during: 2025-05-07T20:02:06.9728419Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.9755663Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.9783655Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.9810755Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:06.9811926Z 2025-05-07T20:02:06.9812201Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:06.9812562Z 2025-05-07T20:02:06.9813370Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.9814512Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.9814945Z ^ 2025-05-07T20:02:06.9815179Z detected during: 2025-05-07T20:02:06.9829064Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) 
const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:06.9857022Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.9884632Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.9912691Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, 
void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.9928640Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:06.9929780Z 2025-05-07T20:02:06.9930585Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.9931753Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.9932231Z ^ 2025-05-07T20:02:06.9932506Z detected during: 2025-05-07T20:02:06.9947027Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.9975868Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:07.0003995Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:07.0019927Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:07.0021066Z 2025-05-07T20:02:07.0021323Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:07.0021704Z 2025-05-07T20:02:07.0022505Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:07.0023627Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:07.0024033Z ^ 2025-05-07T20:02:07.0024297Z detected during: 2025-05-07T20:02:07.0038224Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:07.0066415Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:07.0094286Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:07.0122207Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:07.0138176Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:07.0139338Z 2025-05-07T20:02:07.1339722Z [137/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o 2025-05-07T20:02:07.1352388Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:07.1353959Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:07.1355104Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:07.1355571Z ^ 2025-05-07T20:02:07.1355749Z 2025-05-07T20:02:07.1355993Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:07.1356373Z 2025-05-07T20:02:07.1357194Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:07.1358373Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:07.1358823Z ^ 2025-05-07T20:02:07.1359013Z 2025-05-07T20:02:07.1359825Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:07.1360963Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:07.1361426Z ^ 2025-05-07T20:02:07.1361694Z detected during: 2025-05-07T20:02:07.1376752Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:07.1404671Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:07.1432856Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:07.1448921Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:07.1450064Z 2025-05-07T20:02:07.1450334Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:07.1450695Z 2025-05-07T20:02:07.1451498Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:07.1452633Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:07.1453040Z ^ 2025-05-07T20:02:07.1453299Z detected during: 2025-05-07T20:02:07.1467404Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, 
ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:07.1497157Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:07.1524972Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:07.1552836Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:07.1569040Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:07.1570189Z 2025-05-07T20:02:07.1570994Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:07.1572173Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:07.1572654Z ^ 2025-05-07T20:02:07.1573018Z detected during: 2025-05-07T20:02:07.1587674Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:07.1615282Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:07.1643601Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:07.1659576Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:07.1660718Z 2025-05-07T20:02:07.1660965Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:07.1661381Z 2025-05-07T20:02:07.1662180Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" 
is deprecated 2025-05-07T20:02:07.1663302Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:07.1663711Z ^ 2025-05-07T20:02:07.1663963Z detected during: 2025-05-07T20:02:07.1678033Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:07.1706453Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:07.1734253Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:07.1762313Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, 
cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:07.1778471Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:07.1779640Z 2025-05-07T20:02:07.1780435Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:07.1781636Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:07.1782121Z ^ 2025-05-07T20:02:07.1782415Z detected during: 2025-05-07T20:02:07.1797955Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:07.1825839Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:07.1853958Z instantiation of "cutlass::Status 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:07.1870130Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:07.1871290Z 2025-05-07T20:02:07.1871540Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:07.1871901Z 2025-05-07T20:02:07.1872733Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:07.1873836Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:07.1874263Z ^ 2025-05-07T20:02:07.1874502Z detected during: 2025-05-07T20:02:07.1888514Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:07.1916717Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:07.1944435Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:07.1972886Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:07.1988920Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:07.1990056Z 2025-05-07T20:02:07.1990882Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:07.1992031Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:07.1992503Z ^ 2025-05-07T20:02:07.1992771Z detected during: 2025-05-07T20:02:07.2007370Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:07.2035089Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:07.2063262Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:07.2079433Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:07.2080581Z 2025-05-07T20:02:07.2080858Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:07.2081221Z 2025-05-07T20:02:07.2082016Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:07.2083194Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:07.2083596Z ^ 
2025-05-07T20:02:07.2083845Z detected during: 2025-05-07T20:02:07.2097739Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, 
SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:07.2126953Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:07.2154752Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:07.2183085Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:07.2199096Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:07.2200232Z 2025-05-07T20:02:07.2201028Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:07.2202189Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:07.2202697Z ^ 2025-05-07T20:02:07.2202968Z detected during: 2025-05-07T20:02:07.2217510Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:07.2245269Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:07.2273665Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, 
void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:07.2289669Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:07.2290810Z 2025-05-07T20:02:07.2291056Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:07.2291439Z 2025-05-07T20:02:07.2292234Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:07.2293347Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:07.2293754Z ^ 2025-05-07T20:02:07.2294007Z detected during: 2025-05-07T20:02:07.2307893Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:07.2336232Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:07.2364037Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:07.2392377Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:07.2408402Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:07.2409527Z 2025-05-07T20:02:07.2410336Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:07.2411506Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:07.2411978Z ^ 2025-05-07T20:02:07.2412247Z detected during: 2025-05-07T20:02:07.2426733Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:07.2455548Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:07.2483890Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:07.2499886Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:07.2501016Z 2025-05-07T20:02:07.2501260Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:07.2501622Z 2025-05-07T20:02:07.2502437Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:07.2503538Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:07.2503951Z ^ 
2025-05-07T20:02:07.2504173Z detected during: 2025-05-07T20:02:07.2518034Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, 
SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:07.2546275Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:07.2574057Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:07.2602165Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:07.2618097Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:07.2619273Z 2025-05-07T20:02:10.0936197Z [138/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o 2025-05-07T20:02:10.0948805Z nvcc warning : Support for offline compilation for architectures 
prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:10.0950390Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.0951533Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.0951998Z ^ 2025-05-07T20:02:10.0952172Z 2025-05-07T20:02:10.0952421Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.0952809Z 2025-05-07T20:02:10.0953626Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.0954804Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:10.0955334Z ^ 2025-05-07T20:02:10.0955533Z 2025-05-07T20:02:10.0956327Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.0957483Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.0957932Z ^ 2025-05-07T20:02:10.0958216Z detected during: 2025-05-07T20:02:10.0973061Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1000639Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1028668Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:10.1044655Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:10.1045810Z 2025-05-07T20:02:10.1046056Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.1046445Z 2025-05-07T20:02:10.1047314Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1048424Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1048858Z ^ 2025-05-07T20:02:10.1049097Z detected during: 2025-05-07T20:02:10.1062927Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.1092639Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1120041Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1148276Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, 
int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1164258Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:10.1165389Z 2025-05-07T20:02:10.1166212Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1167513Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1167988Z ^ 2025-05-07T20:02:10.1168288Z detected during: 2025-05-07T20:02:10.1182859Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1210371Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1238330Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1254340Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:10.1255505Z 2025-05-07T20:02:10.1255778Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.1256139Z 2025-05-07T20:02:10.1256926Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1258051Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1258477Z ^ 2025-05-07T20:02:10.1258709Z detected during: 2025-05-07T20:02:10.1272660Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:02:10.1300876Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1328357Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1356236Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1372376Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:10.1373508Z 2025-05-07T20:02:10.1374303Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1375454Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1375929Z ^ 2025-05-07T20:02:10.1376206Z detected during: 2025-05-07T20:02:10.1390785Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1419290Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1447424Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1463320Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:10.1464452Z 2025-05-07T20:02:10.1464700Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.1465078Z 2025-05-07T20:02:10.1465876Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1467000Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1467559Z ^ 2025-05-07T20:02:10.1467817Z detected during: 2025-05-07T20:02:10.1481720Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.1509940Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1537645Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1565693Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1581699Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:10.1582846Z 2025-05-07T20:02:10.1583642Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1584797Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1585245Z ^ 2025-05-07T20:02:10.1585532Z detected during: 2025-05-07T20:02:10.1600076Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1627655Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1655746Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1671749Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:10.1672914Z 2025-05-07T20:02:10.1673162Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.1673525Z 2025-05-07T20:02:10.1674352Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1675458Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1675886Z ^ 2025-05-07T20:02:10.1676122Z detected during: 2025-05-07T20:02:10.1690077Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:02:10.1718141Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1746577Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1774768Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1790665Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:10.1791801Z 2025-05-07T20:02:10.1792630Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1793757Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1794227Z ^ 2025-05-07T20:02:10.1794513Z detected during: 2025-05-07T20:02:10.1809058Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1836602Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1864650Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1880670Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:10.1881810Z 2025-05-07T20:02:10.1882084Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.1882447Z 2025-05-07T20:02:10.1883290Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1884508Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1884938Z ^ 2025-05-07T20:02:10.1885170Z detected during: 2025-05-07T20:02:10.1899122Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.1927367Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1954977Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1983138Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1999092Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:10.2000222Z 2025-05-07T20:02:10.2001043Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.2002201Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.2002708Z ^ 2025-05-07T20:02:10.2003009Z detected during: 2025-05-07T20:02:10.2017588Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.2045895Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.2074036Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.2090000Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:10.2091171Z 2025-05-07T20:02:10.2091423Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.2091806Z 2025-05-07T20:02:10.2092597Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.2093721Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.2094131Z ^ 2025-05-07T20:02:10.2094387Z detected during: 2025-05-07T20:02:10.2108206Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:02:10.2136308Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.2163855Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.2191897Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.2207810Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:10.2208966Z 2025-05-07T20:02:11.5762398Z [139/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o 2025-05-07T20:02:11.5775386Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:11.5776949Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.5778077Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.5778543Z ^ 2025-05-07T20:02:11.5778719Z 2025-05-07T20:02:11.5778983Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.5779343Z 2025-05-07T20:02:11.5780156Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.5781328Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:11.5781835Z ^ 2025-05-07T20:02:11.5782034Z 2025-05-07T20:02:11.5782833Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.5784043Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.5784487Z ^ 2025-05-07T20:02:11.5784816Z detected during: 2025-05-07T20:02:11.5799679Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.5828017Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.5856775Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.5873160Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) 
[with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:02:11.5874325Z 2025-05-07T20:02:11.5874576Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.5874941Z 2025-05-07T20:02:11.5875767Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.5876880Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.5877316Z ^ 2025-05-07T20:02:11.5877550Z detected during: 2025-05-07T20:02:11.5891565Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.5921390Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.5949786Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.5978804Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, 
cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.5995053Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:02:11.5996178Z 2025-05-07T20:02:11.5996994Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy 
capture of "this" is deprecated 2025-05-07T20:02:11.5998133Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.5998600Z ^ 2025-05-07T20:02:11.5998892Z detected during: 2025-05-07T20:02:11.6013744Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.6041921Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.6070700Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.6086898Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:02:11.6088037Z 2025-05-07T20:02:11.6088316Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.6088675Z 2025-05-07T20:02:11.6089483Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.6090610Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.6091069Z ^ 2025-05-07T20:02:11.6091301Z detected during: 2025-05-07T20:02:11.6105262Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.6133724Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.6161907Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.6190621Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.6206780Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:02:11.6207941Z 2025-05-07T20:02:11.6208728Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.6209864Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.6210308Z ^ 2025-05-07T20:02:11.6210560Z detected during: 2025-05-07T20:02:11.6225346Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.6254515Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.6283238Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.6299376Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:02:11.6300566Z 2025-05-07T20:02:11.6300806Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.6301166Z 2025-05-07T20:02:11.6301964Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.6303049Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.6303435Z ^ 2025-05-07T20:02:11.6303662Z detected during: 2025-05-07T20:02:11.6317351Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, 
GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.6345769Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.6374150Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.6402930Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.6419227Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:02:11.6420382Z 2025-05-07T20:02:11.6421180Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.6422343Z Tensor mSFB_tmp = 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.6422794Z ^ 2025-05-07T20:02:11.6423092Z detected during: 2025-05-07T20:02:11.6437988Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.6466289Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.6495065Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.6511304Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:02:11.6512475Z 2025-05-07T20:02:11.6512725Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.6513083Z 2025-05-07T20:02:11.6513904Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.6515054Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.6515480Z ^ 2025-05-07T20:02:11.6515714Z detected during: 2025-05-07T20:02:11.6529668Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.6559110Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.6587499Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:02:11.6616146Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.6632411Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:02:11.6633545Z 2025-05-07T20:02:11.6634362Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.6635524Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.6635989Z ^ 2025-05-07T20:02:11.6636297Z detected during: 2025-05-07T20:02:11.6651162Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.6679390Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.6707868Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.6724159Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:02:11.6725290Z 2025-05-07T20:02:11.6725566Z Remark: The warnings can 
be suppressed with "-diag-suppress " 2025-05-07T20:02:11.6725927Z 2025-05-07T20:02:11.6726725Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.6727865Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.6728280Z ^ 2025-05-07T20:02:11.6728547Z detected during: 2025-05-07T20:02:11.6742412Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, 
cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.6770875Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, 
cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.6799217Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.6827830Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.6844107Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:02:11.6845241Z 2025-05-07T20:02:11.6846036Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.6847192Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.6847662Z ^ 2025-05-07T20:02:11.6847930Z detected during: 2025-05-07T20:02:11.6862791Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, 
TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.6891897Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.6920574Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.6936790Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:02:11.6937937Z 2025-05-07T20:02:11.6938185Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.6938563Z 2025-05-07T20:02:11.6939361Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.6940479Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.6940884Z ^ 2025-05-07T20:02:11.6941167Z detected during: 2025-05-07T20:02:11.6955023Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, 
TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.6983862Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.7012060Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.7040640Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments 
&, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.7056868Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:02:11.7058046Z 2025-05-07T20:02:11.7069631Z [140/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode 
arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o 2025-05-07T20:02:11.7081937Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:11.7083537Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.7084685Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.7085132Z ^ 2025-05-07T20:02:11.7085329Z 2025-05-07T20:02:11.7085572Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.7085925Z 2025-05-07T20:02:11.7086763Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.7087907Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:11.7088367Z ^ 2025-05-07T20:02:11.7088580Z 2025-05-07T20:02:11.7089370Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.7090509Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.7090968Z ^ 2025-05-07T20:02:11.7091228Z detected during: 2025-05-07T20:02:11.7106051Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.7134090Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.7162663Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.7179117Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:11.7180287Z 2025-05-07T20:02:11.7180532Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.7180917Z 2025-05-07T20:02:11.7181745Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.7182896Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.7183341Z ^ 2025-05-07T20:02:11.7183629Z detected during: 2025-05-07T20:02:11.7198450Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.7227434Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.7255998Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.7272100Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:11.7273273Z 2025-05-07T20:02:11.7273524Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.7273884Z 2025-05-07T20:02:11.7274699Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.7275836Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.7276303Z ^ 2025-05-07T20:02:11.7276577Z detected during: 2025-05-07T20:02:11.7291512Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.7319477Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.7348118Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.7364486Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:11.7365624Z 2025-05-07T20:02:11.7365868Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.7366246Z 2025-05-07T20:02:11.7367142Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.7368359Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.7368811Z ^ 2025-05-07T20:02:11.7369105Z detected during: 2025-05-07T20:02:11.7383984Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.7422051Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.7451029Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.7467464Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:11.7468648Z 2025-05-07T20:02:11.7468904Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.7469270Z 2025-05-07T20:02:11.7470144Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.7471318Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.7471793Z ^ 2025-05-07T20:02:11.7472067Z detected during: 
2025-05-07T20:02:11.7486968Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.7515072Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.7544529Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.7560749Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:11.7561887Z 2025-05-07T20:02:11.7562161Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.7562527Z 2025-05-07T20:02:11.7563371Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.7564537Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.7564984Z ^ 2025-05-07T20:02:11.7565278Z detected during: 2025-05-07T20:02:11.7580194Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.7608425Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:02:11.7636992Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.7653280Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:11.7654446Z 2025-05-07T20:02:11.7654696Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.7655059Z 2025-05-07T20:02:11.7727977Z [141/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o 2025-05-07T20:02:11.7742973Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:11.7744731Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.7745894Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.7746343Z ^ 2025-05-07T20:02:11.7746520Z 2025-05-07T20:02:11.7746786Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.7747147Z 2025-05-07T20:02:11.7747972Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.7749141Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:11.7749636Z ^ 2025-05-07T20:02:11.7749812Z 2025-05-07T20:02:11.7750620Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.7751746Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.7752212Z ^ 2025-05-07T20:02:11.7752476Z detected during: 2025-05-07T20:02:11.7767304Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.7794862Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.7822920Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.7838847Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:11.7839995Z 2025-05-07T20:02:11.7840265Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.7840620Z 2025-05-07T20:02:11.7841421Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.7842574Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.7842990Z ^ 2025-05-07T20:02:11.7843245Z detected during: 2025-05-07T20:02:11.7857201Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.7885523Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.7913820Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.7941870Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.7957783Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:11.7958926Z 2025-05-07T20:02:11.7959728Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.7960893Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.7961358Z ^ 2025-05-07T20:02:11.7961625Z detected during: 2025-05-07T20:02:11.7976259Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.8003726Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.8031583Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.8047522Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:11.8048659Z 2025-05-07T20:02:11.8048907Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.8049284Z 2025-05-07T20:02:11.8050084Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.8051251Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.8051662Z ^ 2025-05-07T20:02:11.8051946Z detected during: 2025-05-07T20:02:11.8065527Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:02:11.8093662Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.8121050Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.8148977Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.8164872Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:11.8166024Z 2025-05-07T20:02:11.8166822Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.8168155Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.8168602Z ^ 2025-05-07T20:02:11.8168896Z detected during: 2025-05-07T20:02:11.8183402Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.8210895Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.8238669Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.8255139Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:11.8256297Z 2025-05-07T20:02:11.8256549Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.8256909Z 2025-05-07T20:02:11.8257844Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.8258975Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.8259436Z ^ 2025-05-07T20:02:11.8259664Z detected during: 2025-05-07T20:02:11.8273441Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.8301563Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.8328977Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.8356915Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.8372820Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:11.8373951Z 2025-05-07T20:02:11.8375208Z ptxas /tmp/tmpxft_00008cab_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 835; warning : Advisory: 
'.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:11.8377773Z ptxas /tmp/tmpxft_00008cab_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 848; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:11.8380297Z ptxas /tmp/tmpxft_00008cab_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 988; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:11.8382832Z ptxas /tmp/tmpxft_00008cab_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 1001; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:11.8384965Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.8386105Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.8386568Z ^ 2025-05-07T20:02:11.8386856Z detected during: 2025-05-07T20:02:11.8401323Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with 
ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:02:11.8428902Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.8456703Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.8472691Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:11.8473826Z 2025-05-07T20:02:11.8474104Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.8474464Z 2025-05-07T20:02:11.8475255Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.8476380Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.8476806Z ^ 2025-05-07T20:02:11.8477037Z detected during: 2025-05-07T20:02:11.8490886Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, 
ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.8518976Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.8546500Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.8575387Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:11.8591337Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:11.8592463Z 2025-05-07T20:02:11.8593290Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.8594474Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.8594983Z ^ 2025-05-07T20:02:11.8595251Z detected during: 2025-05-07T20:02:11.8609845Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.8637342Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.8665297Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.8681281Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:11.8682412Z 2025-05-07T20:02:11.8682733Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.8683120Z 2025-05-07T20:02:11.8683928Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" 
is deprecated 2025-05-07T20:02:11.8685061Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.8685462Z ^ 2025-05-07T20:02:11.8685719Z detected during: 2025-05-07T20:02:11.8699388Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.8727497Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.8754953Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.8783051Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, 
cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.8798918Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:11.8800084Z 2025-05-07T20:02:11.8800903Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.8802092Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.8802605Z ^ 2025-05-07T20:02:11.8802885Z detected during: 2025-05-07T20:02:11.8817319Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.8844756Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.8872506Z instantiation of "cutlass::Status 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.8889059Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:11.8890180Z 2025-05-07T20:02:11.8890418Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.8890787Z 2025-05-07T20:02:11.8891575Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.8892675Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.8893064Z ^ 2025-05-07T20:02:11.8893291Z detected during: 2025-05-07T20:02:11.8907006Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.8935009Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.8962098Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.8990114Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.9006043Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:11.9007280Z 2025-05-07T20:02:12.1274248Z [142/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o 2025-05-07T20:02:12.1286924Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:12.1288521Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.1289668Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.1290140Z ^ 2025-05-07T20:02:12.1290316Z 2025-05-07T20:02:12.1290588Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:12.1290946Z 2025-05-07T20:02:12.1291762Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.1293020Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:12.1293518Z ^ 2025-05-07T20:02:12.1293719Z 2025-05-07T20:02:12.1294512Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.1295760Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.1296203Z ^ 2025-05-07T20:02:12.1296499Z detected during: 2025-05-07T20:02:12.1311403Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.1340903Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.1369838Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.1386189Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:12.1387354Z 2025-05-07T20:02:12.1387605Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:12.1387990Z 2025-05-07T20:02:12.1388791Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.1389919Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.1390329Z ^ 2025-05-07T20:02:12.1390584Z detected during: 2025-05-07T20:02:12.1404606Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:12.1433153Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.1461291Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.1490190Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.1506446Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:12.1507605Z 2025-05-07T20:02:12.1508409Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.1509576Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.1510015Z ^ 2025-05-07T20:02:12.1510309Z detected during: 2025-05-07T20:02:12.1525173Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.1553328Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.1582233Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.1598485Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:12.1599647Z 2025-05-07T20:02:12.1599900Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:12.1600262Z 2025-05-07T20:02:12.1601062Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.1602228Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.1602745Z ^ 2025-05-07T20:02:12.1602977Z detected during: 2025-05-07T20:02:12.1616914Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, 
TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:12.1646107Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.1674541Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.1703216Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.1719473Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:12.1720607Z 2025-05-07T20:02:12.1721427Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.1722626Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.1723096Z ^ 2025-05-07T20:02:12.1723364Z detected during: 2025-05-07T20:02:12.1738165Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.1766514Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.1795077Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.1811518Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:12.1812658Z 2025-05-07T20:02:12.1812907Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:12.1813277Z 2025-05-07T20:02:12.1814068Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.1815199Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.1815606Z ^ 2025-05-07T20:02:12.1815854Z detected during: 2025-05-07T20:02:12.1829739Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:12.1858327Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.1887109Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.1915759Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.1931991Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, 
TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:12.1933136Z 2025-05-07T20:02:12.1933933Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.1935094Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.1935543Z ^ 2025-05-07T20:02:12.1935834Z detected during: 2025-05-07T20:02:12.1950728Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.1979857Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.2008492Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.2024667Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:12.2025826Z 2025-05-07T20:02:12.2026076Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:12.2026438Z 2025-05-07T20:02:12.2027265Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.2028373Z return 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.2028800Z ^ 2025-05-07T20:02:12.2029049Z detected during: 2025-05-07T20:02:12.2043074Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:12.2071659Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.2099539Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.2128233Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, 
void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.2144450Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:12.2145635Z 2025-05-07T20:02:12.2146462Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.2147652Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.2148105Z ^ 2025-05-07T20:02:12.2148395Z detected during: 2025-05-07T20:02:12.2163247Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.2191747Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.2220412Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.2236658Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:12.2237815Z 2025-05-07T20:02:12.2238067Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:12.2238428Z 2025-05-07T20:02:12.2239229Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.2240355Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.2240787Z ^ 2025-05-07T20:02:12.2241015Z detected during: 2025-05-07T20:02:12.2254969Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, 
AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:12.2283638Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.2312594Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.2341388Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void 
*, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.2357567Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:12.2358711Z 2025-05-07T20:02:12.2359526Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.2360665Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.2361138Z ^ 2025-05-07T20:02:12.2361400Z detected during: 2025-05-07T20:02:12.2376440Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.2404519Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.2433071Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.2449302Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:12.2450434Z 2025-05-07T20:02:12.2450705Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:12.2451063Z 2025-05-07T20:02:12.2451955Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.2453107Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.2453509Z ^ 2025-05-07T20:02:12.2453788Z detected during: 2025-05-07T20:02:12.2467827Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:12.2496339Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.2524311Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.2552695Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.2569206Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:12.2570345Z 2025-05-07T20:02:13.0099646Z [143/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o 2025-05-07T20:02:13.0112253Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:13.0113837Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.0114983Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.0115413Z ^ 2025-05-07T20:02:13.0115600Z 2025-05-07T20:02:13.0115847Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.0116206Z 2025-05-07T20:02:13.0117047Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.0118200Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:13.0118658Z ^ 2025-05-07T20:02:13.0118832Z 2025-05-07T20:02:13.0119627Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.0120777Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.0121241Z ^ 2025-05-07T20:02:13.0121508Z detected during: 2025-05-07T20:02:13.0136541Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:02:13.0164572Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at 
line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.0193245Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:13.0209589Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:13.0210749Z 2025-05-07T20:02:13.0211001Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.0211380Z 2025-05-07T20:02:13.0212177Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.0213330Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.0213782Z ^ 2025-05-07T20:02:13.0214076Z detected during: 2025-05-07T20:02:13.0229969Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.0258140Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.0286856Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:13.0303063Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:13.0304218Z 2025-05-07T20:02:13.0304466Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.0304831Z 2025-05-07T20:02:13.0305629Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.0306805Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.0307271Z ^ 2025-05-07T20:02:13.0307540Z detected during: 2025-05-07T20:02:13.0322394Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.0350636Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.0379471Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:13.0395662Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:13.0396796Z 2025-05-07T20:02:13.0397048Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.0397431Z 
2025-05-07T20:02:13.0398654Z ptxas /tmp/tmpxft_00008ca6_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 889; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:13.0401210Z ptxas /tmp/tmpxft_00008ca6_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 896; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:13.0403817Z ptxas /tmp/tmpxft_00008ca6_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 903; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:13.0406351Z ptxas /tmp/tmpxft_00008ca6_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 910; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:13.0408914Z ptxas /tmp/tmpxft_00008ca6_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1044; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:13.0411504Z ptxas 
/tmp/tmpxft_00008ca6_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1051; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:13.0414070Z ptxas /tmp/tmpxft_00008ca6_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1058; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:13.0416668Z ptxas /tmp/tmpxft_00008ca6_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1065; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:13.0418774Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.0419918Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.0420356Z ^ 2025-05-07T20:02:13.0420606Z detected during: 2025-05-07T20:02:13.0435290Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.0463409Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, 
cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.0491916Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, 
void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:13.0508125Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:13.0509268Z 2025-05-07T20:02:13.0509507Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.0509871Z 2025-05-07T20:02:13.0510657Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.0511781Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.0512221Z ^ 2025-05-07T20:02:13.0512491Z detected during: 2025-05-07T20:02:13.0527241Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.0556354Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.0585136Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:13.0601265Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:13.0602393Z 2025-05-07T20:02:13.0602692Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.0603045Z 2025-05-07T20:02:13.0603834Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.0604963Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.0605408Z ^ 2025-05-07T20:02:13.0605658Z detected during: 2025-05-07T20:02:13.0620351Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.0648571Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.0677303Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:13.0693586Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:13.0694767Z 2025-05-07T20:02:13.0695018Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.0695396Z 2025-05-07T20:02:13.4706085Z [144/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC 
-DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o 2025-05-07T20:02:13.4718604Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:13.4720168Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.4721335Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.4721811Z ^ 2025-05-07T20:02:13.4721994Z 2025-05-07T20:02:13.4722248Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.4722716Z 2025-05-07T20:02:13.4723635Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.4724883Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:13.4725426Z ^ 2025-05-07T20:02:13.4725601Z 2025-05-07T20:02:13.4726447Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.4727580Z Tensor mSFB_tmp = 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.4728043Z ^ 2025-05-07T20:02:13.4728314Z detected during: 2025-05-07T20:02:13.4743194Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.4771687Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.4800405Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:13.4816756Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:13.4817898Z 2025-05-07T20:02:13.4818171Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.4818533Z 2025-05-07T20:02:13.4819339Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.4820502Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.4820945Z ^ 2025-05-07T20:02:13.4821235Z detected during: 2025-05-07T20:02:13.4836043Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.4864243Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.4894709Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:13.4911144Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:13.4912294Z 2025-05-07T20:02:13.4912568Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.4912926Z 2025-05-07T20:02:13.4913729Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.4914887Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.4915354Z ^ 2025-05-07T20:02:13.4915623Z detected during: 2025-05-07T20:02:13.4930487Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.4958680Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.4987587Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:13.5003893Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:13.5005056Z 2025-05-07T20:02:13.5005339Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.5005746Z 2025-05-07T20:02:13.5006545Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.5007713Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.5008215Z ^ 2025-05-07T20:02:13.5008511Z detected during: 2025-05-07T20:02:13.5023317Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.5051552Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.5080279Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:13.5096568Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:13.5097703Z 2025-05-07T20:02:13.5097966Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.5098325Z 
2025-05-07T20:02:13.5099127Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.5100295Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.5100765Z ^ 2025-05-07T20:02:13.5101031Z detected during: 2025-05-07T20:02:13.5115902Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.5144120Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.5172923Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:13.5189809Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:13.5190954Z 2025-05-07T20:02:13.5191200Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.5191584Z 2025-05-07T20:02:13.5192391Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.5193543Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.5193992Z ^ 2025-05-07T20:02:13.5194277Z detected during: 2025-05-07T20:02:13.5209148Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.5237309Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.5266062Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:13.5282397Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:13.5283602Z 2025-05-07T20:02:13.5283849Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.5284211Z 2025-05-07T20:02:16.9513836Z [145/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o 2025-05-07T20:02:16.9526750Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use 
-Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:16.9528321Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.9529485Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.9529972Z ^ 2025-05-07T20:02:16.9530152Z 2025-05-07T20:02:16.9530407Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.9530784Z 2025-05-07T20:02:16.9531606Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.9532780Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:16.9533222Z ^ 2025-05-07T20:02:16.9533394Z 2025-05-07T20:02:16.9534262Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.9535402Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.9535916Z ^ 2025-05-07T20:02:16.9536210Z detected during: 2025-05-07T20:02:16.9551186Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.9579674Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:16.9608516Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, 
cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:16.9624845Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:16.9626002Z 2025-05-07T20:02:16.9626254Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.9626635Z 2025-05-07T20:02:16.9627440Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.9638023Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.9638690Z ^ 2025-05-07T20:02:16.9638998Z detected during: 2025-05-07T20:02:16.9654078Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.9684185Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:16.9713037Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:16.9729316Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:16.9730469Z 2025-05-07T20:02:16.9730719Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.9731081Z 2025-05-07T20:02:16.9731885Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.9733050Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.9733513Z ^ 2025-05-07T20:02:16.9733785Z detected during: 2025-05-07T20:02:16.9748613Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.9777137Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:16.9805855Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:16.9822091Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:16.9823228Z 2025-05-07T20:02:16.9823476Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.9823883Z 2025-05-07T20:02:16.9824670Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.9825846Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.9826331Z ^ 2025-05-07T20:02:16.9826632Z detected during: 2025-05-07T20:02:16.9841570Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.9869988Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:16.9898778Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:16.9915009Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:16.9916160Z 2025-05-07T20:02:16.9916408Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.9916766Z 2025-05-07T20:02:16.9917584Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.9918729Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.9919199Z ^ 2025-05-07T20:02:16.9919467Z detected during: 
2025-05-07T20:02:16.9934266Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.9962427Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:16.9991937Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:17.0008430Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:17.0009559Z 2025-05-07T20:02:17.0009811Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:17.0010190Z 2025-05-07T20:02:17.0010992Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.0012165Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:17.0012614Z ^ 2025-05-07T20:02:17.0012906Z detected during: 2025-05-07T20:02:17.0027787Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:17.0055995Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:02:17.0084590Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:17.0100917Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:17.0102079Z 2025-05-07T20:02:17.0102328Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:17.0102691Z 2025-05-07T20:02:19.3997897Z [146/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o 2025-05-07T20:02:19.4010665Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:19.4012260Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.4013429Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.4013878Z ^ 2025-05-07T20:02:19.4014073Z 2025-05-07T20:02:19.4014320Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.4014678Z 2025-05-07T20:02:19.4015520Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.4016660Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:19.4017121Z ^ 2025-05-07T20:02:19.4017294Z 2025-05-07T20:02:19.4018085Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.4019239Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.4019702Z ^ 2025-05-07T20:02:19.4019966Z detected during: 2025-05-07T20:02:19.4034939Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.4063416Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.4092260Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.4108393Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:19.4109532Z 2025-05-07T20:02:19.4109782Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.4110175Z 2025-05-07T20:02:19.4110980Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.4112137Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.4112582Z ^ 2025-05-07T20:02:19.4112871Z detected during: 2025-05-07T20:02:19.4127737Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.4155780Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.4184736Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.4202315Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:19.4203616Z 2025-05-07T20:02:19.4203926Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.4204292Z 2025-05-07T20:02:19.4205092Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.4206242Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.4206712Z ^ 2025-05-07T20:02:19.4206981Z detected during: 2025-05-07T20:02:19.4221849Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.4250032Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.4278754Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.4295041Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:19.4296183Z 2025-05-07T20:02:19.4296433Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.4296819Z 2025-05-07T20:02:19.4297610Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.4298778Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.4299225Z ^ 2025-05-07T20:02:19.4299514Z detected during: 2025-05-07T20:02:19.4314431Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.4342812Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.4371618Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.4387857Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:19.4389028Z 2025-05-07T20:02:19.4389280Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.4389640Z 2025-05-07T20:02:19.4390458Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.4391600Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.4392076Z ^ 2025-05-07T20:02:19.4392346Z detected during: 
2025-05-07T20:02:19.4407257Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.4435568Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.4464310Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.4480856Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:19.4482086Z 2025-05-07T20:02:19.4482335Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.4482754Z 2025-05-07T20:02:19.4483590Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.4484753Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.4485201Z ^ 2025-05-07T20:02:19.4485492Z detected during: 2025-05-07T20:02:19.4500282Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.4529138Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:02:19.4557810Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.4574200Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:19.4575368Z 2025-05-07T20:02:19.4575618Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.4575978Z 2025-05-07T20:02:23.4176549Z [147/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o 2025-05-07T20:02:23.4189018Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:23.4190610Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.4191752Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.4192225Z ^ 2025-05-07T20:02:23.4192404Z 2025-05-07T20:02:23.4192654Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.4193039Z 2025-05-07T20:02:23.4193855Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.4195032Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:23.4195476Z ^ 2025-05-07T20:02:23.4195671Z 2025-05-07T20:02:23.4196458Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.4197609Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.4198046Z ^ 2025-05-07T20:02:23.4198335Z detected during: 2025-05-07T20:02:23.4212978Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.4240505Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.4268745Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.4284722Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:23.4285861Z 2025-05-07T20:02:23.4286131Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.4286489Z 2025-05-07T20:02:23.4287292Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.4288424Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.4288860Z ^ 2025-05-07T20:02:23.4289092Z detected during: 2025-05-07T20:02:23.4302868Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:23.4330929Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.4358346Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.4386418Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.4402263Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:23.4403440Z 2025-05-07T20:02:23.4404232Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.4405372Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.4405812Z ^ 2025-05-07T20:02:23.4406061Z detected during: 2025-05-07T20:02:23.4420765Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.4448179Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.4476126Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.4492031Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:23.4493164Z 2025-05-07T20:02:23.4493403Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.4493769Z 2025-05-07T20:02:23.4494565Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.4495663Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.4496050Z ^ 2025-05-07T20:02:23.4496276Z detected during: 2025-05-07T20:02:23.4510004Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:02:23.4538071Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.4565600Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.4593736Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.4609633Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:23.4610800Z 2025-05-07T20:02:23.4611601Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.4612747Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.4613188Z ^ 2025-05-07T20:02:23.4613474Z detected during: 2025-05-07T20:02:23.4628019Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.4655546Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.4683543Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.4699341Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:23.4700499Z 2025-05-07T20:02:23.4700750Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.4701107Z 2025-05-07T20:02:23.4701925Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.4703029Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.4703464Z ^ 2025-05-07T20:02:23.4703697Z detected during: 2025-05-07T20:02:23.4717402Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:23.4745486Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.4773108Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.4801061Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.4816956Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:23.4818098Z 2025-05-07T20:02:23.4819347Z ptxas /tmp/tmpxft_00008ca9_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 835; warning : Advisory: 
'.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:23.4821942Z ptxas /tmp/tmpxft_00008ca9_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 848; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:23.4824539Z ptxas /tmp/tmpxft_00008ca9_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 988; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:23.4827090Z ptxas /tmp/tmpxft_00008ca9_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 1001; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:23.4829217Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.4830351Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.4830830Z ^ 2025-05-07T20:02:23.4831132Z detected during: 2025-05-07T20:02:23.4845634Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with 
ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:02:23.4873222Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.4901189Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.4917107Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:23.4918272Z 2025-05-07T20:02:23.4918528Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.4918893Z 2025-05-07T20:02:23.4919711Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.4920827Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.4921264Z ^ 2025-05-07T20:02:23.4921501Z detected during: 2025-05-07T20:02:23.4935310Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, 
ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:23.4963343Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.4993674Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.5021981Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:23.5038100Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:23.5039267Z 2025-05-07T20:02:23.5040073Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.5041242Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.5041694Z ^ 2025-05-07T20:02:23.5041994Z detected during: 2025-05-07T20:02:23.5056518Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.5084025Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.5111747Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.5127659Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:23.5128815Z 2025-05-07T20:02:23.5129070Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.5129426Z 2025-05-07T20:02:23.5130255Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" 
is deprecated 2025-05-07T20:02:23.5131357Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.5131780Z ^ 2025-05-07T20:02:23.5132011Z detected during: 2025-05-07T20:02:23.5145814Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:23.5174024Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.5201451Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.5229497Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, 
cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.5245193Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:23.5246335Z 2025-05-07T20:02:23.5247152Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.5248299Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.5248773Z ^ 2025-05-07T20:02:23.5249042Z detected during: 2025-05-07T20:02:23.5263495Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.5291115Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.5319013Z instantiation of "cutlass::Status 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.5334950Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:23.5336089Z 2025-05-07T20:02:23.5336358Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.5336717Z 2025-05-07T20:02:23.5337517Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.5338696Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.5339106Z ^ 2025-05-07T20:02:23.5339399Z detected during: 2025-05-07T20:02:23.5353260Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:23.5381476Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.5409034Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.5436962Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.5452841Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:23.5453976Z 2025-05-07T20:02:24.1394022Z [148/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o 2025-05-07T20:02:24.1406748Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:24.1408332Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:24.1409472Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:24.1409956Z ^ 2025-05-07T20:02:24.1410137Z 2025-05-07T20:02:24.1410413Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:24.1410773Z 2025-05-07T20:02:24.1411607Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:24.1412780Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:24.1413226Z ^ 2025-05-07T20:02:24.1413430Z 2025-05-07T20:02:24.1414225Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:24.1415373Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:24.1415821Z ^ 2025-05-07T20:02:24.1416115Z detected during: 2025-05-07T20:02:24.1430958Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:24.1459319Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:24.1488232Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:24.1504508Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:24.1505694Z 2025-05-07T20:02:24.1505938Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:24.1506328Z 2025-05-07T20:02:24.1507265Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:24.1508398Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:24.1508875Z ^ 2025-05-07T20:02:24.1509144Z detected during: 2025-05-07T20:02:24.1524016Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:24.1552071Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:24.1580839Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:24.1597061Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:24.1598189Z 2025-05-07T20:02:24.1598467Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:24.1598831Z 2025-05-07T20:02:24.1599628Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:24.1600777Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:24.1601227Z ^ 2025-05-07T20:02:24.1601519Z detected during: 2025-05-07T20:02:24.1616296Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:24.1644517Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:24.1674355Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:24.1690615Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:24.1691785Z 2025-05-07T20:02:24.1692034Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:24.1692396Z 2025-05-07T20:02:24.1693221Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:24.1694358Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:24.1694830Z ^ 2025-05-07T20:02:24.1695098Z detected during: 2025-05-07T20:02:24.1709879Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:24.1738071Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:24.1766356Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:24.1782666Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:24.1783799Z 2025-05-07T20:02:24.1784072Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:24.1784504Z 2025-05-07T20:02:24.1785300Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:24.1786564Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:24.1787072Z ^ 2025-05-07T20:02:24.1787338Z detected during: 
2025-05-07T20:02:24.1802144Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:24.1830229Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:24.1858951Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:24.1875308Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:24.1876462Z 2025-05-07T20:02:24.1876708Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:24.1877067Z 2025-05-07T20:02:24.1877865Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:24.1879018Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:24.1879487Z ^ 2025-05-07T20:02:24.1879757Z detected during: 2025-05-07T20:02:24.1894598Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:24.1922766Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:02:24.1951499Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:24.1968397Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:24.1969547Z 2025-05-07T20:02:24.1969796Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:24.1970184Z 2025-05-07T20:02:25.7320042Z [149/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o 2025-05-07T20:02:25.7332561Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:25.7334362Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:25.7335683Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:25.7336199Z ^ 2025-05-07T20:02:25.7336382Z 2025-05-07T20:02:25.7336656Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:25.7337017Z 2025-05-07T20:02:25.7337840Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:25.7339028Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:25.7339500Z ^ 2025-05-07T20:02:25.7339677Z 2025-05-07T20:02:25.7340467Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:25.7341613Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:25.7342050Z ^ 2025-05-07T20:02:25.7342331Z detected during: 2025-05-07T20:02:25.7357178Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:25.7385596Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:25.7413834Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:25.7429743Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:25.7430874Z 2025-05-07T20:02:25.7431117Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:25.7431468Z 2025-05-07T20:02:25.7432265Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:25.7433362Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:25.7433814Z ^ 2025-05-07T20:02:25.7434129Z detected during: 2025-05-07T20:02:25.7448914Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:25.7477183Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:25.7505757Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:25.7521575Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:25.7522721Z 2025-05-07T20:02:25.7523131Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:25.7523479Z 
2025-05-07T20:02:25.7524276Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:25.7525415Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:25.7525855Z ^ 2025-05-07T20:02:25.7526105Z detected during: 2025-05-07T20:02:25.7540704Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:25.7568273Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:25.7596405Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:25.7613265Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:25.7614408Z 2025-05-07T20:02:25.7614647Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:25.7615009Z 2025-05-07T20:02:25.7616313Z ptxas /tmp/tmpxft_00008ca1_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 889; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:25.7618769Z ptxas 
/tmp/tmpxft_00008ca1_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 896; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:25.7621199Z ptxas /tmp/tmpxft_00008ca1_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 903; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:25.7623626Z ptxas /tmp/tmpxft_00008ca1_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 910; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:25.7626081Z ptxas /tmp/tmpxft_00008ca1_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1044; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:25.7628524Z ptxas /tmp/tmpxft_00008ca1_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1051; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:25.7630968Z ptxas 
/tmp/tmpxft_00008ca1_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1058; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:25.7633381Z ptxas /tmp/tmpxft_00008ca1_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1065; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:25.7635421Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:25.7636513Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:25.7636970Z ^ 2025-05-07T20:02:25.7637393Z detected during: 2025-05-07T20:02:25.7652227Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:25.7680587Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, 
cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:25.7708837Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, 
int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:25.7724844Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:25.7726019Z 2025-05-07T20:02:25.7726268Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:25.7726631Z 2025-05-07T20:02:25.7727457Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:25.7728592Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:25.7729058Z ^ 2025-05-07T20:02:25.7729320Z detected during: 2025-05-07T20:02:25.7744248Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:25.7772582Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:25.7800726Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:25.7817185Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:25.7818292Z 2025-05-07T20:02:25.7818557Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:25.7818905Z 2025-05-07T20:02:25.7819678Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the 
implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:25.7820803Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:25.7821243Z ^ 2025-05-07T20:02:25.7821526Z detected during: 2025-05-07T20:02:25.7835855Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:25.7863800Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, 
cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:25.7892785Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:25.7908717Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:25.7909852Z 2025-05-07T20:02:25.7910098Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:25.7910447Z 2025-05-07T20:02:34.3406323Z [150/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu -o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o 2025-05-07T20:02:34.3420633Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:34.3422170Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:34.3423305Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:34.3423786Z ^ 2025-05-07T20:02:34.3423958Z 2025-05-07T20:02:34.3424223Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:34.3424573Z 2025-05-07T20:02:34.3425367Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:34.3426523Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:34.3426986Z ^ 2025-05-07T20:02:34.3427156Z 2025-05-07T20:02:34.3427931Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:34.3429061Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:34.3429498Z ^ 2025-05-07T20:02:34.3429788Z detected during: 2025-05-07T20:02:34.3444602Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:34.3473372Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:34.3501884Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:34.3516877Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:34.3517940Z 2025-05-07T20:02:34.3518176Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:34.3518509Z 2025-05-07T20:02:34.3519265Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:34.3520310Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:34.3520744Z ^ 2025-05-07T20:02:34.3520994Z detected during: 2025-05-07T20:02:34.3535942Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:34.3562012Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:34.3590401Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:34.3605575Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:34.3606768Z 2025-05-07T20:02:34.3607055Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:34.3607439Z 2025-05-07T20:02:34.3608241Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:34.3609396Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:34.3609853Z ^ 2025-05-07T20:02:34.3610148Z detected during: 2025-05-07T20:02:34.3624299Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:34.3650988Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:34.3679035Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:34.3695553Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:34.3696624Z 2025-05-07T20:02:34.3696851Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:34.3697188Z 
2025-05-07T20:02:34.3697944Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:34.3698984Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:34.3699420Z ^ 2025-05-07T20:02:34.3699689Z detected during: 2025-05-07T20:02:34.3714236Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:34.3741980Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:34.3769420Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:34.3785394Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:34.3786456Z 2025-05-07T20:02:34.3786708Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:34.3787042Z 2025-05-07T20:02:34.3787784Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:34.3788862Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:34.3789299Z ^ 2025-05-07T20:02:34.3789554Z detected during: 2025-05-07T20:02:34.3803458Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:34.3830519Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:34.3858235Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:34.3873843Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:34.3874980Z 2025-05-07T20:02:34.3875219Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:34.3875629Z 2025-05-07T20:02:34.3876470Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:34.3877591Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:34.3878027Z ^ 2025-05-07T20:02:34.3878306Z detected during: 2025-05-07T20:02:34.3893094Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:34.3920619Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:34.3948393Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:34.3963634Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:34.3964789Z 2025-05-07T20:02:34.3965037Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:34.3965398Z 2025-05-07T20:02:45.1892595Z [151/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o 2025-05-07T20:02:45.1903992Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:45.1918621Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:45.1919779Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:45.1920251Z ^ 2025-05-07T20:02:45.1920439Z 2025-05-07T20:02:45.1920682Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:45.1921074Z 2025-05-07T20:02:45.1921877Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:45.1923303Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:45.1923745Z ^ 2025-05-07T20:02:45.1923918Z 2025-05-07T20:02:45.1924886Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:45.1926187Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:45.1926695Z ^ 2025-05-07T20:02:45.1926920Z 2025-05-07T20:02:45.1927874Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:45.1929247Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:45.1929718Z ^ 2025-05-07T20:02:45.1929941Z 2025-05-07T20:02:45.1930817Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation 
is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:45.1932001Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:45.1932569Z ^ 2025-05-07T20:02:45.1932776Z 2025-05-07T20:02:45.1933052Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:45.1933400Z 2025-05-07T20:02:45.1934328Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:45.1935535Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:45.1935992Z ^ 2025-05-07T20:02:45.1936241Z 2025-05-07T20:02:45.1937098Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:45.1938308Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:45.1938761Z ^ 2025-05-07T20:02:45.1938993Z 2025-05-07T20:02:45.1939224Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:45.1939554Z 2025-05-07T20:02:45.1940421Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:45.1941602Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:45.1942075Z ^ 2025-05-07T20:02:45.1942297Z 2025-05-07T20:02:45.1943147Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:45.1944357Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:45.1944822Z ^ 2025-05-07T20:02:45.1945024Z 2025-05-07T20:02:45.1945251Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:45.1945599Z 2025-05-07T20:02:45.1946445Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:45.1947637Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:45.1948089Z ^ 2025-05-07T20:02:45.1948329Z 2025-05-07T20:02:45.1949196Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:45.1950403Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:45.1950855Z ^ 2025-05-07T20:02:45.1951057Z 2025-05-07T20:02:45.1951306Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:45.1951633Z 2025-05-07T20:02:45.1952483Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:45.1953710Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:45.1954209Z ^ 
2025-05-07T20:02:45.1954436Z 2025-05-07T20:02:45.1956343Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:45.1957588Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:45.1958067Z ^ 2025-05-07T20:02:45.1958275Z 2025-05-07T20:02:45.1958503Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:45.1958854Z 2025-05-07T20:02:45.1959708Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:45.1960916Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:45.1961361Z ^ 2025-05-07T20:02:45.1961581Z 2025-05-07T20:02:45.1962464Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:45.1963947Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:45.1964447Z ^ 2025-05-07T20:02:45.1964670Z 2025-05-07T20:02:45.1964943Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:45.1965303Z 2025-05-07T20:02:45.1966236Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:45.1967722Z __inline__ __attribute__((always_inline)) __attribute__((device)) 
__attribute__((host)) 2025-05-07T20:02:45.1968229Z ^ 2025-05-07T20:02:45.1968470Z 2025-05-07T20:03:22.8298491Z [152/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o 2025-05-07T20:03:22.8309936Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:03:22.8311393Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:03:22.8312495Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:03:22.8312949Z ^ 2025-05-07T20:03:22.8313120Z 2025-05-07T20:03:22.8313356Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:22.8313719Z 2025-05-07T20:03:22.8314474Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:03:22.8315559Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:03:22.8316014Z ^ 2025-05-07T20:03:22.8316181Z 2025-05-07T20:03:22.8317085Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:22.8318291Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:22.8318791Z ^ 2025-05-07T20:03:22.8319016Z 2025-05-07T20:03:22.8319898Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:22.8321104Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:22.8321602Z ^ 2025-05-07T20:03:22.8321830Z 2025-05-07T20:03:22.8322815Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation 
is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:22.8324320Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:22.8324840Z ^ 2025-05-07T20:03:22.8325069Z 2025-05-07T20:03:22.8325388Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:22.8325776Z 2025-05-07T20:03:22.8326744Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:22.8328124Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:22.8328623Z ^ 2025-05-07T20:03:22.8328893Z 2025-05-07T20:03:22.8329898Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:22.8331129Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:22.8331587Z ^ 2025-05-07T20:03:22.8331800Z 2025-05-07T20:03:22.8332064Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:22.8332399Z 2025-05-07T20:03:22.8333264Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:22.8334493Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:22.8334990Z ^ 2025-05-07T20:03:22.8335217Z 2025-05-07T20:03:22.8336076Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:22.8337303Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:22.8337782Z ^ 2025-05-07T20:03:22.8337993Z 2025-05-07T20:03:22.8338227Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:22.8338583Z 2025-05-07T20:03:22.8339432Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:22.8340645Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:22.8341104Z ^ 2025-05-07T20:03:22.8341327Z 2025-05-07T20:03:22.8342207Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:22.8343403Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:22.8343882Z ^ 2025-05-07T20:03:22.8344093Z 2025-05-07T20:03:22.8344347Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:22.8344682Z 2025-05-07T20:03:22.8345534Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:22.8346741Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:22.8347274Z ^ 
2025-05-07T20:03:22.8347499Z 2025-05-07T20:03:22.8348471Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:22.8349699Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:22.8350156Z ^ 2025-05-07T20:03:22.8350393Z 2025-05-07T20:03:22.8350625Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:22.8350963Z 2025-05-07T20:03:22.8351845Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:22.8353033Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:22.8353528Z ^ 2025-05-07T20:03:22.8353755Z 2025-05-07T20:03:22.8354644Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:22.8355842Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:22.8356317Z ^ 2025-05-07T20:03:22.8356528Z 2025-05-07T20:03:22.8356779Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:22.8357112Z 2025-05-07T20:03:22.8357964Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:22.8359175Z __inline__ __attribute__((always_inline)) __attribute__((device)) 
__attribute__((host)) 2025-05-07T20:03:22.8359634Z ^ 2025-05-07T20:03:22.8359883Z 2025-05-07T20:03:23.8899751Z [153/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o 2025-05-07T20:03:23.8911308Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:03:23.8912805Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:03:23.8913887Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:03:23.8914348Z ^ 2025-05-07T20:03:23.8914518Z 2025-05-07T20:03:23.8914755Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:23.8915123Z 2025-05-07T20:03:23.8915876Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:03:23.8916984Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:03:23.8917407Z ^ 2025-05-07T20:03:23.8917604Z 2025-05-07T20:03:23.8918475Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:23.8919710Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:23.8920166Z ^ 2025-05-07T20:03:23.8920380Z 2025-05-07T20:03:23.8921257Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:23.8922449Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:23.8923277Z ^ 2025-05-07T20:03:23.8923520Z 2025-05-07T20:03:23.8924488Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation 
is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:23.8925785Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:23.8926306Z ^ 2025-05-07T20:03:23.8926533Z 2025-05-07T20:03:23.8926809Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:23.8927167Z 2025-05-07T20:03:23.8928170Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:23.8929661Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:23.8930153Z ^ 2025-05-07T20:03:23.8930406Z 2025-05-07T20:03:23.8931267Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:23.8932494Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:23.8932960Z ^ 2025-05-07T20:03:23.8933203Z 2025-05-07T20:03:23.8933436Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:23.8933773Z 2025-05-07T20:03:23.8934660Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:23.8935855Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:23.8936345Z ^ 2025-05-07T20:03:23.8936567Z 2025-05-07T20:03:23.8937458Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:23.8938652Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:23.8939134Z ^ 2025-05-07T20:03:23.8939344Z 2025-05-07T20:03:23.8939577Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:23.8939940Z 2025-05-07T20:03:23.8940796Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:23.8942007Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:23.8942472Z ^ 2025-05-07T20:03:23.8942724Z 2025-05-07T20:03:23.8943595Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:23.8944831Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:23.8945291Z ^ 2025-05-07T20:03:23.8945528Z 2025-05-07T20:03:23.8945761Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:23.8946097Z 2025-05-07T20:03:23.8946976Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:23.8948165Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:23.8948661Z ^ 
2025-05-07T20:03:23.8948926Z 2025-05-07T20:03:23.8949793Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:23.8951110Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:23.8951595Z ^ 2025-05-07T20:03:23.8951807Z 2025-05-07T20:03:23.8952041Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:23.8952405Z 2025-05-07T20:03:23.8953260Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:23.8954480Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:23.8954946Z ^ 2025-05-07T20:03:23.8955203Z 2025-05-07T20:03:23.8956075Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:23.8957311Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:23.8957768Z ^ 2025-05-07T20:03:23.8957982Z 2025-05-07T20:03:23.8958240Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:23.8958576Z 2025-05-07T20:03:23.8959427Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:23.8960647Z __inline__ __attribute__((always_inline)) __attribute__((device)) 
__attribute__((host)) 2025-05-07T20:03:23.8961153Z ^ 2025-05-07T20:03:23.8961381Z 2025-05-07T20:03:24.5186488Z [154/156] : && /github/home/miniconda/envs/build_binary/bin/c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,fbgemm_gpu_experimental_gen_ai.so -o experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib:/github/home/miniconda/envs/build_binary/lib/stubs: /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch.so /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed 
/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/torch/lib/libtorch.so" -Wl,--as-needed -lcudadevrt -lcudart_static -ldl && : 2025-05-07T20:03:24.7847766Z [155/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-build/experimental/gen_ai && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:24.7849410Z ################################################################################ 2025-05-07T20:03:24.7849800Z [CMAKE] Running post-build script ... 2025-05-07T20:03:24.7850553Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:24.7851267Z Removing all RPATHs ... 
2025-05-07T20:03:24.7851701Z ################################################################################ 2025-05-07T20:03:24.7852779Z [155/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-build && /github/home/miniconda/envs/build_binary/lib/python3.11/site-packages/cmake/data/bin/cmake -P cmake_install.cmake 2025-05-07T20:03:24.8728260Z -- Install configuration: "Release" 2025-05-07T20:03:24.8766703Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/asmjit.so 2025-05-07T20:03:24.8827402Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/fbgemm.so 2025-05-07T20:03:24.8871062Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:24.8887723Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/bench 2025-05-07T20:03:24.8907080Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/bench/__init__.py 2025-05-07T20:03:24.8910367Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/bench/ck_bf16_bench.py 2025-05-07T20:03:24.8913046Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/bench/comm_bench.py 2025-05-07T20:03:24.8914400Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/bench/gather_scatter_bench.py 2025-05-07T20:03:24.8918899Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/bench/quantize_bench.py 2025-05-07T20:03:24.8926885Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/bench/quantize_ops.py 2025-05-07T20:03:24.8962015Z 
-- Up-to-date: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:24.8968294Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:24.8990261Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/README.md 2025-05-07T20:03:24.8992803Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/__init__.py 2025-05-07T20:03:24.8994000Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/activation.py 2025-05-07T20:03:24.8995097Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py 2025-05-07T20:03:24.8996144Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/layers.py 2025-05-07T20:03:24.8997183Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/shuffling.py 2025-05-07T20:03:24.9003916Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/__init__.py 2025-05-07T20:03:24.9007795Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/quantize.py 2025-05-07T20:03:24.9033812Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:24.9062045Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/example/__init__.py 2025-05-07T20:03:24.9063106Z -- Installing: 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/example/utils.py 2025-05-07T20:03:24.9114723Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py 2025-05-07T20:03:24.9118019Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py 2025-05-07T20:03:24.9121299Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py 2025-05-07T20:03:24.9122977Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py 2025-05-07T20:03:24.9124280Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py 2025-05-07T20:03:24.9397509Z 2025-05-07T20:03:25.2854721Z 2025-05-07T20:03:25.2877651Z copying fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/__init__.py 2025-05-07T20:03:25.3040191Z copying fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/batched_unary_embeddings_ops.py 2025-05-07T20:03:25.3043092Z copying fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/enums.py 2025-05-07T20:03:25.3045571Z copying fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/metrics.py 2025-05-07T20:03:25.3052750Z copying fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules.py 2025-05-07T20:03:25.3082481Z copying fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules_split.py 2025-05-07T20:03:25.3085881Z copying fbgemm_gpu/quantize_comm.py -> 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/quantize_comm.py 2025-05-07T20:03:25.3088318Z copying fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/quantize_utils.py 2025-05-07T20:03:25.3091111Z copying fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/runtime_monitor.py 2025-05-07T20:03:25.3101751Z copying fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sparse_ops.py 2025-05-07T20:03:25.3107870Z copying fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_embedding_configs.py 2025-05-07T20:03:25.3110878Z copying fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_embedding_inference_converter.py 2025-05-07T20:03:25.3120993Z copying fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_embedding_optimizer_ops.py 2025-05-07T20:03:25.3122775Z copying fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_embedding_utils.py 2025-05-07T20:03:25.3136749Z copying fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops.py 2025-05-07T20:03:25.3140413Z copying fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_common.py 2025-05-07T20:03:25.3152961Z copying fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py 2025-05-07T20:03:25.3158934Z copying fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training.py 2025-05-07T20:03:25.3179236Z copying 
fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py 2025-05-07T20:03:25.3185397Z copying fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py 2025-05-07T20:03:25.3190958Z copying fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe_input_multiplexer.py 2025-05-07T20:03:25.3198529Z copying fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/uvm.py 2025-05-07T20:03:25.3206971Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/config 2025-05-07T20:03:25.3245763Z copying fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/config/__init__.py 2025-05-07T20:03:25.3251495Z copying fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/config/feature_list.py 2025-05-07T20:03:25.3260891Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs 2025-05-07T20:03:25.3277660Z copying fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/__init__.py 2025-05-07T20:03:25.3284611Z copying fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/common.py 2025-05-07T20:03:25.3291494Z copying fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/examples.py 2025-05-07T20:03:25.3297366Z copying fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/jagged_tensor_ops.py 2025-05-07T20:03:25.3303675Z copying fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/merge_pooled_embedding_ops.py 2025-05-07T20:03:25.3309692Z copying fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/permute_pooled_embedding_ops.py 2025-05-07T20:03:25.3314660Z copying fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/quantize_ops.py 2025-05-07T20:03:25.3321285Z copying fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/sparse_ops.py 2025-05-07T20:03:25.3355429Z copying fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/version.py 2025-05-07T20:03:25.3360647Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/quantize 2025-05-07T20:03:25.3361445Z copying fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/quantize/__init__.py 2025-05-07T20:03:25.3365354Z copying fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/quantize/quantize_ops.py 2025-05-07T20:03:25.3370860Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll 2025-05-07T20:03:25.3390196Z copying fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/__init__.py 2025-05-07T20:03:25.3392278Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe 2025-05-07T20:03:25.3394315Z copying fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/__init__.py 2025-05-07T20:03:25.3399174Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton 2025-05-07T20:03:25.3400903Z copying fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/__init__.py 2025-05-07T20:03:25.3405603Z copying fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/common.py 2025-05-07T20:03:25.3412547Z copying fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/quantize.py 2025-05-07T20:03:25.3418974Z copying fbgemm_gpu/triton/quantize_ref.py -> 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/quantize_ref.py 2025-05-07T20:03:25.3428782Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/utils 2025-05-07T20:03:25.3429540Z copying fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/utils/__init__.py 2025-05-07T20:03:25.3433403Z copying fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/utils/loader.py 2025-05-07T20:03:25.3439977Z copying fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/utils/torch_library.py 2025-05-07T20:03:25.3446285Z copying fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/utils/filestore.py 2025-05-07T20:03:25.3451578Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/cpu 2025-05-07T20:03:25.3452333Z copying fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/cpu/__init__.py 2025-05-07T20:03:25.3456694Z copying fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/cpu/cpu_sll.py 2025-05-07T20:03:25.3466912Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/meta 2025-05-07T20:03:25.3469433Z copying fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/meta/__init__.py 2025-05-07T20:03:25.3471754Z copying fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/meta/meta_sll.py 2025-05-07T20:03:25.3478237Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton 2025-05-07T20:03:25.3480490Z copying fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/__init__.py 2025-05-07T20:03:25.3485611Z copying fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/common.py 2025-05-07T20:03:25.3490960Z copying 
fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py 2025-05-07T20:03:25.3500977Z copying fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py 2025-05-07T20:03:25.3504924Z copying fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm.py 2025-05-07T20:03:25.3516357Z copying fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py 2025-05-07T20:03:25.3528592Z copying fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py 2025-05-07T20:03:25.3534771Z copying fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py 2025-05-07T20:03:25.3544998Z copying fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py 2025-05-07T20:03:25.3554086Z copying fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py 2025-05-07T20:03:25.3563372Z copying fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py 2025-05-07T20:03:25.3568487Z copying fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_softmax.py 2025-05-07T20:03:25.3579612Z copying 
fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py 2025-05-07T20:03:25.3601875Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench 2025-05-07T20:03:25.3603680Z copying fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/__init__.py 2025-05-07T20:03:25.3605152Z copying fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/bench_config.py 2025-05-07T20:03:25.3611536Z copying fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/bench_runs.py 2025-05-07T20:03:25.3613788Z copying fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/eeg_cli.py 2025-05-07T20:03:25.3618549Z copying fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py 2025-05-07T20:03:25.3623098Z copying fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/eval_compression.py 2025-05-07T20:03:25.3627054Z copying fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/reporter.py 2025-05-07T20:03:25.3631362Z copying fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py 2025-05-07T20:03:25.3636749Z copying fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py 2025-05-07T20:03:25.3641004Z copying fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/utils.py 2025-05-07T20:03:25.3645323Z copying fbgemm_gpu/tbe/bench/tbe_data_config.py -> 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config.py 2025-05-07T20:03:25.3652690Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/cache 2025-05-07T20:03:25.3655192Z copying fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/cache/__init__.py 2025-05-07T20:03:25.3659289Z copying fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py 2025-05-07T20:03:25.3663082Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:25.3663854Z copying fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/__init__.py 2025-05-07T20:03:25.3671078Z copying fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/common.py 2025-05-07T20:03:25.3677462Z copying fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/inference.py 2025-05-07T20:03:25.3685330Z copying fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/training.py 2025-05-07T20:03:25.3693721Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/utils 2025-05-07T20:03:25.3694516Z copying fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/utils/__init__.py 2025-05-07T20:03:25.3700453Z copying fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/utils/common.py 2025-05-07T20:03:25.3705896Z copying fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/utils/offsets.py 2025-05-07T20:03:25.3710381Z copying fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/utils/quantize.py 2025-05-07T20:03:25.3715789Z copying fbgemm_gpu/tbe/utils/requests.py -> 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/utils/requests.py 2025-05-07T20:03:25.3723915Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/stats 2025-05-07T20:03:25.3726175Z copying fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/stats/__init__.py 2025-05-07T20:03:25.3731004Z copying fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/stats/bench_params_reporter.py 2025-05-07T20:03:25.3735066Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:25.3735902Z copying fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/utils/__init__.py 2025-05-07T20:03:25.3740009Z copying fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py 2025-05-07T20:03:25.3744179Z creating directory _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/jagged 2025-05-07T20:03:25.3745038Z copying fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/jagged/__init__.py 2025-05-07T20:03:25.3749212Z copying fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py 2025-05-07T20:03:25.3860415Z 2025-05-07T20:03:26.1030683Z INFO:root:running bdist_wheel 2025-05-07T20:03:26.2606238Z INFO:root:running build 2025-05-07T20:03:26.2624560Z INFO:root:running build_py 2025-05-07T20:03:26.2999735Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3035063Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3045080Z INFO:root:copying 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3048403Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3062918Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3064337Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3067914Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3070647Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3083431Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3086904Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3097584Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3100361Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 
2025-05-07T20:03:26.3101783Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3103260Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3104634Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3110538Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3114577Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3118251Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3120366Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3148895Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3150485Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 
2025-05-07T20:03:26.3151890Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3153361Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.3162909Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/config 2025-05-07T20:03:26.3166222Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/config 2025-05-07T20:03:26.3170039Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/config 2025-05-07T20:03:26.3176203Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.3177523Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.3183835Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.3185418Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.3186905Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.3188448Z INFO:root:copying 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.3198934Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.3200415Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.3201820Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.3203711Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.3205958Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/quantize 2025-05-07T20:03:26.3207164Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/quantize 2025-05-07T20:03:26.3209233Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/quantize 2025-05-07T20:03:26.3211156Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll 2025-05-07T20:03:26.3212320Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll 2025-05-07T20:03:26.3214287Z INFO:root:creating 
_skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe 2025-05-07T20:03:26.3222144Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe 2025-05-07T20:03:26.3227307Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton 2025-05-07T20:03:26.3229222Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton 2025-05-07T20:03:26.3231416Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton 2025-05-07T20:03:26.3233014Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton 2025-05-07T20:03:26.3234571Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton 2025-05-07T20:03:26.3240750Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/utils 2025-05-07T20:03:26.3241944Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/utils 2025-05-07T20:03:26.3243697Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/utils 2025-05-07T20:03:26.3245246Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/utils 
2025-05-07T20:03:26.3254570Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/utils 2025-05-07T20:03:26.3258003Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/cpu 2025-05-07T20:03:26.3259163Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/cpu 2025-05-07T20:03:26.3260721Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/cpu 2025-05-07T20:03:26.3272618Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/meta 2025-05-07T20:03:26.3276072Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/meta 2025-05-07T20:03:26.3278779Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/meta 2025-05-07T20:03:26.3281243Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.3282489Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.3284209Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.3285865Z INFO:root:copying 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.3287603Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.3289308Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.3291067Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.3293126Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.3294936Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.3296796Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.3298593Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.3300357Z INFO:root:copying 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.3302123Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.3303795Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.3323973Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.3327617Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.3329378Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.3330851Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.3337978Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.3340472Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.3344584Z INFO:root:copying 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.3353760Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.3358151Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.3364829Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.3368396Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.3370783Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.3372159Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/cache 2025-05-07T20:03:26.3373299Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/cache 2025-05-07T20:03:26.3375014Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/cache 2025-05-07T20:03:26.3376525Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd 
2025-05-07T20:03:26.3377633Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:26.3380752Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:26.3396300Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:26.3397760Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:26.3401260Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/utils 2025-05-07T20:03:26.3402400Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/utils 2025-05-07T20:03:26.3404152Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/utils 2025-05-07T20:03:26.3405580Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/utils 2025-05-07T20:03:26.3407026Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/utils 2025-05-07T20:03:26.3408488Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/utils/requests.py -> 
_skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/utils 2025-05-07T20:03:26.3410107Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/stats 2025-05-07T20:03:26.3411213Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/stats 2025-05-07T20:03:26.3412686Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/stats 2025-05-07T20:03:26.3414477Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:26.3416226Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:26.3439869Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:26.3441417Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton/jagged 2025-05-07T20:03:26.3442820Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton/jagged 2025-05-07T20:03:26.3444440Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton/jagged 2025-05-07T20:03:26.4075459Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/asmjit.so -> 
_skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.4119487Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/fbgemm.so -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.4417555Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:26.4420544Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:26.9047687Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/bench 2025-05-07T20:03:26.9049069Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/bench/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/bench 2025-05-07T20:03:26.9053253Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/bench/ck_bf16_bench.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/bench 2025-05-07T20:03:26.9058848Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/bench/comm_bench.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/bench 2025-05-07T20:03:26.9072236Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/bench/gather_scatter_bench.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/bench 2025-05-07T20:03:26.9080238Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/bench/quantize_bench.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/bench 
2025-05-07T20:03:26.9094598Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/bench/quantize_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/bench 2025-05-07T20:03:26.9111377Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:26.9112880Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/README.md -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:26.9114856Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:26.9118924Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/activation.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:26.9126524Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:26.9130871Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/layers.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:26.9170209Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/shuffling.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:26.9176730Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/__init__.py -> 
_skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:26.9182077Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gen_ai/quantize.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:26.9187003Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/example 2025-05-07T20:03:26.9188365Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/example 2025-05-07T20:03:26.9228306Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/example/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/example 2025-05-07T20:03:26.9231881Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/example/utils.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/example 2025-05-07T20:03:26.9233170Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:26.9234527Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:26.9243202Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:26.9264355Z INFO:root:copying 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:26.9278363Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:26.9287548Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:26.9291806Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9310118Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9313354Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9314571Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9316183Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9317731Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9319099Z INFO:root:copying 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9320367Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9321645Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9323224Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9324579Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9326011Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9327471Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9328863Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9330280Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9331775Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> 
_skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9333311Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9334841Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9351829Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9353412Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9354772Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9356032Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu 2025-05-07T20:03:26.9357269Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/config 2025-05-07T20:03:26.9358754Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/config 2025-05-07T20:03:26.9360204Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 
2025-05-07T20:03:26.9361496Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.9362919Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.9364480Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.9365922Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.9367611Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.9369039Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.9370409Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.9371756Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs 2025-05-07T20:03:26.9373124Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/quantize 2025-05-07T20:03:26.9376509Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/quantize/quantize_ops.py 
-> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/quantize 2025-05-07T20:03:26.9377889Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll 2025-05-07T20:03:26.9379301Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe 2025-05-07T20:03:26.9380749Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton 2025-05-07T20:03:26.9382593Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton 2025-05-07T20:03:26.9401881Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton 2025-05-07T20:03:26.9403486Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton 2025-05-07T20:03:26.9404872Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/utils 2025-05-07T20:03:26.9406230Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/utils 2025-05-07T20:03:26.9407803Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/utils 2025-05-07T20:03:26.9409488Z INFO:root:copying 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/utils 2025-05-07T20:03:26.9410844Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/cpu 2025-05-07T20:03:26.9412185Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/cpu 2025-05-07T20:03:26.9413501Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/meta 2025-05-07T20:03:26.9414871Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/meta 2025-05-07T20:03:26.9416277Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.9417665Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.9419184Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.9420791Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.9422328Z INFO:root:copying 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.9423870Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.9425486Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.9427176Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.9428844Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.9430489Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.9432162Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.9433749Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.9448915Z INFO:root:copying 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton 2025-05-07T20:03:26.9450641Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.9452088Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.9453533Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.9454977Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.9456580Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.9458078Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.9460826Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.9462451Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.9463969Z INFO:root:copying 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.9465422Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.9466825Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench 2025-05-07T20:03:26.9468580Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/cache 2025-05-07T20:03:26.9470166Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/cache 2025-05-07T20:03:26.9471637Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:26.9473481Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:26.9474952Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:26.9476414Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:26.9478693Z INFO:root:copying 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/utils 2025-05-07T20:03:26.9480344Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/utils 2025-05-07T20:03:26.9482064Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/utils 2025-05-07T20:03:26.9483568Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/utils 2025-05-07T20:03:26.9485013Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/utils 2025-05-07T20:03:26.9486419Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/stats 2025-05-07T20:03:26.9487947Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/stats 2025-05-07T20:03:26.9489455Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:26.9491477Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:26.9493082Z INFO:root:copying 
_skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton/jagged 2025-05-07T20:03:26.9494663Z INFO:root:copying _skbuild/linux-x86_64-3.11/cmake-install/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton/jagged 2025-05-07T20:03:26.9565142Z INFO:skbuild:copied 90 files 2025-05-07T20:03:26.9566040Z INFO:root:running build_ext 2025-05-07T20:03:27.0059492Z INFO:root:installing to _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel 2025-05-07T20:03:27.0060903Z INFO:root:running install 2025-05-07T20:03:27.0519297Z INFO:root:running install_lib 2025-05-07T20:03:27.0560615Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel 2025-05-07T20:03:27.0568606Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu 2025-05-07T20:03:27.0586455Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/config 2025-05-07T20:03:27.0589988Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/config 2025-05-07T20:03:27.0592728Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/config 2025-05-07T20:03:27.0593976Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/docs 2025-05-07T20:03:27.0598473Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:27.0600036Z INFO:root:copying 
_skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:27.0601609Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:27.0603587Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:27.0605326Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:27.0607050Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:27.0608691Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:27.0610286Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:27.0611860Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:27.0613018Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/quantize 2025-05-07T20:03:27.0614244Z INFO:root:copying 
_skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/quantize 2025-05-07T20:03:27.0615891Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/quantize 2025-05-07T20:03:27.0617078Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll 2025-05-07T20:03:27.0617874Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll/cpu 2025-05-07T20:03:27.0619079Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/cpu 2025-05-07T20:03:27.0620644Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/cpu 2025-05-07T20:03:27.0621847Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll/meta 2025-05-07T20:03:27.0623059Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/meta 2025-05-07T20:03:27.0624647Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/meta 2025-05-07T20:03:27.0625877Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll/triton 2025-05-07T20:03:27.0627090Z INFO:root:copying 
_skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:27.0628890Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:27.0630721Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:27.0633093Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:27.0634876Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:27.0636672Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:27.0638525Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:27.0640477Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 
2025-05-07T20:03:27.0642388Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:27.0644333Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:27.0646213Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:27.0648040Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:27.0649874Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:27.0651553Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll 2025-05-07T20:03:27.0652644Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe 2025-05-07T20:03:27.0653410Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/bench 2025-05-07T20:03:27.0654605Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench/__init__.py -> 
_skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:27.0656197Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:27.0657839Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:27.0659471Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:27.0661245Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:27.0663004Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:27.0664670Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:27.0666324Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:27.0668260Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> 
_skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:27.0669936Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:27.0671562Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:27.0672771Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/cache 2025-05-07T20:03:27.0673965Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/cache 2025-05-07T20:03:27.0675624Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/cache 2025-05-07T20:03:27.0676891Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:27.0677690Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:27.0678921Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:27.0680698Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:27.0682428Z 
INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:03:27.0684032Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:03:27.0685623Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:03:27.0687222Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:03:27.0688524Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/utils 2025-05-07T20:03:27.0689785Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:27.0691396Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:27.0692993Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:27.0694610Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:27.0696243Z 
INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:27.0697427Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/stats 2025-05-07T20:03:27.0698620Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/stats 2025-05-07T20:03:27.0700266Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/stats 2025-05-07T20:03:27.0701850Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe 2025-05-07T20:03:27.0702964Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/triton 2025-05-07T20:03:27.0703758Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/triton/jagged 2025-05-07T20:03:27.0704979Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton/jagged 2025-05-07T20:03:27.0706768Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton/jagged 2025-05-07T20:03:27.0708500Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton/__init__.py -> 
_skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:03:27.0710057Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:03:27.0711649Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:03:27.0713272Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:03:27.0714443Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/utils 2025-05-07T20:03:27.0715643Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:03:27.0717286Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:03:27.0718855Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:03:27.0720467Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:03:27.0721992Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/asmjit.so -> 
_skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.0723490Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/fbgemm.so -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.0730280Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental 2025-05-07T20:03:27.0731204Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:27.0732650Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:27.1316655Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:27.1318336Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai/moe/README.md -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:27.1320445Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai/moe/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:27.1322551Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai/moe/activation.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:27.1324661Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py -> 
_skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:27.1326572Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai/moe/layers.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:27.1328483Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai/moe/shuffling.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:27.1330342Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:27.1332159Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gen_ai/quantize.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:27.1333702Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/bench 2025-05-07T20:03:27.1335275Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/bench/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:27.1337109Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/bench/ck_bf16_bench.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:27.1338939Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/bench/comm_bench.py -> 
_skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:27.1340829Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/bench/gather_scatter_bench.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:27.1342741Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/bench/quantize_bench.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:27.1344580Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/bench/quantize_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:27.1345974Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/example 2025-05-07T20:03:27.1347487Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/example 2025-05-07T20:03:27.1349436Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/example/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/example 2025-05-07T20:03:27.1351287Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/example/utils.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/example 2025-05-07T20:03:27.1352644Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gemm 2025-05-07T20:03:27.1353560Z 
INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:27.1355047Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:27.1357030Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:27.1359017Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:27.1361073Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:27.1363207Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:27.1365014Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1366536Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1368257Z 
INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1369711Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1371263Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1372969Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1374589Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1376072Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1377592Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1379105Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1380622Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 
2025-05-07T20:03:27.1382278Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1383958Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1385546Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1387206Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1388913Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1390651Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1392567Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1394416Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1396153Z 
INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1397807Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1399310Z INFO:root:copying _skbuild/linux-x86_64-3.11/setuptools/lib.linux-x86_64-cpython-311/fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:27.1400188Z INFO:skbuild:copied 115 files 2025-05-07T20:03:27.1400539Z INFO:root:running install_egg_info 2025-05-07T20:03:27.1859117Z INFO:root:running egg_info 2025-05-07T20:03:27.1909157Z INFO:root:creating fbgemm_gpu_genai_nightly.egg-info 2025-05-07T20:03:27.1918059Z INFO:root:writing fbgemm_gpu_genai_nightly.egg-info/PKG-INFO 2025-05-07T20:03:27.2164847Z INFO:root:writing dependency_links to fbgemm_gpu_genai_nightly.egg-info/dependency_links.txt 2025-05-07T20:03:27.2184281Z INFO:root:writing requirements to fbgemm_gpu_genai_nightly.egg-info/requires.txt 2025-05-07T20:03:27.2186258Z INFO:root:writing top-level names to fbgemm_gpu_genai_nightly.egg-info/top_level.txt 2025-05-07T20:03:27.2223198Z INFO:root:writing manifest file 'fbgemm_gpu_genai_nightly.egg-info/SOURCES.txt' 2025-05-07T20:03:27.2434411Z INFO:root:reading manifest file 'fbgemm_gpu_genai_nightly.egg-info/SOURCES.txt' 2025-05-07T20:03:27.2507639Z INFO:root:writing manifest file 'fbgemm_gpu_genai_nightly.egg-info/SOURCES.txt' 2025-05-07T20:03:27.2525476Z INFO:root:Copying fbgemm_gpu_genai_nightly.egg-info to _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu_genai_nightly-2025.5.7-py3.11.egg-info 2025-05-07T20:03:27.2565050Z INFO:root:running install_scripts 2025-05-07T20:03:27.2566007Z INFO:skbuild:copied 0 files 
2025-05-07T20:03:35.5090301Z INFO:root:creating _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu_genai_nightly-2025.5.7.dist-info/WHEEL 2025-05-07T20:03:35.5395609Z INFO:wheel:creating '/__w/FBGEMM/FBGEMM/fbgemm_gpu/dist/.tmp-sjma5294/fbgemm_gpu_genai_nightly-2025.5.7-cp311-cp311-manylinux_2_28_x86_64.whl' and adding '_skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel' to it 2025-05-07T20:03:35.5595186Z INFO:wheel:adding 'fbgemm_gpu/__init__.py' 2025-05-07T20:03:35.6191689Z INFO:wheel:adding 'fbgemm_gpu/asmjit.so' 2025-05-07T20:03:35.6200528Z INFO:wheel:adding 'fbgemm_gpu/batched_unary_embeddings_ops.py' 2025-05-07T20:03:35.6201426Z INFO:wheel:adding 'fbgemm_gpu/enums.py' 2025-05-07T20:03:35.8223847Z INFO:wheel:adding 'fbgemm_gpu/fbgemm.so' 2025-05-07T20:03:35.8345452Z INFO:wheel:adding 'fbgemm_gpu/metrics.py' 2025-05-07T20:03:35.8346809Z INFO:wheel:adding 'fbgemm_gpu/permute_pooled_embedding_modules.py' 2025-05-07T20:03:35.8348367Z INFO:wheel:adding 'fbgemm_gpu/permute_pooled_embedding_modules_split.py' 2025-05-07T20:03:35.8349746Z INFO:wheel:adding 'fbgemm_gpu/quantize_comm.py' 2025-05-07T20:03:35.8352007Z INFO:wheel:adding 'fbgemm_gpu/quantize_utils.py' 2025-05-07T20:03:35.8354827Z INFO:wheel:adding 'fbgemm_gpu/runtime_monitor.py' 2025-05-07T20:03:35.8366061Z INFO:wheel:adding 'fbgemm_gpu/sparse_ops.py' 2025-05-07T20:03:35.8369628Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_configs.py' 2025-05-07T20:03:35.8371694Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_inference_converter.py' 2025-05-07T20:03:35.8373241Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_optimizer_ops.py' 2025-05-07T20:03:35.8374509Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_utils.py' 2025-05-07T20:03:35.8384562Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops.py' 2025-05-07T20:03:35.8390760Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_common.py' 2025-05-07T20:03:35.8412110Z INFO:wheel:adding 
'fbgemm_gpu/split_table_batched_embeddings_ops_inference.py' 2025-05-07T20:03:35.8453831Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_training.py' 2025-05-07T20:03:35.8458109Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py' 2025-05-07T20:03:35.8459901Z INFO:wheel:adding 'fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py' 2025-05-07T20:03:35.8461360Z INFO:wheel:adding 'fbgemm_gpu/tbe_input_multiplexer.py' 2025-05-07T20:03:35.8462263Z INFO:wheel:adding 'fbgemm_gpu/uvm.py' 2025-05-07T20:03:35.8463576Z INFO:wheel:adding 'fbgemm_gpu/config/__init__.py' 2025-05-07T20:03:35.8465163Z INFO:wheel:adding 'fbgemm_gpu/config/feature_list.py' 2025-05-07T20:03:35.8466760Z INFO:wheel:adding 'fbgemm_gpu/docs/__init__.py' 2025-05-07T20:03:35.8468369Z INFO:wheel:adding 'fbgemm_gpu/docs/common.py' 2025-05-07T20:03:35.8470179Z INFO:wheel:adding 'fbgemm_gpu/docs/examples.py' 2025-05-07T20:03:35.8472286Z INFO:wheel:adding 'fbgemm_gpu/docs/jagged_tensor_ops.py' 2025-05-07T20:03:35.8473808Z INFO:wheel:adding 'fbgemm_gpu/docs/merge_pooled_embedding_ops.py' 2025-05-07T20:03:35.8475849Z INFO:wheel:adding 'fbgemm_gpu/docs/permute_pooled_embedding_ops.py' 2025-05-07T20:03:35.8477346Z INFO:wheel:adding 'fbgemm_gpu/docs/quantize_ops.py' 2025-05-07T20:03:35.8483493Z INFO:wheel:adding 'fbgemm_gpu/docs/sparse_ops.py' 2025-05-07T20:03:35.8485027Z INFO:wheel:adding 'fbgemm_gpu/docs/version.py' 2025-05-07T20:03:35.8486807Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/__init__.py' 2025-05-07T20:03:35.8489086Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/ck_bf16_bench.py' 2025-05-07T20:03:35.8491980Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/comm_bench.py' 2025-05-07T20:03:35.8495520Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/gather_scatter_bench.py' 2025-05-07T20:03:35.8500928Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/quantize_bench.py' 2025-05-07T20:03:35.8516438Z INFO:wheel:adding 
'fbgemm_gpu/experimental/bench/quantize_ops.py' 2025-05-07T20:03:35.8517893Z INFO:wheel:adding 'fbgemm_gpu/experimental/example/__init__.py' 2025-05-07T20:03:35.8665051Z INFO:wheel:adding 'fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so' 2025-05-07T20:03:35.8675529Z INFO:wheel:adding 'fbgemm_gpu/experimental/example/utils.py' 2025-05-07T20:03:35.8677098Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py' 2025-05-07T20:03:35.8704172Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py' 2025-05-07T20:03:35.8711916Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py' 2025-05-07T20:03:35.8713845Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py' 2025-05-07T20:03:35.8715898Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/utils.py' 2025-05-07T20:03:35.8717572Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/__init__.py' 2025-05-07T20:03:37.8254483Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so' 2025-05-07T20:03:38.0260399Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/quantize.py' 2025-05-07T20:03:38.0261049Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/README.md' 2025-05-07T20:03:38.0261621Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/__init__.py' 2025-05-07T20:03:38.0262271Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/activation.py' 2025-05-07T20:03:38.0268448Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py' 2025-05-07T20:03:38.0277948Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/layers.py' 2025-05-07T20:03:38.0280756Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/shuffling.py' 2025-05-07T20:03:38.0283049Z INFO:wheel:adding 'fbgemm_gpu/quantize/__init__.py' 2025-05-07T20:03:38.0284707Z INFO:wheel:adding 'fbgemm_gpu/quantize/quantize_ops.py' 2025-05-07T20:03:38.0286551Z INFO:wheel:adding 'fbgemm_gpu/sll/__init__.py' 
2025-05-07T20:03:38.0288231Z INFO:wheel:adding 'fbgemm_gpu/sll/cpu/__init__.py' 2025-05-07T20:03:38.0294564Z INFO:wheel:adding 'fbgemm_gpu/sll/cpu/cpu_sll.py' 2025-05-07T20:03:38.0296648Z INFO:wheel:adding 'fbgemm_gpu/sll/meta/__init__.py' 2025-05-07T20:03:38.0299045Z INFO:wheel:adding 'fbgemm_gpu/sll/meta/meta_sll.py' 2025-05-07T20:03:38.0311113Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/__init__.py' 2025-05-07T20:03:38.0311699Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/common.py' 2025-05-07T20:03:38.0312202Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py' 2025-05-07T20:03:38.0312803Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py' 2025-05-07T20:03:38.0313319Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_bmm.py' 2025-05-07T20:03:38.0313850Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py' 2025-05-07T20:03:38.0315779Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py' 2025-05-07T20:03:38.0317741Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py' 2025-05-07T20:03:38.0323904Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py' 2025-05-07T20:03:38.0329074Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py' 2025-05-07T20:03:38.0330433Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py' 2025-05-07T20:03:38.0334123Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_softmax.py' 2025-05-07T20:03:38.0339962Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py' 2025-05-07T20:03:38.0341488Z INFO:wheel:adding 'fbgemm_gpu/tbe/__init__.py' 2025-05-07T20:03:38.0342830Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/__init__.py' 2025-05-07T20:03:38.0344794Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/bench_config.py' 2025-05-07T20:03:38.0349986Z INFO:wheel:adding 
'fbgemm_gpu/tbe/bench/bench_runs.py' 2025-05-07T20:03:38.0352031Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/eeg_cli.py' 2025-05-07T20:03:38.0354351Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/embedding_ops_common_config.py' 2025-05-07T20:03:38.0355974Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/eval_compression.py' 2025-05-07T20:03:38.0357294Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/reporter.py' 2025-05-07T20:03:38.0360184Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/tbe_data_config.py' 2025-05-07T20:03:38.0362763Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/tbe_data_config_loader.py' 2025-05-07T20:03:38.0365200Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py' 2025-05-07T20:03:38.0366728Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/utils.py' 2025-05-07T20:03:38.0368515Z INFO:wheel:adding 'fbgemm_gpu/tbe/cache/__init__.py' 2025-05-07T20:03:38.0369983Z INFO:wheel:adding 'fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py' 2025-05-07T20:03:38.0371385Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/__init__.py' 2025-05-07T20:03:38.0372570Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/common.py' 2025-05-07T20:03:38.0378359Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/inference.py' 2025-05-07T20:03:38.0405546Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/training.py' 2025-05-07T20:03:38.0406832Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/utils/__init__.py' 2025-05-07T20:03:38.0409672Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py' 2025-05-07T20:03:38.0411164Z INFO:wheel:adding 'fbgemm_gpu/tbe/stats/__init__.py' 2025-05-07T20:03:38.0413316Z INFO:wheel:adding 'fbgemm_gpu/tbe/stats/bench_params_reporter.py' 2025-05-07T20:03:38.0414903Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/__init__.py' 2025-05-07T20:03:38.0416620Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/common.py' 2025-05-07T20:03:38.0417697Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/offsets.py' 2025-05-07T20:03:38.0420007Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/quantize.py' 2025-05-07T20:03:38.0425483Z INFO:wheel:adding 
'fbgemm_gpu/tbe/utils/requests.py' 2025-05-07T20:03:38.0428453Z INFO:wheel:adding 'fbgemm_gpu/triton/__init__.py' 2025-05-07T20:03:38.0429560Z INFO:wheel:adding 'fbgemm_gpu/triton/common.py' 2025-05-07T20:03:38.0436813Z INFO:wheel:adding 'fbgemm_gpu/triton/quantize.py' 2025-05-07T20:03:38.0440835Z INFO:wheel:adding 'fbgemm_gpu/triton/quantize_ref.py' 2025-05-07T20:03:38.0442550Z INFO:wheel:adding 'fbgemm_gpu/triton/jagged/__init__.py' 2025-05-07T20:03:38.0451192Z INFO:wheel:adding 'fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py' 2025-05-07T20:03:38.0452508Z INFO:wheel:adding 'fbgemm_gpu/utils/__init__.py' 2025-05-07T20:03:38.0454632Z INFO:wheel:adding 'fbgemm_gpu/utils/filestore.py' 2025-05-07T20:03:38.0456126Z INFO:wheel:adding 'fbgemm_gpu/utils/loader.py' 2025-05-07T20:03:38.0458151Z INFO:wheel:adding 'fbgemm_gpu/utils/torch_library.py' 2025-05-07T20:03:38.0461424Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/METADATA' 2025-05-07T20:03:38.0462994Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/WHEEL' 2025-05-07T20:03:38.0464602Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/top_level.txt' 2025-05-07T20:03:38.0485925Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/RECORD' 2025-05-07T20:03:38.0491356Z INFO:root:removing _skbuild/linux-x86_64-3.11/setuptools/bdist.linux-x86_64/wheel 2025-05-07T20:03:38.1638134Z ╒════════════════════════════╤════════════════════════════════════════════════╕ 2025-05-07T20:03:38.1638843Z │ │ Version │ 2025-05-07T20:03:38.1639368Z ╞════════════════════════════╪════════════════════════════════════════════════╡ 2025-05-07T20:03:38.1639862Z │ PyTorch │ 2.8.0.dev20250507+cu128 │ 2025-05-07T20:03:38.1640386Z ├────────────────────────────┼────────────────────────────────────────────────┤ 2025-05-07T20:03:38.1640915Z │ CUDA (Declared by PyTorch) │ 12.8 │ 2025-05-07T20:03:38.1641465Z ├────────────────────────────┼────────────────────────────────────────────────┤ 
2025-05-07T20:03:38.1641977Z │ CUDA (Actual) │ nvcc: NVIDIA (R) Cuda compiler driver │ 2025-05-07T20:03:38.1642474Z │ │ Copyright (c) 2005-2025 NVIDIA Corporation │ 2025-05-07T20:03:38.1643284Z │ │ Built on Wed_Jan_15_19:20:09_PST_2025 │ 2025-05-07T20:03:38.1643787Z │ │ Cuda compilation tools, release 12.8, V12.8.61 │ 2025-05-07T20:03:38.1644274Z │ │ Build cuda_12.8.r12.8/compiler.35404655_0 │ 2025-05-07T20:03:38.1644821Z ╘════════════════════════════╧════════════════════════════════════════════════╛ 2025-05-07T20:03:47.0405200Z Successfully built fbgemm_gpu_genai_nightly-2025.5.7-cp311-cp311-manylinux_2_28_x86_64.whl 2025-05-07T20:03:51.4904251Z 2025-05-07T20:03:51.6216240Z ################################################################################ 2025-05-07T20:03:51.6217335Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:51.6218029Z [CHECK] Listing out library size: 2025-05-07T20:03:51.6247029Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:51.6247601Z 2025-05-07T20:03:51.6399701Z 91 ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:51.6401877Z 2025-05-07T20:03:51.6440851Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:51.6444395Z + objdump -TC ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:03:51.6445177Z 2025-05-07T20:03:51.7738128Z GLIBC_2.2.5 2025-05-07T20:03:51.7738765Z GLIBC_2.3 2025-05-07T20:03:51.7739743Z GLIBC_2.14 2025-05-07T20:03:51.7740074Z 2025-05-07T20:03:51.7740079Z 2025-05-07T20:03:51.7740786Z [CHECK] Listing out the GLIBCXX versions referenced by: 
./_skbuild/linux-x86_64-3.11/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:51.7742128Z + objdump -TC ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:03:51.7742840Z 2025-05-07T20:03:51.7964725Z GLIBCXX_3.4 2025-05-07T20:03:51.7965364Z GLIBCXX_3.4.9 2025-05-07T20:03:51.7965950Z GLIBCXX_3.4.11 2025-05-07T20:03:51.7966555Z GLIBCXX_3.4.18 2025-05-07T20:03:51.7967567Z GLIBCXX_3.4.21 2025-05-07T20:03:51.7968159Z GLIBCXX_3.4.29 2025-05-07T20:03:51.7968536Z 2025-05-07T20:03:51.7968549Z 2025-05-07T20:03:51.8495856Z + nm -gDC ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so > /tmp/tmp.m6Htx6qHvS.symbols.txt 2025-05-07T20:03:51.8496460Z 2025-05-07T20:03:51.8718689Z 2025-05-07T20:03:51.9018428Z [CHECK] Total Number of symbols: 2736 2025-05-07T20:03:51.9044544Z [CHECK] Number of fbgemm symbols: 676 2025-05-07T20:03:51.9064905Z + nm -gDCu ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so > /tmp/tmp.psewxnvRVC.usymbols.txt 2025-05-07T20:03:51.9066790Z 2025-05-07T20:03:51.9090771Z 2025-05-07T20:03:51.9123422Z [CHECK] Listing out undefined symbols (249 total): 2025-05-07T20:03:51.9150736Z U VTT for std::__cxx11::basic_ostringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:51.9152707Z U VTT for std::__cxx11::basic_stringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:51.9153277Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:03:51.9153627Z U __assert_fail@GLIBC_2.2.5 2025-05-07T20:03:51.9153983Z U __cudaPopCallConfiguration@libcudart.so.12 2025-05-07T20:03:51.9154418Z U __cudaPushCallConfiguration@libcudart.so.12 2025-05-07T20:03:51.9154816Z U __cudaRegisterFatBinary@libcudart.so.12 2025-05-07T20:03:51.9155222Z U __cudaRegisterFatBinaryEnd@libcudart.so.12 2025-05-07T20:03:51.9155616Z U 
__cudaRegisterFunction@libcudart.so.12 2025-05-07T20:03:51.9155980Z U __cudaRegisterVar@libcudart.so.12 2025-05-07T20:03:51.9156364Z U __cudaUnregisterFatBinary@libcudart.so.12 2025-05-07T20:03:51.9156735Z U __cxa_allocate_exception@CXXABI_1.3 2025-05-07T20:03:51.9157083Z U __cxa_atexit@GLIBC_2.2.5 2025-05-07T20:03:51.9157405Z U __cxa_begin_catch@CXXABI_1.3 2025-05-07T20:03:51.9157745Z U __cxa_end_catch@CXXABI_1.3 2025-05-07T20:03:51.9158299Z U __cxa_free_exception@CXXABI_1.3 2025-05-07T20:03:51.9158633Z U __cxa_guard_abort@CXXABI_1.3 2025-05-07T20:03:51.9158989Z U __cxa_guard_acquire@CXXABI_1.3 2025-05-07T20:03:51.9159401Z U __cxa_guard_release@CXXABI_1.3 2025-05-07T20:03:51.9159725Z U __cxa_rethrow@CXXABI_1.3 2025-05-07T20:03:51.9160063Z U __cxa_thread_atexit@CXXABI_1.3.7 2025-05-07T20:03:51.9160388Z U __cxa_throw@CXXABI_1.3 2025-05-07T20:03:51.9160723Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:03:51.9161062Z U __tls_get_addr@GLIBC_2.3 2025-05-07T20:03:51.9161357Z U __udivti3@GCC_3.0 2025-05-07T20:03:51.9161661Z U __xstat@GLIBC_2.2.5 2025-05-07T20:03:51.9161990Z U at::CUDAGeneratorImpl::device_type() 2025-05-07T20:03:51.9162490Z U at::CUDAGeneratorImpl::philox_cuda_state(unsigned long) 2025-05-07T20:03:51.9163019Z U at::TensorMaker::make_tensor() 2025-05-07T20:03:51.9163559Z U at::_ops::add__Tensor::call(at::Tensor&, at::Tensor const&, c10::Scalar const&) 2025-05-07T20:03:51.9164135Z U at::_ops::div__Scalar::call(at::Tensor&, c10::Scalar const&) 2025-05-07T20:03:51.9165015Z U at::_ops::empty_like::call(at::Tensor const&, std::optional, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:03:51.9166390Z U at::_ops::empty_memory_format::call(c10::ArrayRef, std::optional, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:03:51.9167597Z U at::_ops::expand::call(at::Tensor const&, c10::ArrayRef, bool) 2025-05-07T20:03:51.9168139Z U at::_ops::index_select::call(at::Tensor const&, long, at::Tensor const&) 
2025-05-07T20:03:51.9168664Z U at::_ops::norm_Scalar::call(at::Tensor const&, c10::Scalar const&) 2025-05-07T20:03:51.9169207Z U at::_ops::scatter_add_::call(at::Tensor&, long, at::Tensor const&, at::Tensor const&) 2025-05-07T20:03:51.9169772Z U at::_ops::select_int::call(at::Tensor const&, long, c10::SymInt) 2025-05-07T20:03:51.9170318Z U at::_ops::split_sizes::call(at::Tensor const&, c10::ArrayRef, long) 2025-05-07T20:03:51.9171027Z U at::_ops::sum_dim_IntList::call(at::Tensor const&, c10::OptionalArrayRef, bool, std::optional) 2025-05-07T20:03:51.9171831Z U at::_ops::to_dtype::call(at::Tensor const&, c10::ScalarType, bool, bool, std::optional) 2025-05-07T20:03:51.9173137Z U at::_ops::to_dtype_layout::call(at::Tensor const&, std::optional, std::optional, std::optional, std::optional, bool, bool, std::optional) 2025-05-07T20:03:51.9173967Z U at::_ops::unsqueeze::call(at::Tensor const&, long) 2025-05-07T20:03:51.9174399Z U at::_ops::view::call(at::Tensor const&, c10::ArrayRef) 2025-05-07T20:03:51.9175141Z U at::_ops::zeros::call(c10::ArrayRef, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:03:51.9175845Z U at::cuda::detail::getDefaultCUDAGenerator(signed char) 2025-05-07T20:03:51.9176251Z U at::cuda::getCurrentDeviceProperties() 2025-05-07T20:03:51.9176636Z U at::tensor(c10::ArrayRef, c10::TensorOptions const&) 2025-05-07T20:03:51.9177011Z U bcmp@GLIBC_2.2.5 2025-05-07T20:03:51.9177369Z U c10::AutogradMetaInterface::~AutogradMetaInterface() 2025-05-07T20:03:51.9177804Z U c10::BFloat16* at::TensorBase::data_ptr() const 2025-05-07T20:03:51.9178337Z U c10::BFloat16* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:51.9178734Z U c10::BoolType::get() 2025-05-07T20:03:51.9179295Z U c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) 2025-05-07T20:03:51.9179872Z U c10::Error::what() const 2025-05-07T20:03:51.9180293Z U c10::Float8_e4m3fn* at::TensorBase::mutable_data_ptr() const 
2025-05-07T20:03:51.9180741Z U c10::FloatType::get() 2025-05-07T20:03:51.9181050Z U c10::GeneratorImpl::device() const 2025-05-07T20:03:51.9181394Z U c10::IValue::isTensorList() const 2025-05-07T20:03:51.9181744Z U c10::IValue::reportToTensorTypeError() const 2025-05-07T20:03:51.9182135Z U c10::IntType::get() 2025-05-07T20:03:51.9182793Z U c10::ListType::get(std::__cxx11::basic_string, std::allocator > const&, c10::Type::SingletonOrSharedTypePtr) 2025-05-07T20:03:51.9185106Z U c10::MessageLogger::MessageLogger(char const*, int, int) 2025-05-07T20:03:51.9185527Z U c10::MessageLogger::~MessageLogger() 2025-05-07T20:03:51.9185950Z U c10::OptionalType::get(c10::Type::SingletonOrSharedTypePtr) 2025-05-07T20:03:51.9186401Z U c10::ScalarTypeType::get() 2025-05-07T20:03:51.9186774Z U c10::StorageImpl::throw_data_ptr_access_error() const 2025-05-07T20:03:51.9187129Z U c10::StringType::get() 2025-05-07T20:03:51.9187480Z U c10::SymBool::guard_bool(char const*, long) const 2025-05-07T20:03:51.9187858Z U c10::SymFloat::guard_float(char const*, long) const 2025-05-07T20:03:51.9188503Z U c10::SymInt::SymInt(c10::intrusive_ptr >) 2025-05-07T20:03:51.9189145Z U c10::SymInt::guard_int(char const*, long) const 2025-05-07T20:03:51.9189489Z U c10::SymInt::promote_to_negative() 2025-05-07T20:03:51.9189824Z U c10::SymInt::toSymNode() const 2025-05-07T20:03:51.9190172Z U c10::SymbolicShapeMeta::init_is_contiguous() const 2025-05-07T20:03:51.9190845Z U c10::TensorImpl::set_autograd_meta(std::unique_ptr >) 2025-05-07T20:03:51.9191526Z U c10::TensorImpl::throw_data_ptr_access_error() const 2025-05-07T20:03:51.9191877Z U c10::TensorType::get() 2025-05-07T20:03:51.9192202Z U c10::UndefinedTensorImpl::_singleton 2025-05-07T20:03:51.9193098Z U c10::Warning::Warning(std::variant, c10::SourceLocation const&, std::__cxx11::basic_string, std::allocator >, bool) 2025-05-07T20:03:51.9194031Z U c10::cuda::CUDACachingAllocator::allocator 2025-05-07T20:03:51.9194395Z U c10::cuda::CUDAStream::stream() 
const 2025-05-07T20:03:51.9194723Z U c10::cuda::ExchangeDevice(signed char) 2025-05-07T20:03:51.9195066Z U c10::cuda::GetDevice(signed char*) 2025-05-07T20:03:51.9195388Z U c10::cuda::MaybeSetDevice(signed char) 2025-05-07T20:03:51.9195730Z U c10::cuda::SetDevice(signed char) 2025-05-07T20:03:51.9196192Z U c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) 2025-05-07T20:03:51.9196639Z U c10::cuda::current_device() 2025-05-07T20:03:51.9196953Z U c10::cuda::device_count() 2025-05-07T20:03:51.9197284Z U c10::cuda::getCurrentCUDAStream(signed char) 2025-05-07T20:03:51.9197666Z U c10::cuda::getDefaultCUDAStream(signed char) 2025-05-07T20:03:51.9198134Z U c10::cuda::getStreamFromPool(bool, signed char) 2025-05-07T20:03:51.9198529Z U c10::cuda::getStreamFromPool(int, signed char) 2025-05-07T20:03:51.9198935Z U c10::cuda::setCurrentCUDAStream(c10::cuda::CUDAStream) 2025-05-07T20:03:51.9199304Z U c10::cuda::warn_or_error_on_sync() 2025-05-07T20:03:51.9199940Z U c10::detail::ListImpl::ListImpl(std::vector >, c10::Type::SingletonOrSharedTypePtr) 2025-05-07T20:03:51.9200963Z U c10::detail::infer_schema::make_function_schema(c10::ArrayRef, c10::ArrayRef) 2025-05-07T20:03:51.9201792Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) 2025-05-07T20:03:51.9202759Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) 2025-05-07T20:03:51.9204044Z U c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*) 2025-05-07T20:03:51.9205102Z U c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string, std::allocator > const&) 2025-05-07T20:03:51.9205935Z U c10::get_default_dtype() 2025-05-07T20:03:51.9206432Z U c10::impl::ExcludeDispatchKeyGuard::ExcludeDispatchKeyGuard(c10::DispatchKeySet) 2025-05-07T20:03:51.9207050Z U 
c10::impl::ExcludeDispatchKeyGuard::~ExcludeDispatchKeyGuard() 2025-05-07T20:03:51.9207509Z U c10::impl::GPUTrace::gpuTraceState 2025-05-07T20:03:51.9207853Z U c10::impl::GPUTrace::haveState 2025-05-07T20:03:51.9208241Z U c10::impl::cow::is_cow_data_ptr(c10::DataPtr const&) 2025-05-07T20:03:51.9208677Z U c10::impl::cow::materialize_cow_storage(c10::StorageImpl&) 2025-05-07T20:03:51.9209104Z U c10::impl::device_guard_impl_registry 2025-05-07T20:03:51.9209480Z U c10::operator*(c10::SymInt const&, int) 2025-05-07T20:03:51.9209841Z U c10::operator-(c10::SymInt const&, int) 2025-05-07T20:03:51.9210216Z U c10::operator-(c10::SymInt const&, long) 2025-05-07T20:03:51.9210603Z U c10::operator<<(std::ostream&, c10::Device const&) 2025-05-07T20:03:51.9211018Z U c10::operator<<(std::ostream&, c10::DeviceType) 2025-05-07T20:03:51.9211390Z U c10::throwNullDataPtrError() 2025-05-07T20:03:51.9211739Z U c10::warn(c10::Warning const&) 2025-05-07T20:03:51.9212089Z U c10::warnDeprecatedDataPtr() 2025-05-07T20:03:51.9212796Z U c10d::getNcclErrorDetailStr(ncclResult_t, std::optional, std::allocator > >) 2025-05-07T20:03:51.9213575Z U c10d::ncclGetErrorWithVersion[abi:cxx11](ncclResult_t) 2025-05-07T20:03:51.9214056Z U caffe2::TypeMeta::error_unsupported_typemeta(caffe2::TypeMeta) 2025-05-07T20:03:51.9214504Z U caffe2::TypeMeta::typeMetaDatas() 2025-05-07T20:03:51.9214851Z U cublasLtCreate 2025-05-07T20:03:51.9215133Z U cublasLtMatmul 2025-05-07T20:03:51.9215561Z U cublasLtMatmulAlgoGetHeuristic 2025-05-07T20:03:51.9215878Z U cublasLtMatmulDescCreate 2025-05-07T20:03:51.9216206Z U cublasLtMatmulDescSetAttribute 2025-05-07T20:03:51.9216531Z U cublasLtMatmulPreferenceCreate 2025-05-07T20:03:51.9216887Z U cublasLtMatmulPreferenceSetAttribute 2025-05-07T20:03:51.9217222Z U cublasLtMatrixLayoutCreate 2025-05-07T20:03:51.9217569Z U cudaDeviceGetAttribute@libcudart.so.12 2025-05-07T20:03:51.9217964Z U cudaDeviceSynchronize@libcudart.so.12 2025-05-07T20:03:51.9218315Z U 
cudaEventCreateWithFlags@libcudart.so.12 2025-05-07T20:03:51.9218678Z U cudaEventDestroy@libcudart.so.12 2025-05-07T20:03:51.9219012Z U cudaEventElapsedTime@libcudart.so.12 2025-05-07T20:03:51.9219359Z U cudaEventQuery@libcudart.so.12 2025-05-07T20:03:51.9219678Z U cudaEventRecord@libcudart.so.12 2025-05-07T20:03:51.9220036Z U cudaEventSynchronize@libcudart.so.12 2025-05-07T20:03:51.9220382Z U cudaFree@libcudart.so.12 2025-05-07T20:03:51.9220696Z U cudaFuncSetAttribute@libcudart.so.12 2025-05-07T20:03:51.9221036Z U cudaGetDevice@libcudart.so.12 2025-05-07T20:03:51.9221408Z U cudaGetDeviceProperties_v2@libcudart.so.12 2025-05-07T20:03:51.9221785Z U cudaGetDriverEntryPoint@libcudart.so.12 2025-05-07T20:03:51.9222126Z U cudaGetErrorName@libcudart.so.12 2025-05-07T20:03:51.9222502Z U cudaGetErrorString@libcudart.so.12 2025-05-07T20:03:51.9222860Z U cudaGetLastError@libcudart.so.12 2025-05-07T20:03:51.9223211Z U cudaIpcGetMemHandle@libcudart.so.12 2025-05-07T20:03:51.9223571Z U cudaIpcOpenMemHandle@libcudart.so.12 2025-05-07T20:03:51.9223928Z U cudaLaunchCooperativeKernel@libcudart.so.12 2025-05-07T20:03:51.9224301Z U cudaLaunchKernel@libcudart.so.12 2025-05-07T20:03:51.9224634Z U cudaLaunchKernelExC@libcudart.so.12 2025-05-07T20:03:51.9224972Z U cudaMalloc@libcudart.so.12 2025-05-07T20:03:51.9225279Z U cudaMemcpy@libcudart.so.12 2025-05-07T20:03:51.9225610Z U cudaMemcpyAsync@libcudart.so.12 2025-05-07T20:03:51.9225936Z U cudaMemsetAsync@libcudart.so.12 2025-05-07T20:03:51.9226275Z U cudaStreamQuery@libcudart.so.12 2025-05-07T20:03:51.9226622Z U cudaStreamSynchronize@libcudart.so.12 2025-05-07T20:03:51.9226959Z U cudaStreamWaitEvent@libcudart.so.12 2025-05-07T20:03:51.9227286Z U exit@GLIBC_2.2.5 2025-05-07T20:03:51.9227555Z U fclose@GLIBC_2.2.5 2025-05-07T20:03:51.9227845Z U fflush@GLIBC_2.2.5 2025-05-07T20:03:51.9228165Z U float* at::TensorBase::data_ptr() const 2025-05-07T20:03:51.9228569Z U float* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:51.9228936Z 
U fopen@GLIBC_2.2.5 2025-05-07T20:03:51.9229208Z U fprintf@GLIBC_2.2.5 2025-05-07T20:03:51.9229495Z U fread@GLIBC_2.2.5 2025-05-07T20:03:51.9229763Z U fwrite@GLIBC_2.2.5 2025-05-07T20:03:51.9230083Z U int* at::TensorBase::data_ptr() const 2025-05-07T20:03:51.9230461Z U int* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:51.9230872Z U long c10::detail::maybe_wrap_dim_slow(long, long, bool) 2025-05-07T20:03:51.9231294Z U long* at::TensorBase::data_ptr() const 2025-05-07T20:03:51.9231613Z U memcpy@GLIBC_2.14 2025-05-07T20:03:51.9231902Z U memmove@GLIBC_2.2.5 2025-05-07T20:03:51.9232177Z U memset@GLIBC_2.2.5 2025-05-07T20:03:51.9232466Z U ncclAllGather 2025-05-07T20:03:51.9232739Z U ncclAllReduce 2025-05-07T20:03:51.9232995Z U ncclCommInitRank 2025-05-07T20:03:51.9233283Z U ncclGetUniqueId 2025-05-07T20:03:51.9233555Z U ncclReduceScatter 2025-05-07T20:03:51.9233863Z U operator delete(void*)@GLIBCXX_3.4 2025-05-07T20:03:51.9234196Z U operator new(unsigned long)@GLIBCXX_3.4 2025-05-07T20:03:51.9234528Z U printf@GLIBC_2.2.5 2025-05-07T20:03:51.9234917Z U signed char* at::TensorBase::data_ptr() const 2025-05-07T20:03:51.9235388Z U signed char* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:51.9236028Z U std::__cxx11::basic_ostringstream, std::allocator >::basic_ostringstream() 2025-05-07T20:03:51.9236791Z U std::__cxx11::basic_ostringstream, std::allocator >::str() const &@GLIBCXX_3.4.29 2025-05-07T20:03:51.9237619Z U std::__cxx11::basic_ostringstream, std::allocator >::~basic_ostringstream()@GLIBCXX_3.4.21 2025-05-07T20:03:51.9238441Z U std::__cxx11::basic_stringstream, std::allocator >::basic_stringstream() 2025-05-07T20:03:51.9239266Z U std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const@GLIBCXX_3.4.18 2025-05-07T20:03:51.9239881Z U std::__throw_bad_alloc()@GLIBCXX_3.4 2025-05-07T20:03:51.9240257Z U std::__throw_bad_array_new_length() 2025-05-07T20:03:51.9240660Z U 
std::__throw_bad_cast()@GLIBCXX_3.4 2025-05-07T20:03:51.9241051Z U std::__throw_length_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:51.9241444Z U std::__throw_logic_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:51.9241855Z U std::__throw_system_error(int)@GLIBCXX_3.4.11 2025-05-07T20:03:51.9242331Z U std::basic_ios >::clear(std::_Ios_Iostate)@GLIBCXX_3.4 2025-05-07T20:03:51.9243565Z U std::basic_ostream >& std::__ostream_insert >(std::basic_ostream >&, char const*, long)@GLIBCXX_3.4.9 2025-05-07T20:03:51.9244772Z U std::basic_ostream >& std::endl >(std::basic_ostream >&)@GLIBCXX_3.4 2025-05-07T20:03:51.9245892Z U std::basic_ostream >& std::operator<< >(std::basic_ostream >&, char const*)@GLIBCXX_3.4 2025-05-07T20:03:51.9247115Z U std::basic_ostream >& std::operator<< >(std::basic_ostream >&, unsigned char const*)@GLIBCXX_3.4 2025-05-07T20:03:51.9247925Z U std::cerr@GLIBCXX_3.4 2025-05-07T20:03:51.9248261Z U std::cout@GLIBCXX_3.4 2025-05-07T20:03:51.9248670Z U std::ctype::_M_widen_init() const@GLIBCXX_3.4.11 2025-05-07T20:03:51.9249129Z U std::exception::what() const@GLIBCXX_3.4 2025-05-07T20:03:51.9249524Z U std::exception::~exception()@GLIBCXX_3.4 2025-05-07T20:03:51.9249938Z U std::ios_base::Init::Init()@GLIBCXX_3.4 2025-05-07T20:03:51.9250319Z U std::ios_base::Init::~Init()@GLIBCXX_3.4 2025-05-07T20:03:51.9250718Z U std::ios_base::~ios_base()@GLIBCXX_3.4 2025-05-07T20:03:51.9251088Z U std::locale::~locale()@GLIBCXX_3.4 2025-05-07T20:03:51.9251530Z U std::logic_error::logic_error(char const*)@GLIBCXX_3.4.21 2025-05-07T20:03:51.9251998Z U std::logic_error::~logic_error()@GLIBCXX_3.4 2025-05-07T20:03:51.9252447Z U std::ostream& std::ostream::_M_insert(long)@GLIBCXX_3.4.9 2025-05-07T20:03:51.9253031Z U std::ostream& std::ostream::_M_insert(unsigned long)@GLIBCXX_3.4.9 2025-05-07T20:03:51.9253637Z U std::ostream& std::ostream::_M_insert(void const*)@GLIBCXX_3.4.9 2025-05-07T20:03:51.9254142Z U std::ostream::flush()@GLIBCXX_3.4 2025-05-07T20:03:51.9254538Z U 
std::ostream::operator<<(int)@GLIBCXX_3.4 2025-05-07T20:03:51.9254936Z U std::ostream::put(char)@GLIBCXX_3.4 2025-05-07T20:03:51.9255476Z U std::runtime_error::runtime_error(char const*)@GLIBCXX_3.4.21 2025-05-07T20:03:51.9256156Z U std::runtime_error::runtime_error(std::__cxx11::basic_string, std::allocator > const&)@GLIBCXX_3.4.21 2025-05-07T20:03:51.9256805Z U std::runtime_error::~runtime_error()@GLIBCXX_3.4 2025-05-07T20:03:51.9257173Z U std::terminate()@GLIBCXX_3.4 2025-05-07T20:03:51.9257470Z U stderr@GLIBC_2.2.5 2025-05-07T20:03:51.9257764Z U strlen@GLIBC_2.2.5 2025-05-07T20:03:51.9258063Z U torch::CppFunction::~CppFunction() 2025-05-07T20:03:51.9258859Z U torch::Library::Library(torch::Library::Kind, std::__cxx11::basic_string, std::allocator >, std::optional, char const*, unsigned int) 2025-05-07T20:03:51.9260029Z U torch::Library::_def(c10::FunctionSchema&&, c10::OperatorName*, std::vector > const&, torch::_RegisterOrVerify) & 2025-05-07T20:03:51.9260859Z U torch::Library::_impl(char const*, torch::CppFunction&&, torch::_RegisterOrVerify) & 2025-05-07T20:03:51.9261702Z U torch::cuda::nccl::all2all(std::vector >&, std::vector >&, void*, c10::cuda::CUDAStream&) 2025-05-07T20:03:51.9262603Z U torch::cuda::nccl::all2all_single_equal_split(at::Tensor&, at::Tensor&, int, void*, c10::cuda::CUDAStream&) 2025-05-07T20:03:51.9263356Z U torch::jit::parseSchema(std::__cxx11::basic_string, std::allocator > const&, bool) 2025-05-07T20:03:51.9263921Z U typeinfo for c10::Error 2025-05-07T20:03:51.9264239Z U typeinfo for std::exception@GLIBCXX_3.4 2025-05-07T20:03:51.9264602Z U typeinfo for std::logic_error@GLIBCXX_3.4 2025-05-07T20:03:51.9264975Z U typeinfo for std::runtime_error@GLIBCXX_3.4 2025-05-07T20:03:51.9265403Z U unsigned char* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:51.9265820Z U usleep@GLIBC_2.2.5 2025-05-07T20:03:51.9266150Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:03:51.9266563Z U vtable for 
__cxxabiv1::__function_type_info@CXXABI_1.3 2025-05-07T20:03:51.9266982Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:03:51.9267679Z U vtable for c10::Error 2025-05-07T20:03:51.9268308Z U vtable for std::__cxx11::basic_stringbuf, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:51.9269159Z U vtable for std::basic_streambuf >@GLIBCXX_3.4 2025-05-07T20:03:51.9269652Z U vtable for torch::autograd::AutogradMeta 2025-05-07T20:03:51.9270004Z w _ITM_deregisterTMCloneTable 2025-05-07T20:03:51.9270358Z w _ITM_registerTMCloneTable 2025-05-07T20:03:51.9270705Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:03:51.9271008Z w __gmon_start__ 2025-05-07T20:03:51.9271306Z w __pthread_key_create 2025-05-07T20:03:51.9271617Z w pthread_mutex_lock@GLIBC_2.2.5 2025-05-07T20:03:51.9271964Z w pthread_mutex_unlock@GLIBC_2.2.5 2025-05-07T20:03:51.9272334Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:03:51.9272939Z + ldd ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:51.9273382Z 2025-05-07T20:03:51.9319124Z linux-vdso.so.1 (0x00007fff50499000) 2025-05-07T20:03:51.9320042Z libtorch.so => not found 2025-05-07T20:03:51.9320812Z libc10.so => not found 2025-05-07T20:03:51.9321847Z libc10_cuda.so => not found 2025-05-07T20:03:51.9322795Z libnccl.so.2 => not found 2025-05-07T20:03:51.9323571Z libtorch_cpu.so => not found 2025-05-07T20:03:51.9324367Z libtorch_cuda.so => not found 2025-05-07T20:03:51.9325138Z libcudart.so.12 => not found 2025-05-07T20:03:51.9326124Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fe4bb19c000) 2025-05-07T20:03:51.9327375Z libgomp.so.1 => /lib64/libgomp.so.1 (0x00007fe4c12a1000) 2025-05-07T20:03:51.9328560Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe4c1273000) 2025-05-07T20:03:51.9329503Z libc.so.6 => /lib64/libc.so.6 (0x00007fe4baf94000) 2025-05-07T20:03:51.9329871Z /lib64/ld-linux-x86-64.so.2 (0x00007fe4c12fd000) 2025-05-07T20:03:51.9330251Z libm.so.6 => 
/lib64/libm.so.6 (0x00007fe4c1196000) 2025-05-07T20:03:51.9330501Z 2025-05-07T20:03:51.9330614Z [CHECK] Displaying ELF information: 2025-05-07T20:03:51.9331281Z + readelf -d ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:51.9331754Z 2025-05-07T20:03:51.9525213Z 2025-05-07T20:03:51.9526582Z Dynamic section at offset 0x5ae3168 contains 38 entries: 2025-05-07T20:03:51.9529726Z Tag Type Name/Value 2025-05-07T20:03:51.9531141Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:03:51.9532639Z 0x0000000000000001 (NEEDED) Shared library: [libc10.so] 2025-05-07T20:03:51.9533486Z 0x0000000000000001 (NEEDED) Shared library: [libc10_cuda.so] 2025-05-07T20:03:51.9533993Z 0x0000000000000001 (NEEDED) Shared library: [libnccl.so.2] 2025-05-07T20:03:51.9534527Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:03:51.9535020Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:03:51.9535535Z 0x0000000000000001 (NEEDED) Shared library: [libcudart.so.12] 2025-05-07T20:03:51.9536044Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:03:51.9536554Z 0x0000000000000001 (NEEDED) Shared library: [libgomp.so.1] 2025-05-07T20:03:51.9537059Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:03:51.9537539Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:03:51.9538047Z 0x0000000000000001 (NEEDED) Shared library: [ld-linux-x86-64.so.2] 2025-05-07T20:03:51.9538599Z 0x000000000000000e (SONAME) Library soname: [fbgemm_gpu_experimental_gen_ai.so] 2025-05-07T20:03:51.9539083Z 0x000000000000000c (INIT) 0x15d000 2025-05-07T20:03:51.9539430Z 0x000000000000000d (FINI) 0x5089fc 2025-05-07T20:03:51.9539763Z 0x0000000000000019 (INIT_ARRAY) 0x5ae0d28 2025-05-07T20:03:51.9540134Z 0x000000000000001b (INIT_ARRAYSZ) 1136 (bytes) 2025-05-07T20:03:51.9540483Z 0x000000000000001a (FINI_ARRAY) 0x5ae1198 
2025-05-07T20:03:51.9540841Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:51.9541171Z 0x000000006ffffef5 (GNU_HASH) 0x238 2025-05-07T20:03:51.9541520Z 0x0000000000000005 (STRTAB) 0x141b8 2025-05-07T20:03:51.9541837Z 0x0000000000000006 (SYMTAB) 0x4120 2025-05-07T20:03:51.9542213Z 0x000000000000000a (STRSZ) 1239382 (bytes) 2025-05-07T20:03:51.9542588Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:03:51.9542922Z 0x0000000000000003 (PLTGOT) 0x5ae4418 2025-05-07T20:03:51.9543301Z 0x0000000000000002 (PLTRELSZ) 44880 (bytes) 2025-05-07T20:03:51.9543637Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:03:51.9544165Z 0x0000000000000017 (JMPREL) 0x151300 2025-05-07T20:03:51.9544503Z 0x0000000000000007 (RELA) 0x144190 2025-05-07T20:03:51.9544876Z 0x0000000000000008 (RELASZ) 53616 (bytes) 2025-05-07T20:03:51.9545236Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:03:51.9545586Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:03:51.9546001Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:03:51.9546349Z 0x000000006ffffffe (VERNEED) 0x144070 2025-05-07T20:03:51.9546696Z 0x000000006fffffff (VERNEEDNUM) 5 2025-05-07T20:03:51.9547014Z 0x000000006ffffff0 (VERSYM) 0x142b0e 2025-05-07T20:03:51.9547361Z 0x000000006ffffff9 (RELACOUNT) 420 2025-05-07T20:03:51.9547664Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:03:51.9547921Z 2025-05-07T20:03:51.9548036Z ################################################################################ 2025-05-07T20:03:51.9548264Z 2025-05-07T20:03:51.9548268Z 2025-05-07T20:03:51.9548397Z ################################################################################ 2025-05-07T20:03:51.9549042Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:51.9549740Z [CHECK] Listing out library size: 2025-05-07T20:03:51.9550335Z + du -h --block-size=1M 
./_skbuild/linux-x86_64-3.11/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:51.9550983Z 2025-05-07T20:03:51.9551377Z 1 ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:51.9551803Z 2025-05-07T20:03:51.9552321Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:51.9553530Z + objdump -TC ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:03:51.9554257Z 2025-05-07T20:03:51.9597295Z GLIBC_2.2.5 2025-05-07T20:03:51.9597934Z GLIBC_2.14 2025-05-07T20:03:51.9598299Z 2025-05-07T20:03:51.9598312Z 2025-05-07T20:03:51.9600014Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:51.9602508Z + objdump -TC ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:03:51.9603447Z 2025-05-07T20:03:51.9647446Z GLIBCXX_3.4 2025-05-07T20:03:51.9648077Z GLIBCXX_3.4.9 2025-05-07T20:03:51.9648685Z GLIBCXX_3.4.21 2025-05-07T20:03:51.9649067Z 2025-05-07T20:03:51.9649081Z 2025-05-07T20:03:51.9671714Z + nm -gDC ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so > /tmp/tmp.w8V6pYbE1k.symbols.txt 2025-05-07T20:03:51.9672392Z 2025-05-07T20:03:51.9689228Z 2025-05-07T20:03:51.9717203Z [CHECK] Total Number of symbols: 154 2025-05-07T20:03:51.9741573Z [CHECK] Number of fbgemm symbols: 15 2025-05-07T20:03:51.9765024Z + nm -gDCu ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so > 
/tmp/tmp.REWLJGnlvt.usymbols.txt 2025-05-07T20:03:51.9765741Z 2025-05-07T20:03:51.9787344Z 2025-05-07T20:03:51.9821608Z [CHECK] Listing out undefined symbols (76 total): 2025-05-07T20:03:51.9840165Z U VTT for std::__cxx11::basic_ostringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:51.9840819Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:03:51.9841184Z U __cudaPopCallConfiguration@libcudart.so.12 2025-05-07T20:03:51.9841609Z U __cudaPushCallConfiguration@libcudart.so.12 2025-05-07T20:03:51.9841997Z U __cudaRegisterFatBinary@libcudart.so.12 2025-05-07T20:03:51.9842397Z U __cudaRegisterFatBinaryEnd@libcudart.so.12 2025-05-07T20:03:51.9842926Z U __cudaRegisterFunction@libcudart.so.12 2025-05-07T20:03:51.9843316Z U __cudaRegisterVar@libcudart.so.12 2025-05-07T20:03:51.9843706Z U __cudaUnregisterFatBinary@libcudart.so.12 2025-05-07T20:03:51.9844274Z U __cxa_allocate_exception@CXXABI_1.3 2025-05-07T20:03:51.9844620Z U __cxa_atexit@GLIBC_2.2.5 2025-05-07T20:03:51.9844942Z U __cxa_begin_catch@CXXABI_1.3 2025-05-07T20:03:51.9845286Z U __cxa_free_exception@CXXABI_1.3 2025-05-07T20:03:51.9845610Z U __cxa_throw@CXXABI_1.3 2025-05-07T20:03:51.9845938Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:03:51.9846409Z U at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) 2025-05-07T20:03:51.9847092Z U at::_ops::to_dtype::call(at::Tensor const&, c10::ScalarType, bool, bool, std::optional) 2025-05-07T20:03:51.9848015Z U at::_ops::zeros::call(c10::ArrayRef, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:03:51.9848766Z U c10::FloatType::get() 2025-05-07T20:03:51.9849139Z U c10::IValue::reportToTensorTypeError() const 2025-05-07T20:03:51.9849623Z U c10::MessageLogger::MessageLogger(char const*, int, int) 2025-05-07T20:03:51.9850066Z U c10::MessageLogger::~MessageLogger() 2025-05-07T20:03:51.9850460Z U c10::SymFloat::guard_float(char const*, long) const 2025-05-07T20:03:51.9850820Z U c10::TensorType::get() 
2025-05-07T20:03:51.9851163Z U c10::UndefinedTensorImpl::_singleton 2025-05-07T20:03:51.9851935Z U c10::detail::infer_schema::make_function_schema(c10::ArrayRef, c10::ArrayRef) 2025-05-07T20:03:51.9852825Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) 2025-05-07T20:03:51.9853713Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) 2025-05-07T20:03:51.9854673Z U c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*) 2025-05-07T20:03:51.9855829Z U c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string, std::allocator > const&) 2025-05-07T20:03:51.9856681Z U caffe2::TypeMeta::error_unsupported_typemeta(caffe2::TypeMeta) 2025-05-07T20:03:51.9857086Z U cudaGetErrorString@libcudart.so.12 2025-05-07T20:03:51.9857426Z U cudaGetLastError@libcudart.so.12 2025-05-07T20:03:51.9857765Z U cudaLaunchKernel@libcudart.so.12 2025-05-07T20:03:51.9858118Z U float* at::TensorBase::data_ptr() const 2025-05-07T20:03:51.9858540Z U long c10::detail::maybe_wrap_dim_slow(long, long, bool) 2025-05-07T20:03:51.9858902Z U memcpy@GLIBC_2.14 2025-05-07T20:03:51.9859379Z U memmove@GLIBC_2.2.5 2025-05-07T20:03:51.9859673Z U memset@GLIBC_2.2.5 2025-05-07T20:03:51.9859968Z U ncclCommDestroy 2025-05-07T20:03:51.9860244Z U ncclCommInitAll 2025-05-07T20:03:51.9860555Z U operator delete(void*)@GLIBCXX_3.4 2025-05-07T20:03:51.9860914Z U operator new(unsigned long)@GLIBCXX_3.4 2025-05-07T20:03:51.9861488Z U std::__cxx11::basic_ostringstream, std::allocator >::basic_ostringstream() 2025-05-07T20:03:51.9862333Z U std::__cxx11::basic_ostringstream, std::allocator >::~basic_ostringstream()@GLIBCXX_3.4.21 2025-05-07T20:03:51.9862941Z U std::__throw_bad_alloc()@GLIBCXX_3.4 2025-05-07T20:03:51.9863326Z U std::__throw_length_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:51.9863735Z U 
std::__throw_logic_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:51.9864258Z U std::basic_ios >::clear(std::_Ios_Iostate)@GLIBCXX_3.4 2025-05-07T20:03:51.9865201Z U std::basic_ostream >& std::__ostream_insert >(std::basic_ostream >&, char const*, long)@GLIBCXX_3.4.9 2025-05-07T20:03:51.9866019Z U std::ios_base::Init::Init()@GLIBCXX_3.4 2025-05-07T20:03:51.9866374Z U std::ios_base::Init::~Init()@GLIBCXX_3.4 2025-05-07T20:03:51.9866762Z U std::ios_base::~ios_base()@GLIBCXX_3.4 2025-05-07T20:03:51.9867299Z U std::locale::~locale()@GLIBCXX_3.4 2025-05-07T20:03:51.9867906Z U std::ostream& std::ostream::_M_insert(long)@GLIBCXX_3.4.9 2025-05-07T20:03:51.9868397Z U std::ostream::operator<<(int)@GLIBCXX_3.4 2025-05-07T20:03:51.9869076Z U std::runtime_error::runtime_error(std::__cxx11::basic_string, std::allocator > const&)@GLIBCXX_3.4.21 2025-05-07T20:03:51.9869838Z U std::runtime_error::~runtime_error()@GLIBCXX_3.4 2025-05-07T20:03:51.9870237Z U std::terminate()@GLIBCXX_3.4 2025-05-07T20:03:51.9870581Z U strlen@GLIBC_2.2.5 2025-05-07T20:03:51.9870919Z U torch::CppFunction::~CppFunction() 2025-05-07T20:03:51.9871748Z U torch::Library::Library(torch::Library::Kind, std::__cxx11::basic_string, std::allocator >, std::optional, char const*, unsigned int) 2025-05-07T20:03:51.9872951Z U torch::Library::_def(c10::FunctionSchema&&, c10::OperatorName*, std::vector > const&, torch::_RegisterOrVerify) & 2025-05-07T20:03:51.9873820Z U torch::Library::_impl(char const*, torch::CppFunction&&, torch::_RegisterOrVerify) & 2025-05-07T20:03:51.9874567Z U torch::jit::parseSchema(std::__cxx11::basic_string, std::allocator > const&, bool) 2025-05-07T20:03:51.9875216Z U typeinfo for std::runtime_error@GLIBCXX_3.4 2025-05-07T20:03:51.9875627Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:03:51.9876085Z U vtable for __cxxabiv1::__function_type_info@CXXABI_1.3 2025-05-07T20:03:51.9876545Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:03:51.9877159Z U 
vtable for std::__cxx11::basic_stringbuf, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:51.9877865Z U vtable for std::basic_streambuf >@GLIBCXX_3.4 2025-05-07T20:03:51.9878323Z w _ITM_deregisterTMCloneTable 2025-05-07T20:03:51.9878666Z w _ITM_registerTMCloneTable 2025-05-07T20:03:51.9878984Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:03:51.9879306Z w __gmon_start__ 2025-05-07T20:03:51.9879599Z w __pthread_key_create 2025-05-07T20:03:51.9879948Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:03:51.9880689Z + ldd ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:51.9881148Z 2025-05-07T20:03:51.9890029Z linux-vdso.so.1 (0x00007ffccf954000) 2025-05-07T20:03:51.9890375Z libtorch.so => not found 2025-05-07T20:03:51.9890628Z libc10.so => not found 2025-05-07T20:03:51.9890989Z libnccl.so.2 => not found 2025-05-07T20:03:51.9891290Z libtorch_cpu.so => not found 2025-05-07T20:03:51.9891565Z libtorch_cuda.so => not found 2025-05-07T20:03:51.9891859Z libcudart.so.12 => not found 2025-05-07T20:03:51.9892199Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007ff27f993000) 2025-05-07T20:03:51.9892654Z libgomp.so.1 => /lib64/libgomp.so.1 (0x00007ff27f93d000) 2025-05-07T20:03:51.9893170Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ff27f90f000) 2025-05-07T20:03:51.9893580Z libc.so.6 => /lib64/libc.so.6 (0x00007ff27f707000) 2025-05-07T20:03:51.9893960Z libm.so.6 => /lib64/libm.so.6 (0x00007ff27f62c000) 2025-05-07T20:03:51.9894328Z /lib64/ld-linux-x86-64.so.2 (0x00007ff27fc71000) 2025-05-07T20:03:51.9894574Z 2025-05-07T20:03:51.9894705Z [CHECK] Displaying ELF information: 2025-05-07T20:03:51.9895296Z + readelf -d ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:51.9895811Z 2025-05-07T20:03:51.9936133Z 2025-05-07T20:03:51.9936477Z Dynamic section at offset 0x71978 contains 36 entries: 2025-05-07T20:03:51.9936916Z Tag Type 
Name/Value 2025-05-07T20:03:51.9937384Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:03:51.9938190Z 0x0000000000000001 (NEEDED) Shared library: [libc10.so] 2025-05-07T20:03:51.9938703Z 0x0000000000000001 (NEEDED) Shared library: [libnccl.so.2] 2025-05-07T20:03:51.9939256Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:03:51.9939878Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:03:51.9940576Z 0x0000000000000001 (NEEDED) Shared library: [libcudart.so.12] 2025-05-07T20:03:51.9941115Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:03:51.9941628Z 0x0000000000000001 (NEEDED) Shared library: [libgomp.so.1] 2025-05-07T20:03:51.9942157Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:03:51.9942661Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:03:51.9943252Z 0x000000000000000e (SONAME) Library soname: [fbgemm_gpu_experimental_example_py.so] 2025-05-07T20:03:51.9943772Z 0x000000000000000c (INIT) 0x5000 2025-05-07T20:03:51.9944102Z 0x000000000000000d (FINI) 0x98dc 2025-05-07T20:03:51.9944456Z 0x0000000000000019 (INIT_ARRAY) 0x727d0 2025-05-07T20:03:51.9944807Z 0x000000000000001b (INIT_ARRAYSZ) 32 (bytes) 2025-05-07T20:03:51.9945174Z 0x000000000000001a (FINI_ARRAY) 0x727f0 2025-05-07T20:03:51.9945519Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:51.9945874Z 0x000000006ffffef5 (GNU_HASH) 0x200 2025-05-07T20:03:51.9946218Z 0x0000000000000005 (STRTAB) 0x1448 2025-05-07T20:03:51.9946545Z 0x0000000000000006 (SYMTAB) 0x5c0 2025-05-07T20:03:51.9946903Z 0x000000000000000a (STRSZ) 9973 (bytes) 2025-05-07T20:03:51.9947259Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:03:51.9947643Z 0x0000000000000003 (PLTGOT) 0x72c08 2025-05-07T20:03:51.9947996Z 0x0000000000000002 (PLTRELSZ) 2208 (bytes) 2025-05-07T20:03:51.9948361Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:03:51.9948704Z 0x0000000000000017 
(JMPREL) 0x4530 2025-05-07T20:03:51.9949028Z 0x0000000000000007 (RELA) 0x3d38 2025-05-07T20:03:51.9949386Z 0x0000000000000008 (RELASZ) 2040 (bytes) 2025-05-07T20:03:51.9949743Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:03:51.9950090Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:03:51.9950418Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:03:51.9950783Z 0x000000006ffffffe (VERNEED) 0x3c78 2025-05-07T20:03:51.9951110Z 0x000000006fffffff (VERNEEDNUM) 4 2025-05-07T20:03:51.9951450Z 0x000000006ffffff0 (VERSYM) 0x3b3e 2025-05-07T20:03:51.9951792Z 0x000000006ffffff9 (RELACOUNT) 7 2025-05-07T20:03:51.9952105Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:03:51.9952312Z 2025-05-07T20:03:51.9952446Z ################################################################################ 2025-05-07T20:03:51.9952679Z 2025-05-07T20:03:51.9952683Z 2025-05-07T20:03:51.9952804Z ################################################################################ 2025-05-07T20:03:51.9953335Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.11/cmake-build/asmjit.so 2025-05-07T20:03:51.9953783Z [CHECK] Listing out library size: 2025-05-07T20:03:51.9954198Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.11/cmake-build/asmjit.so 2025-05-07T20:03:51.9954512Z 2025-05-07T20:03:51.9954676Z 1 ./_skbuild/linux-x86_64-3.11/cmake-build/asmjit.so 2025-05-07T20:03:51.9954918Z 2025-05-07T20:03:51.9955247Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.11/cmake-build/asmjit.so 2025-05-07T20:03:51.9956122Z + objdump -TC ./_skbuild/linux-x86_64-3.11/cmake-build/asmjit.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:03:51.9956658Z 2025-05-07T20:03:52.0023368Z GLIBC_2.2.5 2025-05-07T20:03:52.0024375Z GLIBC_2.14 2025-05-07T20:03:52.0025769Z 2025-05-07T20:03:52.0025779Z 2025-05-07T20:03:52.0026322Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.11/cmake-build/asmjit.so 
2025-05-07T20:03:52.0027487Z + objdump -TC ./_skbuild/linux-x86_64-3.11/cmake-build/asmjit.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:03:52.0028111Z 2025-05-07T20:03:52.0092559Z GLIBCXX_3.4 2025-05-07T20:03:52.0098276Z 2025-05-07T20:03:52.0098315Z 2025-05-07T20:03:52.0121357Z + nm -gDC ./_skbuild/linux-x86_64-3.11/cmake-build/asmjit.so > /tmp/tmp.Vkp98nrF95.symbols.txt 2025-05-07T20:03:52.0122813Z 2025-05-07T20:03:52.0151638Z 2025-05-07T20:03:52.0176650Z [CHECK] Total Number of symbols: 841 2025-05-07T20:03:52.0194057Z [CHECK] Number of fbgemm symbols: 0 2025-05-07T20:03:52.0211039Z + nm -gDCu ./_skbuild/linux-x86_64-3.11/cmake-build/asmjit.so > /tmp/tmp.Jqc7CBLWPs.usymbols.txt 2025-05-07T20:03:52.0211544Z 2025-05-07T20:03:52.0230222Z 2025-05-07T20:03:52.0257668Z [CHECK] Listing out undefined symbols (51 total): 2025-05-07T20:03:52.0279442Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:03:52.0280298Z U __cxa_begin_catch@CXXABI_1.3 2025-05-07T20:03:52.0280737Z U __cxa_guard_acquire@CXXABI_1.3 2025-05-07T20:03:52.0281111Z U __cxa_guard_release@CXXABI_1.3 2025-05-07T20:03:52.0281445Z U __errno_location@GLIBC_2.2.5 2025-05-07T20:03:52.0281800Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:03:52.0282124Z U abort@GLIBC_2.2.5 2025-05-07T20:03:52.0282463Z U bcmp@GLIBC_2.2.5 2025-05-07T20:03:52.0282879Z U close@GLIBC_2.2.5 2025-05-07T20:03:52.0283185Z U fputs@GLIBC_2.2.5 2025-05-07T20:03:52.0283470Z U free@GLIBC_2.2.5 2025-05-07T20:03:52.0283860Z U ftruncate64@GLIBC_2.2.5 2025-05-07T20:03:52.0284171Z U fwrite@GLIBC_2.2.5 2025-05-07T20:03:52.0284494Z U getenv@GLIBC_2.2.5 2025-05-07T20:03:52.0284815Z U getpagesize@GLIBC_2.2.5 2025-05-07T20:03:52.0285124Z U madvise@GLIBC_2.2.5 2025-05-07T20:03:52.0285447Z U malloc@GLIBC_2.2.5 2025-05-07T20:03:52.0285741Z U memcmp@GLIBC_2.2.5 2025-05-07T20:03:52.0286046Z U memcpy@GLIBC_2.14 2025-05-07T20:03:52.0286329Z U memmove@GLIBC_2.2.5 2025-05-07T20:03:52.0286632Z U memset@GLIBC_2.2.5 
2025-05-07T20:03:52.0286933Z U mmap@GLIBC_2.2.5 2025-05-07T20:03:52.0287257Z U mprotect@GLIBC_2.2.5 2025-05-07T20:03:52.0287587Z U munmap@GLIBC_2.2.5 2025-05-07T20:03:52.0287871Z U open64@GLIBC_2.2.5 2025-05-07T20:03:52.0288189Z U operator delete(void*)@GLIBCXX_3.4 2025-05-07T20:03:52.0288538Z U pthread_mutex_destroy@GLIBC_2.2.5 2025-05-07T20:03:52.0288891Z U pthread_mutex_lock@GLIBC_2.2.5 2025-05-07T20:03:52.0289324Z U pthread_mutex_unlock@GLIBC_2.2.5 2025-05-07T20:03:52.0289636Z U read@GLIBC_2.2.5 2025-05-07T20:03:52.0290115Z U realloc@GLIBC_2.2.5 2025-05-07T20:03:52.0290401Z U shm_open@GLIBC_2.2.5 2025-05-07T20:03:52.0290698Z U shm_unlink@GLIBC_2.2.5 2025-05-07T20:03:52.0290981Z U snprintf@GLIBC_2.2.5 2025-05-07T20:03:52.0291480Z U std::terminate()@GLIBCXX_3.4 2025-05-07T20:03:52.0291782Z U stderr@GLIBC_2.2.5 2025-05-07T20:03:52.0292075Z U strcmp@GLIBC_2.2.5 2025-05-07T20:03:52.0292355Z U strlen@GLIBC_2.2.5 2025-05-07T20:03:52.0292674Z U strtol@GLIBC_2.2.5 2025-05-07T20:03:52.0292967Z U syscall@GLIBC_2.2.5 2025-05-07T20:03:52.0293265Z U sysconf@GLIBC_2.2.5 2025-05-07T20:03:52.0293561Z U uname@GLIBC_2.2.5 2025-05-07T20:03:52.0293834Z U unlink@GLIBC_2.2.5 2025-05-07T20:03:52.0294201Z U vsnprintf@GLIBC_2.2.5 2025-05-07T20:03:52.0294548Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:03:52.0294991Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:03:52.0295465Z U vtable for __cxxabiv1::__vmi_class_type_info@CXXABI_1.3 2025-05-07T20:03:52.0295901Z w _ITM_deregisterTMCloneTable 2025-05-07T20:03:52.0296240Z w _ITM_registerTMCloneTable 2025-05-07T20:03:52.0296653Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:03:52.0296948Z w __gmon_start__ 2025-05-07T20:03:52.0297255Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:03:52.0297652Z + ldd ./_skbuild/linux-x86_64-3.11/cmake-build/asmjit.so 2025-05-07T20:03:52.0297893Z 2025-05-07T20:03:52.0325052Z linux-vdso.so.1 (0x00007ffe28551000) 2025-05-07T20:03:52.0325436Z 
libtorch_cpu.so => not found 2025-05-07T20:03:52.0325753Z libtorch_cuda.so => not found 2025-05-07T20:03:52.0326023Z libtorch.so => not found 2025-05-07T20:03:52.0326367Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f19a195e000) 2025-05-07T20:03:52.0326800Z libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f19a1908000) 2025-05-07T20:03:52.0327322Z librt.so.1 => /lib64/librt.so.1 (0x00007f19a1901000) 2025-05-07T20:03:52.0327709Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f19a18d3000) 2025-05-07T20:03:52.0328152Z libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f19a18ce000) 2025-05-07T20:03:52.0328560Z libc.so.6 => /lib64/libc.so.6 (0x00007f19a16c6000) 2025-05-07T20:03:52.0328924Z libm.so.6 => /lib64/libm.so.6 (0x00007f19a15eb000) 2025-05-07T20:03:52.0329293Z /lib64/ld-linux-x86-64.so.2 (0x00007f19a1c3e000) 2025-05-07T20:03:52.0329540Z 2025-05-07T20:03:52.0329764Z [CHECK] Displaying ELF information: 2025-05-07T20:03:52.0330128Z + readelf -d ./_skbuild/linux-x86_64-3.11/cmake-build/asmjit.so 2025-05-07T20:03:52.0330390Z 2025-05-07T20:03:52.0360649Z 2025-05-07T20:03:52.0361306Z Dynamic section at offset 0x74dd0 contains 35 entries: 2025-05-07T20:03:52.0362706Z Tag Type Name/Value 2025-05-07T20:03:52.0363369Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:03:52.0364004Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:03:52.0364543Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:03:52.0365056Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:03:52.0365586Z 0x0000000000000001 (NEEDED) Shared library: [libgomp.so.1] 2025-05-07T20:03:52.0366101Z 0x0000000000000001 (NEEDED) Shared library: [librt.so.1] 2025-05-07T20:03:52.0366608Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:03:52.0367380Z 0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0] 2025-05-07T20:03:52.0367900Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 
2025-05-07T20:03:52.0368410Z 0x000000000000000e (SONAME) Library soname: [asmjit.so] 2025-05-07T20:03:52.0368996Z 0x000000000000000c (INIT) 0x19000 2025-05-07T20:03:52.0369355Z 0x000000000000000d (FINI) 0x56a1c 2025-05-07T20:03:52.0369713Z 0x0000000000000019 (INIT_ARRAY) 0x74ff8 2025-05-07T20:03:52.0370058Z 0x000000000000001b (INIT_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:52.0370423Z 0x000000000000001a (FINI_ARRAY) 0x75000 2025-05-07T20:03:52.0370768Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:52.0371129Z 0x000000006ffffef5 (GNU_HASH) 0x200 2025-05-07T20:03:52.0371464Z 0x0000000000000005 (STRTAB) 0x7120 2025-05-07T20:03:52.0371831Z 0x0000000000000006 (SYMTAB) 0x2230 2025-05-07T20:03:52.0372197Z 0x000000000000000a (STRSZ) 48790 (bytes) 2025-05-07T20:03:52.0372598Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:03:52.0373049Z 0x0000000000000003 (PLTGOT) 0x76050 2025-05-07T20:03:52.0373422Z 0x0000000000000002 (PLTRELSZ) 8472 (bytes) 2025-05-07T20:03:52.0373825Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:03:52.0374221Z 0x0000000000000017 (JMPREL) 0x16a58 2025-05-07T20:03:52.0374637Z 0x0000000000000007 (RELA) 0x13710 2025-05-07T20:03:52.0375011Z 0x0000000000000008 (RELASZ) 13128 (bytes) 2025-05-07T20:03:52.0375431Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:03:52.0375809Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:03:52.0376159Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:03:52.0376553Z 0x000000006ffffffe (VERNEED) 0x13650 2025-05-07T20:03:52.0376908Z 0x000000006fffffff (VERNEEDNUM) 5 2025-05-07T20:03:52.0377282Z 0x000000006ffffff0 (VERSYM) 0x12fb6 2025-05-07T20:03:52.0377626Z 0x000000006ffffff9 (RELACOUNT) 3 2025-05-07T20:03:52.0377977Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:03:52.0378195Z 2025-05-07T20:03:52.0378321Z ################################################################################ 2025-05-07T20:03:52.0378591Z 2025-05-07T20:03:52.0378595Z 2025-05-07T20:03:52.0378728Z 
################################################################################ 2025-05-07T20:03:52.0379331Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.11/cmake-build/fbgemm.so 2025-05-07T20:03:52.0379768Z [CHECK] Listing out library size: 2025-05-07T20:03:52.0380202Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.11/cmake-build/fbgemm.so 2025-05-07T20:03:52.0380512Z 2025-05-07T20:03:52.0380679Z 6 ./_skbuild/linux-x86_64-3.11/cmake-build/fbgemm.so 2025-05-07T20:03:52.0380963Z 2025-05-07T20:03:52.0381294Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.11/cmake-build/fbgemm.so 2025-05-07T20:03:52.0382180Z + objdump -TC ./_skbuild/linux-x86_64-3.11/cmake-build/fbgemm.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:03:52.0382712Z 2025-05-07T20:03:52.0644566Z GLIBC_2.2.5 2025-05-07T20:03:52.0644894Z GLIBC_2.3 2025-05-07T20:03:52.0645120Z GLIBC_2.14 2025-05-07T20:03:52.0645285Z 2025-05-07T20:03:52.0645290Z 2025-05-07T20:03:52.0645665Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.11/cmake-build/fbgemm.so 2025-05-07T20:03:52.0646632Z + objdump -TC ./_skbuild/linux-x86_64-3.11/cmake-build/fbgemm.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:03:52.0647336Z 2025-05-07T20:03:52.0914611Z GLIBCXX_3.4 2025-05-07T20:03:52.0915288Z GLIBCXX_3.4.9 2025-05-07T20:03:52.0915920Z GLIBCXX_3.4.11 2025-05-07T20:03:52.0916507Z GLIBCXX_3.4.14 2025-05-07T20:03:52.0917118Z GLIBCXX_3.4.15 2025-05-07T20:03:52.0917695Z GLIBCXX_3.4.18 2025-05-07T20:03:52.0918297Z GLIBCXX_3.4.21 2025-05-07T20:03:52.0918675Z 2025-05-07T20:03:52.0918688Z 2025-05-07T20:03:52.0935027Z + nm -gDC ./_skbuild/linux-x86_64-3.11/cmake-build/fbgemm.so > /tmp/tmp.23WjVcFmsq.symbols.txt 2025-05-07T20:03:52.0936297Z 2025-05-07T20:03:52.1163318Z 2025-05-07T20:03:52.1191198Z [CHECK] Total Number of symbols: 4951 2025-05-07T20:03:52.1207143Z [CHECK] Number of 
fbgemm symbols: 3554 2025-05-07T20:03:52.1227917Z + nm -gDCu ./_skbuild/linux-x86_64-3.11/cmake-build/fbgemm.so > /tmp/tmp.WvBhvuOYTW.usymbols.txt 2025-05-07T20:03:52.1229266Z 2025-05-07T20:03:52.1253680Z 2025-05-07T20:03:52.1282523Z [CHECK] Listing out undefined symbols (133 total): 2025-05-07T20:03:52.1303452Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:03:52.1303967Z U __cxa_allocate_exception@CXXABI_1.3 2025-05-07T20:03:52.1304379Z U __cxa_atexit@GLIBC_2.2.5 2025-05-07T20:03:52.1304723Z U __cxa_begin_catch@CXXABI_1.3 2025-05-07T20:03:52.1305045Z U __cxa_end_catch@CXXABI_1.3 2025-05-07T20:03:52.1305384Z U __cxa_free_exception@CXXABI_1.3 2025-05-07T20:03:52.1305930Z U __cxa_guard_abort@CXXABI_1.3 2025-05-07T20:03:52.1306272Z U __cxa_guard_acquire@CXXABI_1.3 2025-05-07T20:03:52.1306607Z U __cxa_guard_release@CXXABI_1.3 2025-05-07T20:03:52.1306983Z U __cxa_init_primary_exception@CXXABI_1.3.11 2025-05-07T20:03:52.1307416Z U __cxa_rethrow@CXXABI_1.3 2025-05-07T20:03:52.1307792Z U __cxa_thread_atexit@CXXABI_1.3.7 2025-05-07T20:03:52.1308144Z U __cxa_throw@CXXABI_1.3 2025-05-07T20:03:52.1308455Z U __extendhfsf2@GCC_12.0.0 2025-05-07T20:03:52.1308793Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:03:52.1309244Z U __once_proxy@GLIBCXX_3.4.11 2025-05-07T20:03:52.1309575Z U __tls_get_addr@GLIBC_2.3 2025-05-07T20:03:52.1309881Z U __truncsfhf2@GCC_12.0.0 2025-05-07T20:03:52.1310197Z U abort@GLIBC_2.2.5 2025-05-07T20:03:52.1310687Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:03:52.1311434Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:03:52.1312414Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:03:52.1313598Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ 
const&, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:03:52.1314725Z U asmjit::_abi_1_13::BaseEmitter::emitArgsAssignment(asmjit::_abi_1_13::FuncFrame const&, asmjit::_abi_1_13::FuncArgsAssignment const&) 2025-05-07T20:03:52.1315520Z U asmjit::_abi_1_13::BaseEmitter::emitEpilog(asmjit::_abi_1_13::FuncFrame const&) 2025-05-07T20:03:52.1316120Z U asmjit::_abi_1_13::BaseEmitter::emitProlog(asmjit::_abi_1_13::FuncFrame const&) 2025-05-07T20:03:52.1316846Z U asmjit::_abi_1_13::CodeHolder::CodeHolder(asmjit::_abi_1_13::Support::Temporary const*) 2025-05-07T20:03:52.1317469Z U asmjit::_abi_1_13::CodeHolder::init(asmjit::_abi_1_13::Environment const&, unsigned long) 2025-05-07T20:03:52.1317955Z U asmjit::_abi_1_13::CodeHolder::~CodeHolder() 2025-05-07T20:03:52.1318479Z U asmjit::_abi_1_13::FuncArgsAssignment::updateFuncFrame(asmjit::_abi_1_13::FuncFrame&) const 2025-05-07T20:03:52.1319211Z U asmjit::_abi_1_13::FuncDetail::init(asmjit::_abi_1_13::FuncSignature const&, asmjit::_abi_1_13::Environment const&) 2025-05-07T20:03:52.1319759Z U asmjit::_abi_1_13::FuncFrame::finalize() 2025-05-07T20:03:52.1320189Z U asmjit::_abi_1_13::FuncFrame::init(asmjit::_abi_1_13::FuncDetail const&) 2025-05-07T20:03:52.1320790Z U asmjit::_abi_1_13::JitRuntime::JitRuntime(asmjit::_abi_1_13::JitAllocator::CreateParams const*) 2025-05-07T20:03:52.1321376Z U asmjit::_abi_1_13::JitRuntime::~JitRuntime() 2025-05-07T20:03:52.1321829Z U asmjit::_abi_1_13::x86::Assembler::Assembler(asmjit::_abi_1_13::CodeHolder*) 2025-05-07T20:03:52.1322271Z U asmjit::_abi_1_13::x86::Assembler::~Assembler() 2025-05-07T20:03:52.1322740Z U bcmp@GLIBC_2.2.5 2025-05-07T20:03:52.1323196Z U ceilf@GLIBC_2.2.5 2025-05-07T20:03:52.1323511Z U cpuinfo_get_packages 2025-05-07T20:03:52.1323896Z U cpuinfo_get_packages_count 2025-05-07T20:03:52.1324209Z U cpuinfo_initialize 2025-05-07T20:03:52.1324517Z U cpuinfo_isa 2025-05-07T20:03:52.1324783Z U 
floor@GLIBC_2.2.5 2025-05-07T20:03:52.1325084Z U fma@GLIBC_2.2.5 2025-05-07T20:03:52.1325360Z U fmaf@GLIBC_2.2.5 2025-05-07T20:03:52.1325707Z U free@GLIBC_2.2.5 2025-05-07T20:03:52.1325989Z U fwrite@GLIBC_2.2.5 2025-05-07T20:03:52.1326304Z U getenv@GLIBC_2.2.5 2025-05-07T20:03:52.1326594Z U ldexp@GLIBC_2.2.5 2025-05-07T20:03:52.1326932Z U log2@GLIBC_2.2.5 2025-05-07T20:03:52.1327266Z U log2f@GLIBC_2.2.5 2025-05-07T20:03:52.1327556Z U lrintf@GLIBC_2.2.5 2025-05-07T20:03:52.1327867Z U memcpy@GLIBC_2.14 2025-05-07T20:03:52.1328152Z U memmove@GLIBC_2.2.5 2025-05-07T20:03:52.1328457Z U memset@GLIBC_2.2.5 2025-05-07T20:03:52.1328747Z U nearbyint@GLIBC_2.2.5 2025-05-07T20:03:52.1329068Z U nearbyintf@GLIBC_2.2.5 2025-05-07T20:03:52.1329394Z U operator delete(void*)@GLIBCXX_3.4 2025-05-07T20:03:52.1329759Z U operator delete[](void*)@GLIBCXX_3.4 2025-05-07T20:03:52.1330148Z U operator new(unsigned long)@GLIBCXX_3.4 2025-05-07T20:03:52.1330515Z U operator new[](unsigned long)@GLIBCXX_3.4 2025-05-07T20:03:52.1330888Z U posix_memalign@GLIBC_2.2.5 2025-05-07T20:03:52.1331201Z U sqrtf@GLIBC_2.2.5 2025-05-07T20:03:52.1331637Z U std::_Hash_bytes(void const*, unsigned long, unsigned long)@CXXABI_1.3.5 2025-05-07T20:03:52.1332143Z U std::_Rb_tree_decrement(std::_Rb_tree_node_base*)@GLIBCXX_3.4 2025-05-07T20:03:52.1332632Z U std::_Rb_tree_increment(std::_Rb_tree_node_base*)@GLIBCXX_3.4 2025-05-07T20:03:52.1333332Z U std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)@GLIBCXX_3.4 2025-05-07T20:03:52.1334095Z U std::__atomic_futex_unsigned_base::_M_futex_notify_all(unsigned int*)@GLIBCXX_3.4.21 2025-05-07T20:03:52.1335200Z U std::__atomic_futex_unsigned_base::_M_futex_wait_until(unsigned int*, unsigned int, bool, std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >)@GLIBCXX_3.4.21 2025-05-07T20:03:52.1336571Z U std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) 
const@GLIBCXX_3.4.18 2025-05-07T20:03:52.1337295Z U std::__detail::_Prime_rehash_policy::_M_next_bkt(unsigned long) const@GLIBCXX_3.4.18 2025-05-07T20:03:52.1337784Z U std::__exception_ptr::exception_ptr::_M_addref() 2025-05-07T20:03:52.1338187Z U std::__exception_ptr::exception_ptr::_M_release() 2025-05-07T20:03:52.1338653Z U std::__exception_ptr::exception_ptr::exception_ptr(void*)@CXXABI_1.3.11 2025-05-07T20:03:52.1339141Z U std::__future_base::_Result_base::_Result_base()@GLIBCXX_3.4.15 2025-05-07T20:03:52.1339613Z U std::__future_base::_Result_base::~_Result_base()@GLIBCXX_3.4.15 2025-05-07T20:03:52.1340002Z U std::__once_call@GLIBCXX_3.4.11 2025-05-07T20:03:52.1340349Z U std::__once_callable@GLIBCXX_3.4.11 2025-05-07T20:03:52.1340735Z U std::__throw_bad_alloc()@GLIBCXX_3.4 2025-05-07T20:03:52.1341061Z U std::__throw_bad_array_new_length() 2025-05-07T20:03:52.1341402Z U std::__throw_bad_cast()@GLIBCXX_3.4 2025-05-07T20:03:52.1341743Z U std::__throw_bad_function_call()@GLIBCXX_3.4.14 2025-05-07T20:03:52.1342123Z U std::__throw_future_error(int)@GLIBCXX_3.4.14 2025-05-07T20:03:52.1342488Z U std::__throw_length_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:52.1342870Z U std::__throw_system_error(int)@GLIBCXX_3.4.11 2025-05-07T20:03:52.1343229Z U std::bad_alloc::~bad_alloc()@GLIBCXX_3.4 2025-05-07T20:03:52.1343989Z U std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)@GLIBCXX_3.4.9 2025-05-07T20:03:52.1344774Z U std::cerr@GLIBCXX_3.4 2025-05-07T20:03:52.1345057Z U std::cout@GLIBCXX_3.4 2025-05-07T20:03:52.1345436Z U std::ctype<char>::_M_widen_init() const@GLIBCXX_3.4.11 2025-05-07T20:03:52.1345909Z U std::future_category()@GLIBCXX_3.4.15 2025-05-07T20:03:52.1346263Z U std::future_error::~future_error()@GLIBCXX_3.4.14 2025-05-07T20:03:52.1346640Z U std::ios_base::Init::Init()@GLIBCXX_3.4 2025-05-07T20:03:52.1346972Z U std::ios_base::Init::~Init()@GLIBCXX_3.4 2025-05-07T20:03:52.1347597Z U 
std::logic_error::logic_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)@GLIBCXX_3.4.21 2025-05-07T20:03:52.1348292Z U std::logic_error::logic_error(std::logic_error const&)@GLIBCXX_3.4.21 2025-05-07T20:03:52.1348785Z U std::ostream& std::ostream::_M_insert<double>(double)@GLIBCXX_3.4.9 2025-05-07T20:03:52.1349267Z U std::ostream& std::ostream::_M_insert<long>(long)@GLIBCXX_3.4.9 2025-05-07T20:03:52.1349784Z U std::ostream& std::ostream::_M_insert<unsigned long>(unsigned long)@GLIBCXX_3.4.9 2025-05-07T20:03:52.1350248Z U std::ostream::flush()@GLIBCXX_3.4 2025-05-07T20:03:52.1350591Z U std::ostream::operator<<(int)@GLIBCXX_3.4 2025-05-07T20:03:52.1350923Z U std::ostream::put(char)@GLIBCXX_3.4 2025-05-07T20:03:52.1351361Z U std::rethrow_exception(std::__exception_ptr::exception_ptr)@CXXABI_1.3.3 2025-05-07T20:03:52.1351850Z U std::runtime_error::runtime_error(char const*)@GLIBCXX_3.4.21 2025-05-07T20:03:52.1352278Z U std::runtime_error::~runtime_error()@GLIBCXX_3.4 2025-05-07T20:03:52.1352634Z U std::terminate()@GLIBCXX_3.4 2025-05-07T20:03:52.1352931Z U stderr@GLIBC_2.2.5 2025-05-07T20:03:52.1353216Z U strcmp@GLIBC_2.2.5 2025-05-07T20:03:52.1353488Z U strlen@GLIBC_2.2.5 2025-05-07T20:03:52.1353778Z U strstr@GLIBC_2.2.5 2025-05-07T20:03:52.1354061Z U tolower@GLIBC_2.2.5 2025-05-07T20:03:52.1354621Z U toupper@GLIBC_2.2.5 2025-05-07T20:03:52.1354996Z U typeinfo for std::__future_base::_Result_base@GLIBCXX_3.4.15 2025-05-07T20:03:52.1355430Z U typeinfo for std::bad_alloc@GLIBCXX_3.4 2025-05-07T20:03:52.1355816Z U typeinfo for std::future_error@GLIBCXX_3.4.14 2025-05-07T20:03:52.1356197Z U typeinfo for std::runtime_error@GLIBCXX_3.4 2025-05-07T20:03:52.1356608Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:03:52.1357023Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:03:52.1357431Z U vtable for std::bad_alloc@GLIBCXX_3.4 2025-05-07T20:03:52.1357806Z U vtable for std::future_error@GLIBCXX_3.4.14 2025-05-07T20:03:52.1358153Z w 
_ITM_deregisterTMCloneTable 2025-05-07T20:03:52.1358526Z w _ITM_registerTMCloneTable 2025-05-07T20:03:52.1358839Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:03:52.1359153Z w __gmon_start__ 2025-05-07T20:03:52.1359431Z w __pthread_key_create 2025-05-07T20:03:52.1359755Z w pthread_mutex_lock@GLIBC_2.2.5 2025-05-07T20:03:52.1360084Z w pthread_mutex_unlock@GLIBC_2.2.5 2025-05-07T20:03:52.1360411Z w pthread_once 2025-05-07T20:03:52.1360706Z w pthread_rwlock_rdlock 2025-05-07T20:03:52.1361010Z w pthread_rwlock_unlock 2025-05-07T20:03:52.1361319Z w pthread_rwlock_wrlock 2025-05-07T20:03:52.1361619Z w pthread_self@GLIBC_2.2.5 2025-05-07T20:03:52.1361983Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:03:52.1362407Z + ldd ./_skbuild/linux-x86_64-3.11/cmake-build/fbgemm.so 2025-05-07T20:03:52.1362788Z 2025-05-07T20:03:52.1362932Z linux-vdso.so.1 (0x00007ffd8997e000) 2025-05-07T20:03:52.1363431Z libc10.so => not found 2025-05-07T20:03:52.1364024Z asmjit.so => /__w/FBGEMM/FBGEMM/fbgemm_gpu/./_skbuild/linux-x86_64-3.11/cmake-build/asmjit.so (0x00007fd0c8410000) 2025-05-07T20:03:52.1364623Z libtorch.so => not found 2025-05-07T20:03:52.1364882Z libtorch_cpu.so => not found 2025-05-07T20:03:52.1365173Z libtorch_cuda.so => not found 2025-05-07T20:03:52.1365514Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fd0c7b9c000) 2025-05-07T20:03:52.1365929Z libm.so.6 => /lib64/libm.so.6 (0x00007fd0c7ac1000) 2025-05-07T20:03:52.1366315Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fd0c83e0000) 2025-05-07T20:03:52.1366724Z libc.so.6 => /lib64/libc.so.6 (0x00007fd0c78b9000) 2025-05-07T20:03:52.1367336Z /lib64/ld-linux-x86-64.so.2 (0x00007fd0c848c000) 2025-05-07T20:03:52.1367679Z libtorch_cpu.so => not found 2025-05-07T20:03:52.1367975Z libtorch_cuda.so => not found 2025-05-07T20:03:52.1368252Z libtorch.so => not found 2025-05-07T20:03:52.1368588Z libgomp.so.1 => /lib64/libgomp.so.1 (0x00007fd0c8388000) 2025-05-07T20:03:52.1368989Z librt.so.1 => /lib64/librt.so.1 
(0x00007fd0c8383000) 2025-05-07T20:03:52.1369429Z libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd0c837e000) 2025-05-07T20:03:52.1369721Z 2025-05-07T20:03:52.1369853Z [CHECK] Displaying ELF information: 2025-05-07T20:03:52.1370228Z + readelf -d ./_skbuild/linux-x86_64-3.11/cmake-build/fbgemm.so 2025-05-07T20:03:52.1370511Z 2025-05-07T20:03:52.1397645Z 2025-05-07T20:03:52.1398313Z Dynamic section at offset 0x54b548 contains 37 entries: 2025-05-07T20:03:52.1399353Z Tag Type Name/Value 2025-05-07T20:03:52.1399778Z 0x0000000000000001 (NEEDED) Shared library: [libc10.so] 2025-05-07T20:03:52.1400288Z 0x0000000000000001 (NEEDED) Shared library: [asmjit.so] 2025-05-07T20:03:52.1400792Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:03:52.1401315Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:03:52.1401833Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:03:52.1402379Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:03:52.1403280Z 0x0000000000000001 (NEEDED) Shared library: [libm.so.6] 2025-05-07T20:03:52.1416068Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:03:52.1416607Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:03:52.1417268Z 0x0000000000000001 (NEEDED) Shared library: [ld-linux-x86-64.so.2] 2025-05-07T20:03:52.1417804Z 0x000000000000000e (SONAME) Library soname: [fbgemm.so] 2025-05-07T20:03:52.1418287Z 0x000000000000000f (RPATH) Library rpath: [$ORIGIN] 2025-05-07T20:03:52.1418719Z 0x000000000000000c (INIT) 0xfd000 2025-05-07T20:03:52.1419050Z 0x000000000000000d (FINI) 0x4bfc58 2025-05-07T20:03:52.1419600Z 0x0000000000000019 (INIT_ARRAY) 0x548040 2025-05-07T20:03:52.1419952Z 0x000000000000001b (INIT_ARRAYSZ) 1224 (bytes) 2025-05-07T20:03:52.1420333Z 0x000000000000001a (FINI_ARRAY) 0x548508 2025-05-07T20:03:52.1420676Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:52.1421034Z 
0x000000006ffffef5 (GNU_HASH) 0x238 2025-05-07T20:03:52.1421379Z 0x0000000000000005 (STRTAB) 0x24d98 2025-05-07T20:03:52.1421713Z 0x0000000000000006 (SYMTAB) 0x7d58 2025-05-07T20:03:52.1422181Z 0x000000000000000a (STRSZ) 754228 (bytes) 2025-05-07T20:03:52.1422526Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:03:52.1422875Z 0x0000000000000003 (PLTGOT) 0x54b7d8 2025-05-07T20:03:52.1423211Z 0x0000000000000002 (PLTRELSZ) 25992 (bytes) 2025-05-07T20:03:52.1423612Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:03:52.1423921Z 0x0000000000000017 (JMPREL) 0xf6410 2025-05-07T20:03:52.1424257Z 0x0000000000000007 (RELA) 0xdf7f0 2025-05-07T20:03:52.1424647Z 0x0000000000000008 (RELASZ) 93216 (bytes) 2025-05-07T20:03:52.1425027Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:03:52.1425363Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:03:52.1425668Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:03:52.1426022Z 0x000000006ffffffe (VERNEED) 0xdf680 2025-05-07T20:03:52.1426333Z 0x000000006fffffff (VERNEEDNUM) 5 2025-05-07T20:03:52.1426653Z 0x000000006ffffff0 (VERSYM) 0xdcfcc 2025-05-07T20:03:52.1426978Z 0x000000006ffffff9 (RELACOUNT) 155 2025-05-07T20:03:52.1427270Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:03:52.1427462Z 2025-05-07T20:03:52.1427589Z ################################################################################ 2025-05-07T20:03:52.1427806Z 2025-05-07T20:03:52.1427810Z 2025-05-07T20:03:52.1428023Z [CHECK] Verifying sample subset of symbols in the built libraries ... 
2025-05-07T20:03:52.1712879Z [CHECK] Found symbol in ./_skbuild/linux-x86_64-3.11/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so: fbgemm_gpu::per_tensor_quantize_i8 2025-05-07T20:03:52.1715253Z ################################################################################ 2025-05-07T20:03:52.1716828Z [BUILD] Wheel Audit: dist/fbgemm_gpu_genai_nightly-2025.5.7-cp311-cp311-manylinux_2_28_x86_64.whl 2025-05-07T20:03:52.1718295Z 2025-05-07T20:03:52.1719739Z + conda run --no-capture-output -n build_binary auditwheel show dist/fbgemm_gpu_genai_nightly-2025.5.7-cp311-cp311-manylinux_2_28_x86_64.whl 2025-05-07T20:03:52.1721489Z 2025-05-07T20:03:55.7789914Z 2025-05-07T20:03:55.7791043Z fbgemm_gpu_genai_nightly-2025.5.7-cp311-cp311-manylinux_2_28_x86_64.whl 2025-05-07T20:03:55.7791809Z is consistent with the following platform tag: "linux_x86_64". 2025-05-07T20:03:55.7792101Z 2025-05-07T20:03:55.7792263Z The wheel references external versioned symbols in these 2025-05-07T20:03:55.7792725Z system-provided shared libraries: librt.so.1 with versions 2025-05-07T20:03:55.7793156Z {'GLIBC_2.2.5'}, libgcc_s.so.1 with versions {'GCC_3.0', 2025-05-07T20:03:55.7793584Z 'GCC_12.0.0'}, libstdc++.so.6 with versions {'GLIBCXX_3.4.15', 2025-05-07T20:03:55.7794027Z 'CXXABI_1.3.11', 'GLIBCXX_3.4.18', 'CXXABI_1.3', 'CXXABI_1.3.3', 2025-05-07T20:03:55.7794474Z 'GLIBCXX_3.4.11', 'GLIBCXX_3.4.9', 'GLIBCXX_3.4.14', 'GLIBCXX_3.4.21', 2025-05-07T20:03:55.7794943Z 'CXXABI_1.3.5', 'GLIBCXX_3.4.29', 'GLIBCXX_3.4', 'CXXABI_1.3.7'}, 2025-05-07T20:03:55.7795390Z libc.so.6 with versions {'GLIBC_2.3', 'GLIBC_2.3.2', 'GLIBC_2.2.5', 2025-05-07T20:03:55.7795824Z 'GLIBC_2.6', 'GLIBC_2.3.3', 'GLIBC_2.14', 'GLIBC_2.17'}, 2025-05-07T20:03:55.7796236Z libpthread.so.0 with versions {'GLIBC_2.2.5', 'GLIBC_2.3.4'}, 2025-05-07T20:03:55.7796730Z libm.so.6 with versions {'GLIBC_2.2.5'}, libcudart.so.12 with versions 2025-05-07T20:03:55.7797219Z {'libcudart.so.12'}, libdl.so.2 with versions 
{'GLIBC_2.2.5', 2025-05-07T20:03:55.7797924Z 'GLIBC_2.3.4'} 2025-05-07T20:03:55.7798046Z 2025-05-07T20:03:55.7798265Z This constrains the platform tag to "manylinux_2_35_x86_64". In order 2025-05-07T20:03:55.7798750Z to achieve a more compatible tag, you would need to recompile a new 2025-05-07T20:03:55.7799215Z wheel from source on a system with earlier versions of these 2025-05-07T20:03:55.7799599Z libraries, such as a recent manylinux image. 2025-05-07T20:03:55.8540809Z 2025-05-07T20:03:55.8540915Z 2025-05-07T20:03:55.8541435Z ################################################################################ 2025-05-07T20:03:55.8542480Z [BUILD] Enumerating the built wheels ... 2025-05-07T20:03:55.8542977Z + ls -lth dist/fbgemm_gpu_genai_nightly-2025.5.7-cp311-cp311-manylinux_2_28_x86_64.whl 2025-05-07T20:03:55.8543373Z 2025-05-07T20:03:55.8600152Z -rw-r--r--. 1 root root 19M May 7 20:03 dist/fbgemm_gpu_genai_nightly-2025.5.7-cp311-cp311-manylinux_2_28_x86_64.whl 2025-05-07T20:03:55.8601652Z 2025-05-07T20:03:55.8601803Z [BUILD] Enumerating the wheel SHAs ... 
2025-05-07T20:03:55.8605543Z + sha1sum dist/fbgemm_gpu_genai_nightly-2025.5.7-cp311-cp311-manylinux_2_28_x86_64.whl 2025-05-07T20:03:55.8605955Z 2025-05-07T20:03:55.8959871Z c326345df354c6141153099e3e50ba8d6de34fcb dist/fbgemm_gpu_genai_nightly-2025.5.7-cp311-cp311-manylinux_2_28_x86_64.whl 2025-05-07T20:03:55.8961552Z 2025-05-07T20:03:55.8962690Z + sha256sum dist/fbgemm_gpu_genai_nightly-2025.5.7-cp311-cp311-manylinux_2_28_x86_64.whl 2025-05-07T20:03:55.8963319Z 2025-05-07T20:03:55.9760266Z 9f4154b2f6c41ae40824604f2980de212f6e65550128fe52cae1c9c75e71312b dist/fbgemm_gpu_genai_nightly-2025.5.7-cp311-cp311-manylinux_2_28_x86_64.whl 2025-05-07T20:03:55.9762279Z 2025-05-07T20:03:55.9764393Z + md5sum dist/fbgemm_gpu_genai_nightly-2025.5.7-cp311-cp311-manylinux_2_28_x86_64.whl 2025-05-07T20:03:55.9765305Z 2025-05-07T20:03:56.0080234Z 1c01cd21bdf738277ab20dc3f0582ce3 dist/fbgemm_gpu_genai_nightly-2025.5.7-cp311-cp311-manylinux_2_28_x86_64.whl 2025-05-07T20:03:56.0081127Z 2025-05-07T20:03:56.0082339Z [BUILD] FBGEMM-GPU build + package completed 2025-05-07T20:03:56.1789480Z ##[group]Run actions/upload-artifact@v4 2025-05-07T20:03:56.1789856Z with: 2025-05-07T20:03:56.1790153Z name: fbgemm_genai_x86_clang_py3.11_cu12.8.0.whl 2025-05-07T20:03:56.1790516Z path: fbgemm_gpu/dist/*.whl 2025-05-07T20:03:56.1790839Z if-no-files-found: error 2025-05-07T20:03:56.1791127Z compression-level: 6 2025-05-07T20:03:56.1791426Z overwrite: false 2025-05-07T20:03:56.1791717Z include-hidden-files: false 2025-05-07T20:03:56.1791999Z env: 2025-05-07T20:03:56.1792273Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T20:03:56.1792598Z BUILD_ENV: build_binary 2025-05-07T20:03:56.1792905Z BUILD_TARGET: genai 2025-05-07T20:03:56.1793158Z BUILD_VARIANT: cuda 2025-05-07T20:03:56.1793447Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T20:03:56.1793730Z ##[endgroup] 2025-05-07T20:03:56.1803405Z ##[command]/usr/bin/docker exec 2aa0e203fee372054944a8575a7288d464ba67ebdd0a95ce94bb1f4aeb4f06a9 sh -c "cat 
/etc/*release | grep ^ID" 2025-05-07T20:03:57.3853365Z With the provided path, there will be 1 file uploaded 2025-05-07T20:03:57.3855465Z Artifact name is valid! 2025-05-07T20:03:57.3856523Z Root directory input is valid! 2025-05-07T20:03:57.5126879Z Beginning upload of artifact content to blob storage 2025-05-07T20:03:58.2217810Z Uploaded bytes 8388608 2025-05-07T20:03:58.5774257Z Uploaded bytes 16777216 2025-05-07T20:03:58.6401329Z Uploaded bytes 18493360 2025-05-07T20:03:58.6550581Z Finished uploading artifact content to blob storage! 2025-05-07T20:03:58.6551319Z SHA256 digest of uploaded artifact zip is 712e5982f3c27e6bb70c4c07f6076ab85e5daa73adc8fdd928558f49c8845247 2025-05-07T20:03:58.6551967Z Finalizing artifact upload 2025-05-07T20:03:58.7304264Z Artifact fbgemm_genai_x86_clang_py3.11_cu12.8.0.whl.zip successfully finalized. Artifact ID 3081407693 2025-05-07T20:03:58.7307065Z Artifact fbgemm_genai_x86_clang_py3.11_cu12.8.0.whl has been successfully uploaded! Final size is 18493360 bytes. Artifact ID is 3081407693 2025-05-07T20:03:58.7309877Z Artifact download URL: https://github.com/pytorch/FBGEMM/actions/runs/14891846252/artifacts/3081407693 2025-05-07T20:03:58.7617114Z Post job cleanup. 
2025-05-07T20:03:58.7630830Z ##[command]/usr/bin/docker exec 2aa0e203fee372054944a8575a7288d464ba67ebdd0a95ce94bb1f4aeb4f06a9 sh -c "cat /etc/*release | grep ^ID" 2025-05-07T20:03:59.0487350Z [command]/usr/bin/git version 2025-05-07T20:03:59.0756348Z git version 2.47.1 2025-05-07T20:03:59.0788179Z Copying '/github/home/.gitconfig' to '/__w/_temp/9facff96-c137-4d35-907f-c2044e23a734/.gitconfig' 2025-05-07T20:03:59.0807093Z Temporarily overriding HOME='/__w/_temp/9facff96-c137-4d35-907f-c2044e23a734' before making global git config changes 2025-05-07T20:03:59.0809644Z Adding repository directory to the temporary git global config as a safe directory 2025-05-07T20:03:59.0811898Z [command]/usr/bin/git config --global --add safe.directory /__w/FBGEMM/FBGEMM 2025-05-07T20:03:59.0860460Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-05-07T20:03:59.0886827Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-05-07T20:03:59.1493383Z Entering 'external/asmjit' 2025-05-07T20:03:59.1601771Z Entering 'external/composable_kernel' 2025-05-07T20:03:59.1762721Z Entering 'external/cpuinfo' 2025-05-07T20:03:59.1882471Z Entering 'external/cutlass' 2025-05-07T20:03:59.2045679Z Entering 'external/googletest' 2025-05-07T20:03:59.2151308Z Entering 'external/hipify_torch' 2025-05-07T20:03:59.2259843Z Entering 'external/json' 2025-05-07T20:03:59.2366993Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-05-07T20:03:59.2387172Z http.https://github.com/.extraheader 2025-05-07T20:03:59.2391382Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-05-07T20:03:59.2414468Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git 
config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-05-07T20:03:59.2693412Z Entering 'external/asmjit' 2025-05-07T20:03:59.2743516Z http.https://github.com/.extraheader 2025-05-07T20:03:59.2785194Z Entering 'external/composable_kernel' 2025-05-07T20:03:59.2816916Z http.https://github.com/.extraheader 2025-05-07T20:03:59.2861490Z Entering 'external/cpuinfo' 2025-05-07T20:03:59.2905000Z http.https://github.com/.extraheader 2025-05-07T20:03:59.2945090Z Entering 'external/cutlass' 2025-05-07T20:03:59.2975765Z http.https://github.com/.extraheader 2025-05-07T20:03:59.3014677Z Entering 'external/googletest' 2025-05-07T20:03:59.3047479Z http.https://github.com/.extraheader 2025-05-07T20:03:59.3084331Z Entering 'external/hipify_torch' 2025-05-07T20:03:59.3120634Z http.https://github.com/.extraheader 2025-05-07T20:03:59.3157460Z Entering 'external/json' 2025-05-07T20:03:59.3202539Z http.https://github.com/.extraheader 2025-05-07T20:03:59.3425453Z Stop and remove container: b1b3efcb00dd441ea660ffc468bcf084_amazonlinux2023_d732a4 2025-05-07T20:03:59.3430736Z ##[command]/usr/bin/docker rm --force 2aa0e203fee372054944a8575a7288d464ba67ebdd0a95ce94bb1f4aeb4f06a9 2025-05-07T20:04:00.4673300Z 2aa0e203fee372054944a8575a7288d464ba67ebdd0a95ce94bb1f4aeb4f06a9 2025-05-07T20:04:00.4709721Z Remove container network: github_network_60639c660a9d41089b45e16508e07c21 2025-05-07T20:04:00.4714155Z ##[command]/usr/bin/docker network rm github_network_60639c660a9d41089b45e16508e07c21 2025-05-07T20:04:01.4861401Z github_network_60639c660a9d41089b45e16508e07c21 2025-05-07T20:04:01.4893949Z A job completed hook has been configured by the self-hosted runner administrator 2025-05-07T20:04:01.5062671Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh' 2025-05-07T20:04:01.5068986Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-05-07T20:04:01.5069571Z ##[endgroup] 2025-05-07T20:04:13.6553891Z Cleaning up orphan processes