Description of problem:
flatpak, gnome-software and gnome-photos crash with an invalid opcode.

Version-Release number of selected component (if applicable):

How reproducible:
Run `gnome-software` or `flatpak remote-add flathub https://flathub.org/repo/flathub.flatpakrepo`.

Additional info:
BTW, I installed Fedora 36 with QEMU.

QEMU version info:
QEMU emulator version 6.2.0 (v6.2.0-11889-g5b72bf03f5-dirty)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

I launch QEMU with the following command line:
$ qemu-system-x86_64.exe \
    -display gtk,show-cursor=on,grab-on-hover=on,gl=off,zoom-to-fit=off \
    --accel whpx \
    -smp 10 \
    -m 10G \
    -k en \
    -drive file=${IMAGE},if=virtio \
    -device virtio-vga \
    -device virtio-net,netdev=vmnic -netdev user,id=vmnic \
    -usbdevice tablet

Host CPU: AMD Ryzen 9 5950X
Host OS: Windows 11

Logs in Fedora 36:
[penghuang@fedora ~]$ gnome-software
01:30:12:0969 Gs failed to load metadata: cancelled by user action
01:30:12:0984 Gs Only 0 apps for recent list, hiding
01:30:13:0188 Gs ignoring unknown or empty provided item type: font
01:30:13:0315 Gs ignoring unknown or empty provided item type: font
01:30:13:0316 Gs ignoring unknown or empty provided item type: font
Illegal instruction (core dumped)
[penghuang@fedora ~]$ flatpak remote-add flathub https://flathub.org/repo/flathub.flatpakrepo
Illegal instruction (core dumped)
[penghuang@fedora ~]$ dmesg
...
[ 57.591535] traps: gnome-photos[2662] trap invalid opcode ip:7f516c74ba34 sp:7fff3fc5d7a8 error:0 in avx2-int8.so[7f516c74b000+2000]
[ 106.181067] traps: pool-org.gnome.[2970] trap invalid opcode ip:7fb4b4247e4a sp:7fb4837fcc80 error:0 in libgnutls.so.30.31.0[7fb4b412e000+134000]
[ 106.338941] traps: pool-/usr/libex[2990] trap invalid opcode ip:7fe333f38e4a sp:7fe3316f8c80 error:0 in libgnutls.so.30.31.0[7fe333e1f000+134000]
[ 118.473914] traps: pool-/usr/libex[3290] trap invalid opcode ip:7fcd218f3e4a sp:7fcd110aac80 error:0 in libgnutls.so.30.31.0[7fcd217da000+134000]
[ 118.481087] traps: pool-org.gnome.[3273] trap invalid opcode ip:7f0eac5a2e4a sp:7f0ea68afc80 error:0 in libgnutls.so.30.31.0[7f0eac489000+134000]
[ 135.696232] traps: pool-flatpak re[3638] trap invalid opcode ip:7f78450cde4a sp:7f7837ffdc80 error:0 in libgnutls.so.30.31.0[7f7844fb4000+134000]
[ 174.225046] traps: gnome-photos[3742] trap invalid opcode ip:7fcecd7b6a34 sp:7ffd936d1d28 error:0 in avx2-int8.so[7fcecd7b6000+2000]
[ 174.871310] traps: gnome-photos[3871] trap invalid opcode ip:7f8f3c25ba4c sp:7ffdc26b1028 error:0 in avx2-int8.so[7f8f3c25b000+2000]

Crash stack from flatpak:
Downloading separate debug info for /lib64/liblzma.so.5...
Downloading separate debug info for /home/penghuang/Sources/system-supplied DSO at 0x7fff30f55000...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `flatpak remote-add flathub https://flathub.org/repo/flathub.flatpakrepo'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00007f89783cbe4a in sha512_block_data_order_avx2 () from /lib64/libgnutls.so.30
[Current thread is 1 (Thread 0x7f8972ada640 (LWP 5083))]
(gdb) bt
#0  0x00007f89783cbe4a in sha512_block_data_order_avx2 () from /lib64/libgnutls.so.30
#1  0x00007f89783bf042 in x86_sha512_update (ctx=0x7f8972ad9090, length=128, data=0x7f8972ad8f90 '\\' <repeats 128 times>, "@\255") at sha-x86-ssse3.c:215
#2  0x00007f897810879b in nettle_hmac_set_key (outer=<optimized out>, inner=0x7f8972ad9168, state=<optimized out>, hash=0x7f897848b6c0 <x86_sha384>, key_length=0, key=0x7f89783ff943 "") at /usr/src/debug/nettle-3.7.3-3.fc36.x86_64/hmac.c:83
#3  0x00007f89783bce3a in wrap_x86_hmac_fast (algo=<optimized out>, nonce=<optimized out>, nonce_size=<optimized out>, key=0x7f89783ff943, key_size=0, text=0x7f8972ad9430, text_size=48, digest=0x55a79d80b948) at hmac-x86-ssse3.c:294
#4  0x00007f89782d4b57 in _gnutls_mac_fast (algorithm=GNUTLS_MAC_SHA384, key=0x7f89783ff943, keylen=0, text=0x7f8972ad9430, textlen=48, digest=0x55a79d80b948) at hash_int.c:167
#5  0x00007f89782f524d in gnutls_hmac_fast (algorithm=GNUTLS_MAC_SHA384, key=key@entry=0x7f89783ff943, keylen=keylen@entry=0, ptext=0x7f8972ad9430, ptext_len=ptext_len@entry=48, digest=digest@entry=0x55a79d80b948) at crypto-api.c:640
#6  0x00007f897830d2ff in _tls13_init_secret2 (prf=0x7f897848f888 <hash_algorithms+168>, psk=<optimized out>, psk@entry=0x0, psk_size=48, psk_size@entry=0, out=out@entry=0x55a79d80b948) at secrets.c:59
#7  0x00007f897830d3d0 in _tls13_init_secret (session=session@entry=0x55a79d80a1c0, psk=psk@entry=0x0, psk_size=psk_size@entry=0) at secrets.c:35
#8  0x00007f89782c66c0 in read_server_hello (datalen=<optimized out>, data=<optimized out>, session=0x55a79d80a1c0) at handshake.c:2097
#9  _gnutls_recv_handshake (session=session@entry=0x55a79d80a1c0, type=type@entry=GNUTLS_HANDSHAKE_SERVER_HELLO, optional=optional@entry=0, buf=buf@entry=0x0) at handshake.c:1656
#10 0x00007f89782c8dbb in handshake_client (session=0x55a79d80a1c0) at handshake.c:3072
#11 gnutls_handshake (session=0x55a79d80a1c0) at handshake.c:2871
#12 0x00007f89784a694f in g_tls_connection_gnutls_handshake_thread_handshake (tls=0x55a79d80c250, timeout=<optimized out>, cancellable=<optimized out>, error=0x7f8972ad9b10) at ../tls/gnutls/gtlsconnection-gnutls.c:968
#13 0x00007f89784a8942 in handshake_thread (task=0x7f8968007ec0, object=object@entry=0x55a79d80c250, task_data=task_data@entry=0x55a79d766e60, cancellable=cancellable@entry=0x55a79d748760) at ../tls/base/gtlsconnection-base.c:1564
#14 0x00007f89784a8c02 in async_handshake_thread (task=<optimized out>, object=0x55a79d80c250, task_data=0x55a79d766e60, cancellable=0x55a79d748760) at ../tls/base/gtlsconnection-base.c:1848
#15 0x00007f89882dbaf3 in g_task_thread_pool_thread (thread_data=0x7f8968007ec0, pool_data=<optimized out>) at ../gio/gtask.c:1441
#16 0x00007f8988111b72 in g_thread_pool_thread_proxy (data=<optimized out>) at ../glib/gthreadpool.c:354
#17 0x00007f898810f172 in g_thread_proxy (data=0x55a79d7e1360) at ../glib/gthread.c:827
#18 0x00007f8987efdcc7 in start_thread (arg=<optimized out>) at pthread_create.c:442
#19 0x00007f8987f82e00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb)
(gdb) disassemble
Dump of assembler code for function sha512_block_data_order_avx2:
   0x00007f89783cbe00 <+0>:     mov    %rsp,%rax
   0x00007f89783cbe03 <+3>:     push   %rbx
   0x00007f89783cbe04 <+4>:     push   %rbp
   0x00007f89783cbe05 <+5>:     push   %r12
   0x00007f89783cbe07 <+7>:     push   %r13
   0x00007f89783cbe09 <+9>:     push   %r14
   0x00007f89783cbe0b <+11>:    push   %r15
   0x00007f89783cbe0d <+13>:    sub    $0x520,%rsp
   0x00007f89783cbe14 <+20>:    shl    $0x4,%rdx
   0x00007f89783cbe18 <+24>:    and    $0xfffffffffffff800,%rsp
   0x00007f89783cbe1f <+31>:    lea    (%rsi,%rdx,8),%rdx
   0x00007f89783cbe23 <+35>:    add    $0x480,%rsp
   0x00007f89783cbe2a <+42>:    mov    %rdi,0x80(%rsp)
   0x00007f89783cbe32 <+50>:    mov    %rsi,0x88(%rsp)
   0x00007f89783cbe3a <+58>:    mov    %rdx,0x90(%rsp)
   0x00007f89783cbe42 <+66>:    mov    %rax,0x98(%rsp)
=> 0x00007f89783cbe4a <+74>:    vzeroupper
   0x00007f89783cbe4d <+77>:    sub    $0xffffffffffffff80,%rsi
   0x00007f89783cbe51 <+81>:    mov    (%rdi),%rax
   0x00007f89783cbe54 <+84>:    mov    %rsi,%r12
   0x00007f89783cbe57 <+87>:    mov    0x8(%rdi),%rbx
   0x00007f89783cbe5b <+91>:    cmp    %rdx,%rsi
   0x00007f89783cbe5e <+94>:    mov    0x10(%rdi),%rcx
   0x00007f89783cbe62 <+98>:    cmove  %rsp,%r12
   0x00007f89783cbe66 <+102>:   mov    0x18(%rdi),%rdx
   0x00007f89783cbe6a <+106>:   mov    0x20(%rdi),%r8
   0x00007f89783cbe6e <+110>:   mov    0x28(%rdi),%r9
   0x00007f89783cbe72 <+114>:   mov    0x30(%rdi),%r10
   0x00007f89783cbe76 <+118>:   mov    0x38(%rdi),%r11
   0x00007f89783cbe7a <+122>:   jmp    0x7f89783cbe80 <sha512_block_data_order_avx2+128>
   0x00007f89783cbe7c <+124>:   nopl   0x0(%rax)
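The kernel "traps:" lines above give the faulting instruction pointer and the start of the mapping it fell in, so the offset inside the mapped region is simply ip minus that start address. The following is a minimal sketch of that arithmetic; the regular expression and the helper name `trap_offset` are illustrative assumptions, not part of any tool mentioned in this report. For the libgnutls lines in the log it yields the same offset (0x119e4a) each time, which suggests every one of those crashes hits the same instruction.

```python
import re

# Hypothetical pattern for the kernel "traps:" lines quoted in this report.
TRAP_RE = re.compile(
    r"traps: (?P<comm>.+?)\[(?P<pid>\d+)\] trap (?P<trap>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<err>\d+) "
    r"in (?P<module>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def trap_offset(line):
    """Return (module, offset of ip within the mapped region), or None if no match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return m["module"], ip - base


SAMPLE = (
    "[ 106.181067] traps: pool-org.gnome.[2970] trap invalid opcode "
    "ip:7fb4b4247e4a sp:7fb4837fcc80 error:0 in "
    "libgnutls.so.30.31.0[7fb4b412e000+134000]"
)

module, offset = trap_offset(SAMPLE)
print(f"{module}: faulting offset {offset:#x}")
```

Note that the bracketed address is the start of the mapping that contains the fault, so the result is an offset within that mapping and not necessarily a file offset.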
Thank you for the report; that seems like a regression after https://gitlab.com/gnutls/gnutls/-/merge_requests/1487 where we "fixed" CPU detection. Does it only happen on QEMU? Is avx2 available in /proc/cpuinfo?
> [ 57.591535] traps: gnome-photos[2662] trap invalid opcode ip:7f516c74ba34 sp:7fff3fc5d7a8 error:0 in avx2-int8.so[7f516c74b000+2000]

Actually this is unrelated to the mentioned GnuTLS change, as it (babl) has its own CPU detection code:
https://gitlab.gnome.org/GNOME/babl/-/blob/1d72eaf69b906e93d0f13240835405a784996a40/extensions/avx2-int8.c#L598

So I suspect QEMU might be mis-advertising CPU features. Daniel, do you have any idea?
> $ qemu-system-x86_64.exe \
> -display gtk,show-cursor=on,grab-on-hover=on,gl=off,zoom-to-fit=off \
> --accel whpx \
> -smp 10 \
> -m 10G \
> -k en \
> -drive file=${IMAGE},if=virtio \
> -device virtio-vga \
> -device virtio-net,netdev=vmnic -netdev user,id=vmnic \
> -usbdevice tablet

Given this command line, I would expect QEMU to be using the 'qemu64' CPU model, which has a very limited feature set and does not include AVX2. I wonder if there's some problem with the 'whpx' accelerator not correctly exposing the CPU models.

I'd suggest this is probably best reported to QEMU upstream, as my knowledge of QEMU's WHPX support is minimal.
With that QEMU command line, the guest OS reports AVX2 but not AVX.

[penghuang@fedora ~]$ flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo
Illegal instruction (core dumped)
[penghuang@fedora ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 10
On-line CPU(s) list: 0-9
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 9 5950X 16-Core Processor
CPU family: 15
Model: 107
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 10
Stepping: 1
BogoMIPS: 6786.89
Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni cx16 hypervisor lahf_lm cmp_legacy svm 3dnowprefetch vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr rdpru umip vaes vpclmulqdq rdpid fsrm
Virtualization features:
  Virtualization: AMD-V
Caches (sum of all):
  L1d: 320 KiB (10 instances)
  L1i: 320 KiB (10 instances)
  L2: 5 MiB (10 instances)
  L3: 640 MiB (10 instances)
NUMA:
  NUMA node(s): 1
  NUMA node0 CPU(s): 0-9
Vulnerabilities:
  Itlb multihit: Not affected
  L1tf: Not affected
  Mds: Not affected
  Meltdown: Not affected
  Spec store bypass: Not affected
  Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling
  Srbds: Not affected
  Tsx async abort: Not affected
This looks like a QEMU bug: QEMU advertises avx2 but not avx. However, `vzeroupper` is an AVX instruction, so testing only for the avx2 feature is not sufficient before taking the AVX2 code path.
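To illustrate the point, here is a minimal sketch of a flag check that refuses a feature when one of its prerequisites is missing. It works on /proc/cpuinfo or lscpu style text; the `PREREQUISITES` table, the helper names and the abbreviated flag sample are assumptions made for this example, and this is not how GnuTLS or babl actually perform detection (they query CPUID directly from C, and a complete check would also confirm that the OS has enabled the AVX register state).

```python
# Hypothetical, deliberately tiny prerequisite table: AVX2 code paths also
# execute AVX instructions such as vzeroupper, so avx2 is only usable with avx.
PREREQUISITES = {"avx2": ("avx",)}


def parse_flags(text):
    """Return the flag set from the first 'flags:' / 'Flags:' line in the text."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "flags":
            return set(value.split())
    return set()


def usable(feature, flags):
    """True only if the feature and all of its listed prerequisites are advertised."""
    if feature not in flags:
        return False
    return all(usable(dep, flags) for dep in PREREQUISITES.get(feature, ()))


# Abbreviated excerpt of the guest's flags as reported in this bug (avx is absent).
GUEST = "Flags: fpu sse sse2 bmi1 avx2 bmi2 sha_ni vaes vpclmulqdq"

flags = parse_flags(GUEST)
print("avx2 advertised:", "avx2" in flags)
print("avx2 usable:", usable("avx2", flags))
```

With the guest's flags, the naive membership test passes while the prerequisite-aware check does not, which is the mismatch that leads to the SIGILL on `vzeroupper`.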
*** Bug 2072865 has been marked as a duplicate of this bug. ***
Hi Prarit,

I think this bug is related to CPU features. Bug #2072865 is related to watchdog drivers, and it should be a duplicate of bug #2074160. Please help check.
(In reply to Lili Zhu from comment #7)
> Hi, Prarit
>
> I think this bug should be related to the CPU features. Bug #2072865 is
> related to watchdog drivers, and Bug #2072865 should be a duplicate bug of
> #2074160. Please help to check.

I'm not sure I follow how this BZ is related to the watchdog BZs. Could you elaborate on why you think watchdog code is responsible for an invalid opcode?

P.
Hi Prarit,

1) I see that you marked bug #2072865 as a duplicate of this bug. I do not think the two are duplicates. Bug #2072865 is related to the watchdog driver, and IIUC this bug has nothing to do with the watchdog driver. If I am wrong, please correct me.

2) Bug #2072865 is indeed a duplicate, but it is a duplicate of bug #2074160. Please help check.

Thanks
This is fixed in gnutls>=3.7.8: https://gitlab.com/gnutls/gnutls/-/issues/1282
This does seem to be the same issue as gnutls issue 1282. Please re-open if this still persists.