Bug 2076410 - gnome-software, flatpak and gnome-photos crash with invalid opcode
Summary: gnome-software, flatpak and gnome-photos crash with invalid opcode
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: gnutls
Version: 36
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Red Hat Crypto Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-19 01:44 UTC by Peng Huang
Modified: 2023-04-05 08:58 UTC (History)
CC List: 14 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2023-04-05 08:58:30 UTC
Type: Bug
Embargoed:
fedora-admin-xmlrpc: mirror+


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Gitlab | qemu-project qemu issues 993 | 0 | None | opened | Invalid opcode vzeroupper | 2022-04-19 16:18:41 UTC
Red Hat Issue Tracker | FC-431 | 0 | None | None | None | 2022-04-19 01:47:00 UTC

Description Peng Huang 2022-04-19 01:44:30 UTC
Description of problem:
flatpak, gnome-software and gnome-photos crash with invalid opcode

Version-Release number of selected component (if applicable):


How reproducible:
run `gnome-software` or `flatpak remote-add flathub https://flathub.org/repo/flathub.flatpakrepo`


Additional info:
BTW, I installed Fedora 36 as a guest in QEMU.
QEMU version info:
QEMU emulator version 6.2.0 (v6.2.0-11889-g5b72bf03f5-dirty)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

I launch QEMU with the following command line:
$ qemu-system-x86_64.exe \
  -display gtk,show-cursor=on,grab-on-hover=on,gl=off,zoom-to-fit=off \
  --accel whpx \
  -smp 10 \
  -m 10G \
  -k en \
  -drive file=${IMAGE},if=virtio \
  -device virtio-vga \
  -device virtio-net,netdev=vmnic -netdev user,id=vmnic \
  -usbdevice tablet 

Host CPU: AMD Ryzen 9 5950X
Host OS: Windows 11

Logs in Fedora 36:
[penghuang@fedora ~]$ gnome-software 
01:30:12:0969 Gs  failed to load metadata: cancelled by user action
01:30:12:0984 Gs  Only 0 apps for recent list, hiding
01:30:13:0188 Gs  ignoring unknown or empty provided item type: font
01:30:13:0315 Gs  ignoring unknown or empty provided item type: font
01:30:13:0316 Gs  ignoring unknown or empty provided item type: font
Illegal instruction (core dumped)
[penghuang@fedora ~]$ flatpak remote-add flathub https://flathub.org/repo/flathub.flatpakrepo
Illegal instruction (core dumped)

[penghuang@fedora ~]$ dmesg
...
[   57.591535] traps: gnome-photos[2662] trap invalid opcode ip:7f516c74ba34 sp:7fff3fc5d7a8 error:0 in avx2-int8.so[7f516c74b000+2000]
[  106.181067] traps: pool-org.gnome.[2970] trap invalid opcode ip:7fb4b4247e4a sp:7fb4837fcc80 error:0 in libgnutls.so.30.31.0[7fb4b412e000+134000]
[  106.338941] traps: pool-/usr/libex[2990] trap invalid opcode ip:7fe333f38e4a sp:7fe3316f8c80 error:0 in libgnutls.so.30.31.0[7fe333e1f000+134000]
[  118.473914] traps: pool-/usr/libex[3290] trap invalid opcode ip:7fcd218f3e4a sp:7fcd110aac80 error:0 in libgnutls.so.30.31.0[7fcd217da000+134000]
[  118.481087] traps: pool-org.gnome.[3273] trap invalid opcode ip:7f0eac5a2e4a sp:7f0ea68afc80 error:0 in libgnutls.so.30.31.0[7f0eac489000+134000]
[  135.696232] traps: pool-flatpak re[3638] trap invalid opcode ip:7f78450cde4a sp:7f7837ffdc80 error:0 in libgnutls.so.30.31.0[7f7844fb4000+134000]
[  174.225046] traps: gnome-photos[3742] trap invalid opcode ip:7fcecd7b6a34 sp:7ffd936d1d28 error:0 in avx2-int8.so[7fcecd7b6000+2000]
[  174.871310] traps: gnome-photos[3871] trap invalid opcode ip:7f8f3c25ba4c sp:7ffdc26b1028 error:0 in avx2-int8.so[7f8f3c25b000+2000]

Crash stack from flatpak:

Downloading separate debug info for /lib64/liblzma.so.5...
Downloading separate debug info for /home/penghuang/Sources/system-supplied DSO at 0x7fff30f55000...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `flatpak remote-add flathub https://flathub.org/repo/flathub.flatpakrepo'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00007f89783cbe4a in sha512_block_data_order_avx2 () from /lib64/libgnutls.so.30
[Current thread is 1 (Thread 0x7f8972ada640 (LWP 5083))]
(gdb) bt
#0  0x00007f89783cbe4a in sha512_block_data_order_avx2 () from /lib64/libgnutls.so.30
#1  0x00007f89783bf042 in x86_sha512_update (ctx=0x7f8972ad9090, length=128, data=0x7f8972ad8f90 '\\' <repeats 128 times>, "@\255")
    at sha-x86-ssse3.c:215
#2  0x00007f897810879b in nettle_hmac_set_key (outer=<optimized out>, inner=0x7f8972ad9168, state=<optimized out>, 
    hash=0x7f897848b6c0 <x86_sha384>, key_length=0, key=0x7f89783ff943 "") at /usr/src/debug/nettle-3.7.3-3.fc36.x86_64/hmac.c:83
#3  0x00007f89783bce3a in wrap_x86_hmac_fast (algo=<optimized out>, nonce=<optimized out>, nonce_size=<optimized out>, key=0x7f89783ff943, 
    key_size=0, text=0x7f8972ad9430, text_size=48, digest=0x55a79d80b948) at hmac-x86-ssse3.c:294
#4  0x00007f89782d4b57 in _gnutls_mac_fast (algorithm=GNUTLS_MAC_SHA384, key=0x7f89783ff943, keylen=0, text=0x7f8972ad9430, textlen=48, 
    digest=0x55a79d80b948) at hash_int.c:167
#5  0x00007f89782f524d in gnutls_hmac_fast (algorithm=GNUTLS_MAC_SHA384, key=key@entry=0x7f89783ff943, keylen=keylen@entry=0, 
    ptext=0x7f8972ad9430, ptext_len=ptext_len@entry=48, digest=digest@entry=0x55a79d80b948) at crypto-api.c:640
#6  0x00007f897830d2ff in _tls13_init_secret2 (prf=0x7f897848f888 <hash_algorithms+168>, psk=<optimized out>, psk@entry=0x0, psk_size=48, 
    psk_size@entry=0, out=out@entry=0x55a79d80b948) at secrets.c:59
#7  0x00007f897830d3d0 in _tls13_init_secret (session=session@entry=0x55a79d80a1c0, psk=psk@entry=0x0, psk_size=psk_size@entry=0) at secrets.c:35
#8  0x00007f89782c66c0 in read_server_hello (datalen=<optimized out>, data=<optimized out>, session=0x55a79d80a1c0) at handshake.c:2097
#9  _gnutls_recv_handshake (session=session@entry=0x55a79d80a1c0, type=type@entry=GNUTLS_HANDSHAKE_SERVER_HELLO, optional=optional@entry=0, 
    buf=buf@entry=0x0) at handshake.c:1656
#10 0x00007f89782c8dbb in handshake_client (session=0x55a79d80a1c0) at handshake.c:3072
#11 gnutls_handshake (session=0x55a79d80a1c0) at handshake.c:2871
#12 0x00007f89784a694f in g_tls_connection_gnutls_handshake_thread_handshake (tls=0x55a79d80c250, timeout=<optimized out>, 
    cancellable=<optimized out>, error=0x7f8972ad9b10) at ../tls/gnutls/gtlsconnection-gnutls.c:968
#13 0x00007f89784a8942 in handshake_thread (task=0x7f8968007ec0, object=object@entry=0x55a79d80c250, task_data=task_data@entry=0x55a79d766e60, 
    cancellable=cancellable@entry=0x55a79d748760) at ../tls/base/gtlsconnection-base.c:1564
#14 0x00007f89784a8c02 in async_handshake_thread (task=<optimized out>, object=0x55a79d80c250, task_data=0x55a79d766e60, 
    cancellable=0x55a79d748760) at ../tls/base/gtlsconnection-base.c:1848
#15 0x00007f89882dbaf3 in g_task_thread_pool_thread (thread_data=0x7f8968007ec0, pool_data=<optimized out>) at ../gio/gtask.c:1441
#16 0x00007f8988111b72 in g_thread_pool_thread_proxy (data=<optimized out>) at ../glib/gthreadpool.c:354
#17 0x00007f898810f172 in g_thread_proxy (data=0x55a79d7e1360) at ../glib/gthread.c:827
#18 0x00007f8987efdcc7 in start_thread (arg=<optimized out>) at pthread_create.c:442
#19 0x00007f8987f82e00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb)
(gdb) disassemble 
Dump of assembler code for function sha512_block_data_order_avx2:
   0x00007f89783cbe00 <+0>:    mov    %rsp,%rax
   0x00007f89783cbe03 <+3>:    push   %rbx
   0x00007f89783cbe04 <+4>:    push   %rbp
   0x00007f89783cbe05 <+5>:    push   %r12
   0x00007f89783cbe07 <+7>:    push   %r13
   0x00007f89783cbe09 <+9>:    push   %r14
   0x00007f89783cbe0b <+11>:    push   %r15
   0x00007f89783cbe0d <+13>:    sub    $0x520,%rsp
   0x00007f89783cbe14 <+20>:    shl    $0x4,%rdx
   0x00007f89783cbe18 <+24>:    and    $0xfffffffffffff800,%rsp
   0x00007f89783cbe1f <+31>:    lea    (%rsi,%rdx,8),%rdx
   0x00007f89783cbe23 <+35>:    add    $0x480,%rsp
   0x00007f89783cbe2a <+42>:    mov    %rdi,0x80(%rsp)
   0x00007f89783cbe32 <+50>:    mov    %rsi,0x88(%rsp)
   0x00007f89783cbe3a <+58>:    mov    %rdx,0x90(%rsp)
   0x00007f89783cbe42 <+66>:    mov    %rax,0x98(%rsp)
=> 0x00007f89783cbe4a <+74>:    vzeroupper 
   0x00007f89783cbe4d <+77>:    sub    $0xffffffffffffff80,%rsi
   0x00007f89783cbe51 <+81>:    mov    (%rdi),%rax
   0x00007f89783cbe54 <+84>:    mov    %rsi,%r12
   0x00007f89783cbe57 <+87>:    mov    0x8(%rdi),%rbx
   0x00007f89783cbe5b <+91>:    cmp    %rdx,%rsi
   0x00007f89783cbe5e <+94>:    mov    0x10(%rdi),%rcx
   0x00007f89783cbe62 <+98>:    cmove  %rsp,%r12
   0x00007f89783cbe66 <+102>:    mov    0x18(%rdi),%rdx
   0x00007f89783cbe6a <+106>:    mov    0x20(%rdi),%r8
   0x00007f89783cbe6e <+110>:    mov    0x28(%rdi),%r9
   0x00007f89783cbe72 <+114>:    mov    0x30(%rdi),%r10
   0x00007f89783cbe76 <+118>:    mov    0x38(%rdi),%r11
   0x00007f89783cbe7a <+122>:    jmp    0x7f89783cbe80 <sha512_block_data_order_avx2+128>
   0x00007f89783cbe7c <+124>:    nopl   0x0(%rax)
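
The kernel trap lines in the dmesg output above can be reduced to a module name and an offset, which makes it easier to see whether separate crashes hit the same instruction. The following is a minimal Python sketch, not part of the original report: `TRAP_RE`, `parse_trap_line` and `crash_sites` are hypothetical names, and the regular expression assumes the x86 `traps:` line layout shown in the log. The offset is relative to the start of the mapping printed in brackets, which is not necessarily the file offset within the library.

```python
import re
from collections import Counter

# Assumed layout of an x86 "traps:" line, taken from the log in this report:
#   traps: COMM[PID] trap NAME ip:HEX sp:HEX error:HEX in MODULE[BASE+SIZE]
TRAP_RE = re.compile(
    r"traps: (?P<comm>.+?)\[(?P<pid>\d+)\] trap (?P<trap>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+)"
    r" in (?P<module>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_trap_line(line):
    """Return comm, trap name, module and offset for one trap line, or None."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "comm": m["comm"],
        "trap": m["trap"],
        "module": m["module"],
        # Distance of the faulting instruction from the start of the mapping.
        "offset": ip - base,
    }


def crash_sites(lines):
    """Count how often each (module, offset) pair appears in the given lines."""
    sites = Counter()
    for line in lines:
        parsed = parse_trap_line(line)
        if parsed is not None:
            sites[(parsed["module"], parsed["offset"])] += 1
    return sites
```

Applied to the log above, all five libgnutls.so.30.31.0 entries reduce to offset 0x119e4a, which suggests they fault on the same instruction.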

Comment 1 Daiki Ueno 2022-04-19 06:00:07 UTC
Thank you for the report; that seems like a regression after https://gitlab.com/gnutls/gnutls/-/merge_requests/1487 where we "fixed" CPU detection. Does it only happen on QEMU? Is avx2 available in /proc/cpuinfo?

Comment 2 Daiki Ueno 2022-04-19 09:04:27 UTC
> [   57.591535] traps: gnome-photos[2662] trap invalid opcode ip:7f516c74ba34 sp:7fff3fc5d7a8 error:0 in avx2-int8.so[7f516c74b000+2000]

Actually this is unrelated to the mentioned GnuTLS change, as it (babl) has its own CPU detection code:
https://gitlab.gnome.org/GNOME/babl/-/blob/1d72eaf69b906e93d0f13240835405a784996a40/extensions/avx2-int8.c#L598


So I suspect QEMU might be mis-advertising CPU features. Daniel, do you have any idea?

Comment 3 Daniel Berrangé 2022-04-19 11:02:55 UTC
> $ qemu-system-x86_64.exe \
>   -display gtk,show-cursor=on,grab-on-hover=on,gl=off,zoom-to-fit=off \
>   --accel whpx \
>   -smp 10 \
>   -m 10G \
>   -k en \
>   -drive file=${IMAGE},if=virtio \
>   -device virtio-vga \
>   -device virtio-net,netdev=vmnic -netdev user,id=vmnic \
>   -usbdevice tablet 

Given this command line, I would expect QEMU to be using the 'qemu64' CPU model, which has a very limited feature set and does not include AVX2.

I wonder if there's some problem with the 'whpx' accelerator not correctly exposing the CPU models.

I'd suggest this is probably best reported to QEMU upstream, as my knowledge of QEMU's WHPX support is minimal.

Comment 4 Peng Huang 2022-04-19 15:40:12 UTC
With that QEMU command line, the guest OS reports AVX2 support but not AVX.

[penghuang@fedora ~]$ flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo
Illegal instruction (core dumped)
[penghuang@fedora ~]$ lscpu 
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  10
  On-line CPU(s) list:   0-9
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 9 5950X 16-Core Processor
    CPU family:          15
    Model:               107
    Thread(s) per core:  1
    Core(s) per socket:  1
    Socket(s):           10
    Stepping:            1
    BogoMIPS:            6786.89
    Flags:               fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cm
                         ov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm
                          constant_tsc rep_good nopl nonstop_tsc cpuid extd_apic
                         id aperfmperf pni cx16 hypervisor lahf_lm cmp_legacy sv
                         m 3dnowprefetch vmmcall fsgsbase bmi1 avx2 smep bmi2 er
                         ms invpcid rdseed adx smap clflushopt clwb sha_ni xsave
                         opt xsavec xgetbv1 xsaves clzero xsaveerptr rdpru umip 
                         vaes vpclmulqdq rdpid fsrm
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   320 KiB (10 instances)
  L1i:                   320 KiB (10 instances)
  L2:                    5 MiB (10 instances)
  L3:                    640 MiB (10 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-9
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Spec store bypass:     Not affected
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer
                          sanitization
  Spectre v2:            Mitigation; Retpolines, STIBP disabled, RSB filling
  Srbds:                 Not affected
  Tsx async abort:       Not affected
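
The flag list above can also be checked mechanically for a feature that is advertised without its prerequisite. The following is a minimal Python sketch, assuming a Linux guest and /proc/cpuinfo-style text. `cpu_flags`, `missing_prerequisites` and the one-entry `REQUIRES` table are hypothetical and deliberately not exhaustive.

```python
# Illustrative prerequisite table (an assumption for this sketch, not a
# complete list): AVX2 instructions are only usable when AVX is available.
REQUIRES = {"avx2": "avx"}


def cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "flags":
            return set(value.split())
    return set()


def missing_prerequisites(flags, requires=REQUIRES):
    """Map each advertised feature to the prerequisite flag that is absent."""
    return {
        feature: needed
        for feature, needed in requires.items()
        if feature in flags and needed not in flags
    }


# Possible use inside the guest:
#   with open("/proc/cpuinfo") as f:
#       print(missing_prerequisites(cpu_flags(f.read())))
```

With a flag set that contains avx2 but not avx, as in the lscpu output above, `missing_prerequisites` returns `{"avx2": "avx"}`.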

Comment 5 Peng Huang 2022-04-19 16:11:58 UTC
It looks like a QEMU bug: QEMU advertises avx2 but not avx. However, `vzeroupper` is an AVX instruction, so testing only the avx2 feature flag is not sufficient before running this code path.
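
The point about `vzeroupper` can be illustrated with the usual CPUID-based test for AVX2 usability. The following is a minimal sketch under stated assumptions, not GnuTLS's or babl's actual detection code. Python cannot execute CPUID or XGETBV itself, so the raw register values are passed in as integers, and the function names `avx_usable` and `avx2_usable` are hypothetical. The bit positions are the architectural ones: CPUID leaf 1 ECX bits 27 (OSXSAVE) and 28 (AVX), CPUID leaf 7 subleaf 0 EBX bit 5 (AVX2), and XCR0 bits 1 and 2 (XMM and YMM state).

```python
CPUID1_ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE, so XGETBV may be used
CPUID1_ECX_AVX = 1 << 28      # AVX instructions, which include vzeroupper
CPUID7_EBX_AVX2 = 1 << 5      # AVX2 instructions
XCR0_SSE = 1 << 1             # OS preserves XMM registers
XCR0_AVX = 1 << 2             # OS preserves the upper halves of YMM registers


def avx_usable(leaf1_ecx, xcr0):
    """True if AVX is advertised and the OS has enabled YMM state."""
    if not leaf1_ecx & CPUID1_ECX_OSXSAVE:
        return False
    if not leaf1_ecx & CPUID1_ECX_AVX:
        return False
    wanted = XCR0_SSE | XCR0_AVX
    return (xcr0 & wanted) == wanted


def avx2_usable(leaf1_ecx, leaf7_ebx, xcr0):
    """True only if AVX2 is advertised and AVX itself is usable."""
    return avx_usable(leaf1_ecx, xcr0) and bool(leaf7_ebx & CPUID7_EBX_AVX2)
```

A guest that sets the AVX2 bit in leaf 7 but leaves the AVX bit in leaf 1 clear, as described in this comment, makes `avx2_usable` return False, so an AVX2 code path that begins with `vzeroupper` would not be selected.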

Comment 6 Prarit Bhargava 2022-06-03 17:39:52 UTC
*** Bug 2072865 has been marked as a duplicate of this bug. ***

Comment 7 Lili Zhu 2022-06-05 10:14:54 UTC
Hi, Prarit

I think this bug is related to CPU features. Bug #2072865 is related to watchdog drivers, and it should be a duplicate of bug #2074160 instead. Please help to check.

Comment 8 Prarit Bhargava 2022-06-06 13:55:57 UTC
(In reply to Lili Zhu from comment #7)
> Hi, Prarit
> 
> I think this bug is related to CPU features. Bug #2072865 is
> related to watchdog drivers, and it should be a duplicate of bug
> #2074160 instead. Please help to check.

I'm not sure I follow how this BZ is related to watchdog BZs?  Could you elaborate on why you think watchdog code is responsible for an invalid opcode?

P.

Comment 9 Lili Zhu 2022-06-07 01:53:54 UTC
Hi, Prarit

1) I see that you marked bug #2072865 as a duplicate of this bug. I do not think the two are duplicates: bug #2072865 is related to the watchdog driver, and IIUC this bug has nothing to do with the watchdog driver. If I am wrong, please correct me.

2) Bug #2072865 is indeed a duplicate, but it is a duplicate of bug #2074160.

Please help to check. Thanks

Comment 10 Michal Schmidt 2023-04-04 06:33:55 UTC
This is fixed in gnutls>=3.7.8: https://gitlab.com/gnutls/gnutls/-/issues/1282

Comment 11 Clemens Lang 2023-04-05 08:58:30 UTC
This does seem to be the same issue as gnutls issue 1282. Please re-open if this still persists.

