Description of problem:
We have a group of systems that have an unrelated rpm installed (ocsinventory) and in the process of doing its tasks, the command /usr/sbin/monitor-get-edid-using-vbe is executed. This results in a segfault and (90%+ of the time) complete system freeze.

This is not widespread, we have a fairly large cluster of machines and this issue is specific to a group of servers that are all identical. Feels like a bad interaction with the specific hardware in these systems but I'm not sure what to provide for details. For starters here's the motherboard:

    Manufacturer: Supermicro
    Product Name: X10DRU-i+
    Version: 1.02B

These systems have only onboard vga, and no monitors are plugged in. If there's any other information I should provide, please let me know.

Version-Release number of selected component (if applicable):
monitor-edid-3.4-1.el9.x86_64

How reproducible:
Core dump happens every single time, system freeze more than 90% of the time.

Steps to Reproduce:
1. execute /usr/sbin/monitor-get-edid-using-vbe or /usr/sbin/monitor-get-edid

Actual results:
Core dump.

Expected results:
No core dump.

Additional info:

# coredumpctl info 10823
           PID: 10823 (monitor-get-edi)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Tue 2023-05-02 15:43:14 CDT (16s ago)
  Command Line: /usr/sbin/monitor-get-edid-using-vbe
    Executable: /usr/sbin/monitor-get-edid-using-vbe
 Control Group: /user.slice/user-0.slice/session-1.scope
          Unit: session-1.scope
         Slice: user-0.slice
       Session: 1
     Owner UID: 0 (root)
       Boot ID: ffcfcdeeac40417aaa303f1e8ce853e1
    Machine ID: XXXXXXX
      Hostname: XXXXXXX
       Storage: /var/lib/systemd/coredump/core.monitor-get-edi.0.ffcfcdeeac40417aaa303f1e8ce853e1.10823.1683060194000000.zst (present)
     Disk Size: 22.3K
       Message: Process 10823 (monitor-get-edi) of user 0 dumped core.

                Module linux-vdso.so.1 with build-id a4e04c10e1030c2f0e0c64dcebc0c1d16acdb19a
                Module ld-linux-x86-64.so.2 with build-id df9c6b298bf5e3c1d0eb6a0911f3f561908a704d
                Module libc.so.6 with build-id 82f7ae28e16376aa97cc3bf50b40ab2d1043924a
                Module libx86.so.1 with build-id 9695028d4cf3b0c757d0866cb441b126007e5545
                Module monitor-get-edid-using-vbe with build-id e4af22ea31e0fff92136cf14ea852eb78b3dba9b
                Stack trace of thread 10823:
                #0  0x00007f24caa510a8 x_outw (libx86.so.1 + 0x80a8)
                #1  0x00007f24caa5a91d x86emuOp_out_word_DX_AX.lto_priv.0 (libx86.so.1 + 0x1191d)
                #2  0x00007f24caa58d97 X86EMU_exec (libx86.so.1 + 0xfd97)
                #3  0x00007f24caa58ecc real_call (libx86.so.1 + 0xfecc)
                #4  0x0000564e996cde10 get_edid (monitor-get-edid-using-vbe + 0x1e10)
                #5  0x0000564e996cd571 main (monitor-get-edid-using-vbe + 0x1571)
                #6  0x00007f24ca87feb0 __libc_start_call_main (libc.so.6 + 0x3feb0)
                #7  0x00007f24ca87ff60 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x3ff60)
                #8  0x0000564e996cd735 _start (monitor-get-edid-using-vbe + 0x1735)
                ELF object binary architecture: AMD x86-64
Hi Seth, can you email me a core dump?
Taking the core apart:

Core was generated by `/usr/sbin/monitor-get-edid-using-vbe'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f6d9e8b20a8 in x_outw () from /lib64/libx86.so.1

I have no idea how to fix that...
Sigh, wrong paste:

Core was generated by `/usr/sbin/monitor-get-edid-using-vbe'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f6d9e8b20a8 in outw (__port=65535, __value=980) at /usr/include/sys/io.h:111
111       __asm__ __volatile__ ("outw %w0,%w1": :"a" (__value), "Nd" (__port));
(gdb) info frame
Stack level 0, frame at 0x7ffcdefcd2d0:
 rip = 0x7f6d9e8b20a8 in outw (/usr/include/sys/io.h:111); saved rip = 0x7f6d9e8bb91d
 inlined into frame 1
 source language c.
 Arglist at unknown address.
 Locals at unknown address, Previous frame's sp in rsp
(gdb) info args
__port = 65535
__value = 980
(gdb) info locals
No locals.