This bug has been migrated to another issue tracking site. It has been closed here and may no longer be being monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.
Bug 2110557 - Monitor EDID tools cause core dump and system crash on systems with integrated ASPEED video
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: CentOS Stream
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: X/OpenGL Maintenance List
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-25 15:51 UTC by Larry Mills
Modified: 2023-09-11 21:46 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-09-11 21:46:35 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments
dmidecode output (21.11 KB, text/plain)
2022-07-25 15:51 UTC, Larry Mills
failing system SOS report (11.73 MB, application/x-xz)
2022-08-08 15:11 UTC, Larry Mills
strace output from system using ast2400 video (4.74 KB, text/plain)
2023-01-06 21:20 UTC, Larry Mills
strace output from system with AST2600 video (4.76 KB, text/plain)
2023-01-06 21:23 UTC, Larry Mills


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHEL-3173 | 0 | None | Migrated | None | 2023-09-11 21:46:32 UTC
Red Hat Issue Tracker | RHELPLAN-128917 | 0 | None | None | None | 2022-07-25 16:21:50 UTC

Description Larry Mills 2022-07-25 15:51:06 UTC
Created attachment 1899198 [details]
dmidecode output

Description of problem:

On an older Supermicro platform, board model X10SRi-F, executing the tools "monitor-edid" or "monitor-get-edid-using-vbe" causes a core dump error in syslog, followed shortly thereafter by a system crash that requires a hard reset to bring the system back up.

The syslog message is:

Jul 21 00:12:01 localhost kernel: traps: monitor-get-edi[129522] general protection fault ip:7f2312e9e0a8 sp:7fffae140ff8 error:0 in libx86.so.1[7f2312e9d000+14000]
Jul 21 00:12:01 localhost systemd[1]: Created slice Slice /system/systemd-coredump.
Jul 21 00:12:01 localhost systemd[1]: Started Process Core Dump (PID 129523/UID 0).
Jul 21 00:12:01 localhost systemd-coredump[129524]: Resource limits disable core dumping for process 129522 (monitor-get-edi).
Jul 21 00:12:01 localhost systemd-coredump[129524]: Process 129522 (monitor-get-edi) of user 0 dumped core.
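For reference, the kernel "traps:" line above already carries enough information to locate the faulting instruction inside libx86.so.1: subtracting the mapping base (the first number in the square brackets) from the instruction pointer gives the offset into the mapped region. The following is a minimal sketch of that arithmetic, assuming the line format shown in this report; the regular expression and the helper name `parse_trap` are illustrative and not part of any existing tool.

```python
import re

# Pattern for the kernel "traps:" line format seen in this report.
# The trailing " in <object>[<base>+<size>]" part is treated as optional.
TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<pid>\d+)\] (?P<reason>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+)"
    r"(?: in (?P<object>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_trap(line):
    """Parse one syslog line; return a dict, or None if it is not a trap line."""
    match = TRAP_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    # All numeric fields in the trap line are hexadecimal.
    for key in ("ip", "sp", "error", "base", "size"):
        if info[key] is not None:
            info[key] = int(info[key], 16)
    # Offset of the faulting instruction within the mapped object, if known.
    info["offset"] = None if info["base"] is None else info["ip"] - info["base"]
    return info


if __name__ == "__main__":
    sample = (
        "Jul 21 00:12:01 localhost kernel: traps: monitor-get-edi[129522] "
        "general protection fault ip:7f2312e9e0a8 sp:7fffae140ff8 error:0 "
        "in libx86.so.1[7f2312e9d000+14000]"
    )
    info = parse_trap(sample)
    print(info["object"], hex(info["offset"]))
```

The resulting offset is relative to the start of that mapping, so turning it into a symbol would still require the matching library build and its debug information.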


Shortly after the core dump message is generated, the system locks up hard. No other errors are generated.

Unfortunately, this error is unavoidable in my environment, as the "monitor-edid" tools are called routinely by the "OCS Inventory" toolset that is used to catalog system contents.

I only encounter this error (so far) on the Supermicro X10 hardware.  I have HP and Dell hardware running the same OS version that do not exhibit the problem.




Version-Release number of selected component (if applicable):

Kernel: 5.14.0-130.el9.x86_64
EDID version:  monitor-edid-3.4-1.el9.x86_64
Supermicro X10 BIOS version 3.4 (latest as of 7/22)

How reproducible:

100% reproducible on every execution of problem commands.



Steps to Reproduce:
1. On a Supermicro X10 platform with only integrated video, execute either command
/usr/sbin/monitor-edid
/usr/sbin/monitor-get-edid-using-vbe


2. Observe core dump message in syslog

3. System locks up and requires a hard reset to restore operation

Actual results:

Core dump and system lockup

Expected results:

No errors or system interruption

Additional info:

Output from dmidecode is attached.

Comment 1 Niels De Graef 2022-08-05 14:31:49 UTC
Hi Larry, could you attach a sosreport from a system that had such an error? There's not much to go on right now, so we'll need this as a first step for our investigation.

Comment 2 Larry Mills 2022-08-08 15:11:40 UTC
Created attachment 1904315 [details]
failing system SOS report

Comment 3 Larry Mills 2022-08-08 15:12:55 UTC
Hello Niels,

A sos report has been attached to this case.  Let me know if you need any additional information.

Comment 4 Larry Mills 2022-11-03 19:56:50 UTC
I have found a second system type that has this same core dump issue with the monitor-edid tools. The only difference in behavior is that the system does not crash, but the monitor-edid utility still core dumps in the same manner.   The second system type is a brand new ASUS "KMPP-D32 Series" board that uses an integrated video controller.

Looking at the two systems that experience this problem, they have the same OS configuration (although the kernel is now at 5.14.0-183.el9.x86_64) but the common thread now appears to be the integrated video controller.   Here is the "lshw" output of the display properties for both of the systems that experience the problem:

ASUS motherboard  "KMPP-D32 Series"

 *-display
      description: VGA compatible controller
      product: ASPEED Graphics Family
      vendor: ASPEED Technology, Inc.
      physical id: 0
      bus info: pci@0000:62:00.0
      logical name: /dev/fb0
      version: 52
      width: 32 bits
      clock: 33MHz
      capabilities: pm msi vga_controller bus_master cap_list rom fb
      configuration: depth=32 driver=ast latency=0 resolution=1024,768
      resources: irq:158 memory:c8000000-cbffffff memory:cc000000-cc03ffff ioport:8000(size=128) memory:c0000-dffff



Supermicro X10 

*-display
     description: VGA compatible controller
     product: ASPEED Graphics Family
     vendor: ASPEED Technology, Inc.
     physical id: 0
     bus info: pci@0000:07:00.0
     logical name: /dev/fb0
     version: 30
     width: 32 bits
     clock: 33MHz
     capabilities: pm msi vga_controller cap_list rom fb
     configuration: depth=32 driver=ast latency=0 resolution=1024,768
     resources: irq:18 memory:fa000000-faffffff memory:fb000000-fb01ffff ioport:d000(size=128) memory:c0000-dffff

Comment 5 Larry Mills 2022-11-08 20:53:47 UTC
One other detail to add: both of the systems that the problem occurs on normally run without any physical monitor attached.  I noticed that if I boot the system with a physical monitor attached, "monitor-edid" still core dumps, but it does generate seemingly accurate monitor status before it crashes.   This only occurs if a monitor is attached at system boot time. Attaching a monitor while the system is running results in a core dump with no output.

Comment 6 Jocelyn Falempe 2023-01-06 17:04:08 UTC
Hi, 

I tried to reproduce the issue on a server with Aspeed graphics (AST2600).
I only have remote access, and it looks like nothing is connected to VGA.

Running monitor-edid returns nothing and it doesn't crash.
I tested with 5.14.0-226.el9.x86_64. I will try next week with an older kernel to see if I can reproduce something.

Normally, monitor-edid only reads the EDID data in /sys/class/drm/card0/card0-VGA-1/edid

So can you check whether doing only this crashes your machine?
cat /sys/class/drm/card0/card0-VGA-1/edid > /tmp/edid.bin
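The same sysfs read can be done for every connector at once, with a basic sanity check of what comes back. The sketch below assumes the standard EDID layout (128-byte blocks, a fixed 8-byte header, and a per-block checksum that makes the bytes sum to 0 modulo 256); the function names `check_edid` and `scan_connectors` are made up for this example. It only reads files under /sys/class/drm and never touches /dev/mem or the VBE BIOS calls.

```python
import glob
import os

# Fixed 8-byte signature that starts every EDID base block.
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])
EDID_BLOCK_SIZE = 128


def check_edid(data):
    """Return a short status string for a raw EDID blob."""
    if not data:
        return "empty (no EDID reported for this connector)"
    if len(data) % EDID_BLOCK_SIZE != 0:
        return "invalid length: %d bytes" % len(data)
    if data[:8] != EDID_HEADER:
        return "bad header"
    # Every 128-byte block must sum to 0 modulo 256.
    for start in range(0, len(data), EDID_BLOCK_SIZE):
        block = data[start:start + EDID_BLOCK_SIZE]
        if sum(block) % 256 != 0:
            return "bad checksum in block %d" % (start // EDID_BLOCK_SIZE)
    return "ok (%d block(s))" % (len(data) // EDID_BLOCK_SIZE)


def scan_connectors(drm_root="/sys/class/drm"):
    """Check the 'edid' file of each DRM connector found under drm_root."""
    results = {}
    pattern = os.path.join(drm_root, "card*-*", "edid")
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as handle:
            results[path] = check_edid(handle.read())
    return results


if __name__ == "__main__":
    for path, status in scan_connectors().items():
        print(path, "->", status)
```

On a headless server an empty result for the VGA connector is expected and would match monitor-edid returning no output.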

Comment 7 Larry Mills 2023-01-06 21:16:52 UTC
Hello,

I updated my systems to kernel 5.14.0-226.el9.x86_64, and the behavior of the problem has indeed changed a bit, but it is consistent across both hardware types I have seen the problem on.


The problem behavior now on kernel 5.14.0-226.el9.x86_64:

- cat /sys/class/drm/card0/card0-VGA-1/edid
  No errors or crashes, returns normally

- /usr/sbin/monitor-edid
  On both the older (AST2400, Supermicro X10 board) and newer (AST2600, Asus KMPP-D32 board) this command now returns normally - with no output.  No core dump or system freeze.

- /usr/sbin/monitor-get-edid-using-vbe
  On the older system (AST2400, Supermicro X10 board):  core dump followed shortly by system freeze.
  
  On the newer system (AST2600, Asus KMPP-D32 board):  core dump, but no system freeze.


I have attached "strace" output from the monitor-get-edid-using-vbe command for both systems.

Thanks,

Larry

Comment 8 Larry Mills 2023-01-06 21:20:57 UTC
Created attachment 1936303 [details]
strace output from system using ast2400 video

Comment 9 Larry Mills 2023-01-06 21:23:28 UTC
Created attachment 1936304 [details]
strace output from system with AST2600 video

Comment 10 Jocelyn Falempe 2023-01-06 22:32:13 UTC
OK, thanks for your quick response.

I just read about vbe and EDID.

It looks like vbe is only available when booting in BIOS mode, it cannot work with UEFI.
https://wiki.osdev.org/Getting_VBE_Mode_Info

The strace shows that it tries to access /dev/mem directly and looks for the VBE address.
But if you booted in UEFI mode, this memory is used for something else, and that's probably why it crashes.

If you can confirm the machines are booting in UEFI mode, that would explain the crash.
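One common way to confirm the boot mode from a running system is to check for the /sys/firmware/efi directory, which the kernel exposes only when it was booted through UEFI. A minimal sketch follows; the function name `boot_mode` and its `sysfs_root` parameter are assumptions made so the check can be exercised against a test directory.

```python
import os


def boot_mode(sysfs_root="/sys"):
    """Return "UEFI" if the EFI firmware directory exists, else "BIOS"."""
    efi_dir = os.path.join(sysfs_root, "firmware", "efi")
    return "UEFI" if os.path.isdir(efi_dir) else "BIOS"


if __name__ == "__main__":
    print("Boot mode:", boot_mode())
```

A result of "BIOS" here only means the EFI directory is absent, so it is worth cross-checking against the firmware setup screen if the answer matters.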

Comment 11 Larry Mills 2023-01-06 22:45:54 UTC
Both systems that have the problem are definitely booted in BIOS mode, which is the default mode for both servers.

Comment 12 Jocelyn Falempe 2023-01-09 09:41:20 UTC
I've made a few tries, but the server I use boots in UEFI mode, and I can't change it back to BIOS.

#monitor-get-edid-using-vbe -v
VBE: Error (0x4f00): 0x4f00

I get this error, but it doesn't crash the machine.

Anyway, monitor-get-edid-using-vbe doesn't use the aspeed kernel driver. It accesses the hardware directly through /dev/mem and calls BIOS interrupts.
So in this case the error is either a bug in monitor-get-edid-using-vbe or a buggy BIOS implementation. In both cases I won't be able to help much.

Others have reported hard freezes in UEFI mode in the monitor-get-edid bug tracker:
https://bugs.mageia.org/show_bug.cgi?id=28124

If you agree, we can close this bug, as monitor-edid is now working and monitor-get-edid-using-vbe does not use the aspeed kernel driver.

Comment 13 Larry Mills 2023-01-09 16:03:55 UTC
I would prefer to keep this bug open, as it affects multiple vendors and hardware platforms.  If there is any additional information I can gather that would assist you in troubleshooting the issue, please let me know.

Comment 14 Larry Mills 2023-01-12 22:13:56 UTC
A third system type that the problem occurs on is: Supermicro X11SPW-CTF, which uses an ASPEED AST-2500 integrated video controller.

This system operates in BIOS boot mode, and is using the latest BIOS version from the OEM, version 3.8a, 10/22/2022.

Executing /usr/sbin/monitor-get-edid-using-vbe produces a core dump, but no system freeze.

Comment 15 Jocelyn Falempe 2023-01-13 07:40:49 UTC
One thing you can try is to blacklist the ast kernel driver.

The kernel will fall back to vesafb, so you will still have graphics.
The drawback is that you won't be able to change the resolution.

Normally vesafb won't change the card's settings, so the BIOS calls used by monitor-get-edid-using-vbe might still work.

Comment 16 Larry Mills 2023-01-13 15:47:50 UTC
I did try blacklisting the ast driver on one system, but that resulted in no video driver being loaded at all:

lshw shows no video hardware or driver
/sys/class/drm/card0  - path not present


But yes, with no video driver loaded, the "monitor-get-edid-using-vbe" does not coredump :)

Comment 17 Jocelyn Falempe 2023-01-16 10:33:04 UTC
You still have a video driver. It's not a "DRM" driver but an "FB" driver (an older kernel interface for graphics cards).

You should have a /dev/fb0 in this case.

So I think the BIOS is not buggy after all.

The root cause is that the ast driver configures the card for its own usage, so the configuration and memory mapping set up by the BIOS are no longer valid.

The VBE interface is normally intended for bootloaders or system fallback drivers (like vesafb). It should not be used by a userspace program like this.
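To see which of the two kernel interfaces is active after blacklisting ast, the registered framebuffers in /proc/fb and the loaded modules in /proc/modules can be inspected alongside /dev/fb0. The sketch below is only an illustration: the helper names are hypothetical, and the framebuffer name reported on a given system depends on the kernel and driver in use.

```python
import os


def parse_proc_fb(text):
    """Map framebuffer index to driver name from /proc/fb-style text."""
    devices = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line has the form "<index> <name>"; the name may contain spaces.
        index, _, name = line.partition(" ")
        devices[int(index)] = name
    return devices


def loaded_modules(text):
    """Return the set of module names from /proc/modules-style text."""
    return {line.split()[0] for line in text.splitlines() if line.strip()}


def read_text(path):
    """Read a small text file, returning an empty string if it is unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print("framebuffers:", parse_proc_fb(read_text("/proc/fb")))
    print("ast loaded:", "ast" in loaded_modules(read_text("/proc/modules")))
    print("/dev/fb0 present:", os.path.exists("/dev/fb0"))
    print("DRM card0 present:", os.path.isdir("/sys/class/drm/card0"))
```

With ast blacklisted, one would expect no ast entry in the module list and no /sys/class/drm/card0, while /dev/fb0 may still be present if a fallback framebuffer driver took over.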

Comment 18 RHEL Program Management 2023-09-11 21:46:16 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 19 RHEL Program Management 2023-09-11 21:46:35 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.

