Bug 2116363
| Summary: | mutter native backend fails at start up if no monitor attached | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Peter Kopec <pekopec> |
| Component: | mutter | Assignee: | Jonas Ådahl <jadahl> |
| Status: | CLOSED ERRATA | QA Contact: | Peter Kopec <pekopec> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 9.1 | CC: | ayadav, fmuellner, hdegoede, mdaenzer, ndegraef, rstrode, tpelka |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | gnome-shell-40.10-10.el9 mutter-40.9-14.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-05-09 07:44:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Deadline: | 2023-02-06 | | |
| Attachments: | | | |
Journal shows:
Aug 08 13:11:57 testpcR9 gnome-shell[1495]: Failed to create backend: No GPUs with outputs found
So I guess mutter fails if it can't find a video card (GPU) with a monitor (output) attached.

Created attachment 1928743 [details]
rhel9.2 journal

Still reproducible on RHEL 9.2. Attaching new journal conf.

Created attachment 1928744 [details]
rhel 8.8 journal

Also adding a log from RHEL 8.8, which does not show the issue.

> Aug 08 13:11:57 testpcR9 gnome-shell[1495]: Failed to create backend: No GPUs with outputs found
This means there are no GPUs with connectors at all, not that there are no monitors connected. Problem is probably that we start before drm/kms has fully loaded. There seems to be a device (nouveau at /dev/dri/card0) but it reports itself as completely headless.
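
The distinction drawn here (no connectors at all versus connectors with nothing plugged in) can be checked from userspace. The following is a minimal sketch, assuming the usual DRM sysfs layout where each connector appears as `/sys/class/drm/cardN-CONNECTOR` with a `status` file; the function names `list_connectors` and `classify` are made up for this illustration and are not part of mutter or the kernel.

```python
import re
from pathlib import Path

# Connector entries look like "card0-DP-1"; a plain "card0" is the GPU node itself.
_CARD_RE = re.compile(r"card\d+")
_CONNECTOR_RE = re.compile(r"^(card\d+)-(.+)$")


def list_connectors(sysfs_drm="/sys/class/drm"):
    """Return {card: {connector: status}} read from the DRM sysfs class directory."""
    cards = {}
    root = Path(sysfs_drm)
    if not root.is_dir():
        return cards
    for entry in sorted(root.iterdir()):
        name = entry.name
        if _CARD_RE.fullmatch(name):
            cards.setdefault(name, {})
            continue
        match = _CONNECTOR_RE.match(name)
        if match:
            card, connector = match.groups()
            try:
                status = (entry / "status").read_text().strip()
            except OSError:
                status = "unknown"
            cards.setdefault(card, {})[connector] = status
    return cards


def classify(cards):
    """Summarise the three situations discussed in this bug (illustrative only)."""
    if not cards:
        return "no GPU"
    if not any(cards.values()):
        return "GPU without connectors"
    connected = any(
        status == "connected"
        for connectors in cards.values()
        for status in connectors.values()
    )
    if not connected:
        return "connectors present, none connected"
    return "at least one output connected"


if __name__ == "__main__":
    found = list_connectors()
    print(found)
    print(classify(found))
```

Running this right after a boot without a monitor would show whether the card really is headless or merely has every connector reporting `disconnected`.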
Ah, indeed, and I missed originally that comment 0 says booting into X and back out makes wayland start working, so a monitor must be attached!

But we don't start gnome-shell until systemd says the system has CanGraphical. That gets set when udev has master-of-seat tagged on the card device from this udev rule:

    SUBSYSTEM=="drm", KERNEL=="card[0-9]*", TAG+="seat", TAG+="master-of-seat"

So basically the presence of the card node is how we know we're good to start. Probably the nouveau driver needs to do an initial coldplug of connectors before setting up /dev/dri/card0, or I guess systemd could change its udev rule to only give master-of-seat if the system has /dev/dri/card[0-9][^-]* and /dev/dri/card[0-9]*

Let's move to graphics team and get their take.

Hmm, the code has this comment,

https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/drm_connector.c

    /**
     * drm_connector_register - register a connector
     * @connector: the connector to register
     *
     * Register userspace interfaces for a connector. Only call this for connectors
     * which can be hotplugged after drm_dev_register() has been called already,
     * e.g. DP MST connectors. All other connectors will be registered automatically
     * when calling drm_dev_register().

So I'm assuming drm_dev_register creates /dev/dri/card0 and drm_connector_register creates /dev/dri/card0-WHATEVERCONNECTOR. The comment says drm_dev_register will create the connectors it can up front. So the kernel may be doing all it can already? Are there GPUs that only have hotpluggable connectors (maybe cuda things)? Is that what we're seeing here? Hopefully graphics team has insight.

I just took a quick look over the log to see if anything popped out:

╎❯ cat journal_noscreenboot | grep -E 'Load Kernel Module drm|modesetting|dri.card0|fb0|GPU|crtc|utput|EDID' | sed 's/^/> /'

> Aug 08 13:11:52 testpcR9 systemd[1]: Starting Load Kernel Module drm...
> Aug 08 13:11:52 testpcR9 systemd[1]: Finished Load Kernel Module drm.
> Aug 08 13:11:52 testpcR9 kernel: [drm] amdgpu kernel modesetting enabled.
> Aug 08 13:11:52 testpcR9 kernel: [drm] initializing kernel modesetting (RAVEN 0x1002:0x15DD 0x1043:0x876B 0xC6).

amdgpu gets loaded.

> Aug 08 13:11:53 testpcR9 kernel: amdgpu 0000:08:00.0: [drm] Cannot find any crtc or sizes

The card can't find a controller? That's weird.

> Aug 08 13:11:53 testpcR9 systemd[1]: Starting Load Kernel Module drm...
> Aug 08 13:11:53 testpcR9 systemd[1]: Finished Load Kernel Module drm.

modprobed again.

> Aug 08 13:11:57 testpcR9 gnome-shell[1495]: Adding device '/dev/dri/card0' (amdgpu) using atomic mode setting.

4 seconds later, systemd thinks the system is good to go and GDM has started a native gnome-shell

> Aug 08 13:11:57 testpcR9 gnome-shell[1495]: Failed to create backend: No GPUs with outputs found

But it fails. Maybe because there's no connectors, maybe because of the missing crtc error above?

> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (==) Automatically adding GPU devices
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (==) Automatically binding GPU devices
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) xfree86: Adding drm device (/dev/dri/card0)

So we fall back to Xorg

> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) systemd-logind: got fd for /dev/dri/card0 226:0 fd 12 paused 0
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (==) Matched modesetting as autoconfigured driver 1
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) LoadModule: "modesetting"
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) Loading /usr/lib64/xorg/modules/drivers/modesetting_drv.so
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) Module modesetting: vendor="X.Org Foundation"
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modesetting: Driver for Modesetting Kernel Drivers: kms
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): using drv /dev/dri/card0
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Output DP-1 has no monitor section
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Output HDMI-1 has no monitor section
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): EDID for output DP-1
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): EDID for output HDMI-1
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Output DP-1 disconnected
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Output HDMI-1 disconnected
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (WW) modeset(0): No outputs definitely connected, trying again...
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Output DP-1 disconnected
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Output HDMI-1 disconnected
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (WW) modeset(0): Unable to find connected outputs - setting 1024x768 initial framebuffer

It sees connectors (now?) but no monitors attached, so it pretends like there's a least common denominator monitor.

> Aug 08 13:14:34 testpcR9 kernel: fbcon: amdgpudrmfb (fb0) is primary device
> Aug 08 13:14:34 testpcR9 kernel: Console: switching to colour frame buffer device 480x135
> Aug 08 13:14:34 testpcR9 kernel: amdgpu 0000:08:00.0: [drm] fb0: amdgpudrmfb frame buffer device

Uh, fbcon is just now registered 3 minutes later?

> Aug 08 13:14:34 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): EDID vendor "BNQ", prod id 32816
> Aug 08 13:14:34 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Using EDID range info for horizontal sync
> Aug 08 13:14:34 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Using EDID range info for vertical refresh
> Aug 08 13:14:34 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Allocate new frame buffer 3840x2160 stride
> Aug 08 13:14:34 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): EDID vendor "BNQ", prod id 32816

And now the monitor shows up.

So it's taking a really long time for things to initialize. Even if the kernel and/or systemd were changed to defer CanGraphical until things were good, GDM would time out and fall back to X. I think maybe the root problem is the "Cannot find any crtc or sizes" at the top but not sure. Again, let's wait to see what kernel graphics team says.

(In reply to Ray Strode [halfline] from comment #7)
> > Aug 08 13:11:53 testpcR9 kernel: amdgpu 0000:08:00.0: [drm] Cannot find any crtc or sizes
>
> The card can't find a controller? That's weird.

I think this just means that no connectors were detected as connected. Anyway, this is about fbcon, it shouldn't directly affect mutter.

> > Aug 08 13:11:57 testpcR9 gnome-shell[1495]: Failed to create backend: No GPUs with outputs found
>
> But it fails. Maybe because there's no connectors, maybe because of the
> missing crtc error above?

Looking at the mutter code, this message is printed (and mutter bails) even if there are GPUs with connectors, if none of them are detected as connected. Which is consistent with reproduction steps 2 & 3.

> > Aug 08 13:14:34 testpcR9 kernel: fbcon: amdgpudrmfb (fb0) is primary device
> > Aug 08 13:14:34 testpcR9 kernel: Console: switching to colour frame buffer device 480x135
> > Aug 08 13:14:34 testpcR9 kernel: amdgpu 0000:08:00.0: [drm] fb0: amdgpudrmfb frame buffer device
>
> Uh, fbcon is just now registered 3 minutes later?

This is presumably when a monitor is connected (reproduction step 4).

> Looking at the mutter code, this message is printed (and mutter bails) even if there are GPUs with connectors, if none of them are detected as connected. Which is consistent with reproduction steps 2 & 3.

Indeed, this seems to have changed in https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/525, which arrived with rhel9, without a motivation, so looks like a mistake - mutter should handle launching without any connected monitors.

https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2821 changes back to the old behavior and makes the test case actually test the mode setting/native backend paths.

Created a scratch build: https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=1876905

Mind testing before this is pushed to c9s? I haven't tested on actual hardware.

I have tried to install the scratch build mutter-40.9-14.el9.x86_64. I don't see "Failed to create backend" in the journalctl output, but wayland is still not available right after connecting the monitor.

(In reply to Peter Kopec from comment #12)
> I have tried to install the scratch build mutter-40.9-14.el9.x86_64. I don't see
> "Failed to create backend" in the journalctl output, but wayland is still not
> available right after connecting the monitor.

Can you run with `MUTTER_DEBUG=kms` in /etc/environment and attach a journal? Either way, I'll try to imitate a similar hardware setup by masking the laptop screen.

Created attachment 1941558 [details]
journal scratch debug

attaching the journal log with debug option
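
The difference between the two startup behaviours discussed in the comments above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not mutter's code, the `Gpu` class and both function names are hypothetical, and the "relaxed" check (any GPU is enough) is an assumed simplification of what "launching without any connected monitors" implies.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    """Hypothetical stand-in for a KMS device such as /dev/dri/card0."""
    path: str
    # connector name -> True if a monitor is detected on that connector
    connectors: dict = field(default_factory=dict)

    def has_connected_output(self):
        return any(self.connectors.values())


def can_start_strict(gpus):
    """Behaviour reported in this bug: bail unless some connector is connected."""
    return any(gpu.has_connected_output() for gpu in gpus)


def can_start_relaxed(gpus):
    """Assumed shape of the restored behaviour: a GPU alone is sufficient."""
    return len(gpus) > 0


if __name__ == "__main__":
    # Connector names taken from the Xorg log in this report; both unplugged at boot.
    headless_boot = [Gpu("/dev/dri/card0", {"DP-1": False, "HDMI-1": False})]
    print("strict :", can_start_strict(headless_boot))
    print("relaxed:", can_start_relaxed(headless_boot))
```

With both connectors disconnected, the strict check refuses to start while the relaxed one proceeds and can pick up the monitor on a later hotplug, which matches the outcome the reporter is testing for.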
(In reply to Michel Dänzer from comment #8)
> Looking at the mutter code, this message is printed (and mutter bails) even
> if there are GPUs with connectors, if none of them are detected as
> connected. Which is consistent with reproduction steps 2 & 3.

Ah okay, thanks for getting to the bottom of this.

I have noticed that with both the current and the scratch version I can have wayland available, but only when I time connecting the monitor shortly after gdm shows up (the boot animation is already done by then).

(In reply to Peter Kopec from comment #14)
> Created attachment 1941558 [details]
> journal scratch debug
>
> attaching the journal log with debug option

Can you try these two instead:

https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=1904127
https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=1904164

(In reply to Jonas Ådahl from comment #17)
> Can you try these two instead:
>
> https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=1904127
> https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=1904164

With these packages wayland is available after connecting the monitor.

With gnome-shell-40.10-10.el9 and mutter-40.9-14.el9, wayland is available when connecting a monitor after the system boots into gdm. Tested with Navi 33 and TU116.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (mutter bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2299

Created attachment 1904264 [details]
journalctl

Description of problem:
After booting with no screen attached, the wayland session is not available, only X. After logging into X and logging out again, wayland is available.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. reboot machine
2. disconnect displays during shutdown
3. wait for boot to finish
4. connect a display

Additional info:
Tested on AMD Raven Ridge and RTX 5000 [TU104].
Navi 24 and GA106 failed to show any image.