Bug 2116363 - mutter native backend fails at start up if no monitor attached
Summary: mutter native backend fails at start up if no monitor attached
Keywords:
Status: CLOSED ERRATA
Alias: None
Deadline: 2023-02-06
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: mutter
Version: 9.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Jonas Ådahl
QA Contact: Peter Kopec
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-08-08 11:46 UTC by Peter Kopec
Modified: 2023-05-09 08:58 UTC (History)

Fixed In Version: gnome-shell-40.10-10.el9 mutter-40.9-14.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-09 07:44:43 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments
journalctl (241.56 KB, text/plain)
2022-08-08 11:46 UTC, Peter Kopec
rhel9.2 journal (240.81 KB, text/plain)
2022-11-30 13:51 UTC, Peter Kopec
rhel 8.8 journal (172.75 KB, text/plain)
2022-11-30 13:51 UTC, Peter Kopec
journal scratch debug (244.51 KB, text/plain)
2023-02-01 17:13 UTC, Peter Kopec


Links
System ID Private Priority Status Summary Last Updated
GNOME Gitlab GNOME mutter merge_requests 2821 0 None opened gpu/kms: Report that we can have outputs if we have connectors (& tests) 2023-02-01 11:29:11 UTC
Red Hat Issue Tracker RHELPLAN-130483 0 None None None 2022-08-08 11:54:48 UTC
Red Hat Product Errata RHBA-2023:2299 0 None None None 2023-05-09 07:44:52 UTC

Description Peter Kopec 2022-08-08 11:46:18 UTC
Created attachment 1904264 [details]
journalctl

Description of problem:
After booting with no display connected, the Wayland session is not available, only X. After logging into X and logging out again, Wayland becomes available.

Version-Release number of selected component (if applicable):


How reproducible: 100%


Steps to Reproduce:
1. reboot machine
2. disconnect displays during shutdown
3. wait for boot to finish
4. connect a display



Additional info:
Tested on AMD Raven Ridge and RTX 5000 [TU104]
Navi 24 and GA106 failed to show any image.

Comment 1 Ray Strode [halfline] 2022-08-08 14:20:17 UTC
Journal shows

Aug 08 13:11:57 testpcR9 gnome-shell[1495]: Failed to create backend: No GPUs with outputs found

So I guess mutter fails if it can't find a video card (GPU) with a monitor (output) attached.
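
To check a saved journal for this failure, a minimal sketch follows. It assumes the journal has been exported to plain text (as in the attachments); the function name `find_backend_failures` is made up for illustration and the pattern is based only on the single line quoted above.

```python
import re

# Matches the gnome-shell line quoted above and captures the reason text.
BACKEND_FAILURE = re.compile(
    r"gnome-shell\[(?P<pid>\d+)\]: Failed to create backend: (?P<reason>.+)$"
)


def find_backend_failures(journal_text):
    """Return (pid, reason) pairs for every backend-creation failure."""
    failures = []
    for line in journal_text.splitlines():
        match = BACKEND_FAILURE.search(line)
        if match:
            failures.append((int(match.group("pid")), match.group("reason")))
    return failures


sample = (
    "Aug 08 13:11:57 testpcR9 gnome-shell[1495]: "
    "Failed to create backend: No GPUs with outputs found"
)
print(find_backend_failures(sample))
```

An empty result means the line never appeared, which by itself does not prove that the Wayland session started.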

Comment 2 Peter Kopec 2022-11-30 13:51:09 UTC
Created attachment 1928743 [details]
rhel9.2 journal

Still reproducible on RHEL 9.2. Attaching a new journal.

Comment 3 Peter Kopec 2022-11-30 13:51:56 UTC
Created attachment 1928744 [details]
rhel 8.8 journal

Also adding a log from RHEL 8.8, where the issue does not occur.

Comment 4 Jonas Ådahl 2023-01-31 14:36:23 UTC
> Aug 08 13:11:57 testpcR9 gnome-shell[1495]: Failed to create backend: No GPUs with outputs found

This means there are no GPUs with connectors at all, not that there are no monitors connected. Problem is probably that we start before drm/kms has fully loaded. There seems to be a device (nouveau at /dev/dri/card0) but it reports itself as completely headless.
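
The two cases discussed here (a GPU with no connectors at all versus a GPU whose connectors have nothing plugged in) can be told apart from sysfs. The sketch below assumes the usual `/sys/class/drm` layout, where a device appears as `card0` and each connector as `card0-<NAME>` with a `status` file containing `connected`, `disconnected` or `unknown`. The function name `classify_cards` is hypothetical, and the demo runs against a temporary fake tree rather than real hardware.

```python
import re
import tempfile
from pathlib import Path

# Connector entries are named like "card0-DP-1"; plain "card0" is the device.
CONNECTOR_NAME = re.compile(r"^(card\d+)-(.+)$")


def classify_cards(drm_root="/sys/class/drm"):
    """Map each card to 'no connectors', 'none connected' or 'connected'."""
    statuses_by_card = {}
    for entry in sorted(Path(drm_root).iterdir()):
        if re.fullmatch(r"card\d+", entry.name):
            statuses_by_card.setdefault(entry.name, [])
            continue
        match = CONNECTOR_NAME.match(entry.name)
        status_file = entry / "status"
        if match and status_file.is_file():
            status = status_file.read_text().strip()
            statuses_by_card.setdefault(match.group(1), []).append(status)

    result = {}
    for card, statuses in statuses_by_card.items():
        if not statuses:
            result[card] = "no connectors"
        elif "connected" in statuses:
            result[card] = "connected"
        else:
            result[card] = "none connected"
    return result


# Demo on a fake tree: card0 has two unplugged connectors, card1 has none.
with tempfile.TemporaryDirectory() as tmp:
    fake = [("card0", None), ("card0-DP-1", "disconnected"),
            ("card0-HDMI-A-1", "disconnected"), ("card1", None)]
    for name, status in fake:
        path = Path(tmp) / name
        path.mkdir()
        if status is not None:
            (path / "status").write_text(status + "\n")
    print(classify_cards(tmp))
```

On a real system, calling `classify_cards()` with the default path at the moment gnome-shell starts would show which of the two situations applies.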

Comment 5 Ray Strode [halfline] 2023-01-31 15:01:12 UTC
Ah, indeed, and I missed originally that comment 0 says booting into X and back out makes wayland start working, so a monitor must be attached!

But we don't start gnome-shell until systemd says the system has CanGraphical.

That gets set when udev has master-of-seat tagged on the card device from this udev rule:

SUBSYSTEM=="drm", KERNEL=="card[0-9]*", TAG+="seat", TAG+="master-of-seat"

So basically the presence of the card node is how we know we're good to start.

Probably the nouveau driver needs to do an initial coldplug of connectors before setting up /dev/dri/card0

or I guess systemd could change its udev rule to only give master-of-seat if the system has

/dev/dri/card[0-9][^-]* and /dev/dri/card[0-9]*

Let's move to graphics team and get their take.
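
To see what logind currently reports for the seat, the property can be read with `loginctl show-seat`. A minimal sketch, assuming the seat is named `seat0` and that `loginctl` is on the path; the helper names are made up for illustration, and only the parser is exercised in the demo line.

```python
import subprocess


def parse_can_graphical(show_seat_output):
    """Parse 'CanGraphical=yes|no' from `loginctl show-seat` output."""
    for line in show_seat_output.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "CanGraphical":
            return value.strip() == "yes"
    return None  # property not present in the output


def seat_can_graphical(seat="seat0"):
    """Ask logind whether the given seat is considered graphical."""
    completed = subprocess.run(
        ["loginctl", "show-seat", seat, "--property=CanGraphical"],
        capture_output=True, text=True, check=True,
    )
    return parse_can_graphical(completed.stdout)


print(parse_can_graphical("CanGraphical=yes\n"))
```

A `True` result only says that a tagged card device exists; as discussed above, it says nothing about connectors or attached monitors.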

Comment 6 Ray Strode [halfline] 2023-01-31 15:10:02 UTC
Hmm, the code has this comment, https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/drm_connector.c

/**
 * drm_connector_register - register a connector
 * @connector: the connector to register
 *
 * Register userspace interfaces for a connector. Only call this for connectors
 * which can be hotplugged after drm_dev_register() has been called already,
 * e.g. DP MST connectors. All other connectors will be registered automatically
 * when calling drm_dev_register().

So I'm assuming drm_dev_register creates /dev/dri/card0 and drm_connector_register creates /dev/dri/card0-WHATEVERCONNECTOR

The comment says drm_dev_register will create the connectors it can up front. So the kernel may be doing all it can already?

Are there GPUs that only have hotpluggable connectors? (maybe cuda things?) is that what we're seeing here?

Hopefully graphics team has insight.

Comment 7 Ray Strode [halfline] 2023-01-31 16:08:51 UTC
I just took a quick look over the log to see if anything popped out:

╎❯ cat journal_noscreenboot | grep -E 'Load Kernel Module drm|modesetting|dri.card0|fb0|GPU|crtc|utput|EDID' | sed 's/^/> /'
> Aug 08 13:11:52 testpcR9 systemd[1]: Starting Load Kernel Module drm...
> Aug 08 13:11:52 testpcR9 systemd[1]: Finished Load Kernel Module drm.
> Aug 08 13:11:52 testpcR9 kernel: [drm] amdgpu kernel modesetting enabled.
> Aug 08 13:11:52 testpcR9 kernel: [drm] initializing kernel modesetting (RAVEN 0x1002:0x15DD 0x1043:0x876B 0xC6).

amdgpu gets loaded.


> Aug 08 13:11:53 testpcR9 kernel: amdgpu 0000:08:00.0: [drm] Cannot find any crtc or sizes

The card can't find a controller? That's weird. 


> Aug 08 13:11:53 testpcR9 systemd[1]: Starting Load Kernel Module drm...
> Aug 08 13:11:53 testpcR9 systemd[1]: Finished Load Kernel Module drm.

modprobed again.


> Aug 08 13:11:57 testpcR9 gnome-shell[1495]: Adding device '/dev/dri/card0' (amdgpu) using atomic mode setting.

4 seconds later, systemd thinks the system is good to go and GDM has started a native gnome-shell 


> Aug 08 13:11:57 testpcR9 gnome-shell[1495]: Failed to create backend: No GPUs with outputs found

But it fails. Maybe because there's no connectors, maybe because of the missing crtc error above?


> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (==) Automatically adding GPU devices
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (==) Automatically binding GPU devices
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) xfree86: Adding drm device (/dev/dri/card0)

So we fall back to Xorg


> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) systemd-logind: got fd for /dev/dri/card0 226:0 fd 12 paused 0
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (==) Matched modesetting as autoconfigured driver 1
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) LoadModule: "modesetting"
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) Loading /usr/lib64/xorg/modules/drivers/modesetting_drv.so
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) Module modesetting: vendor="X.Org Foundation"
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modesetting: Driver for Modesetting Kernel Drivers: kms
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): using drv /dev/dri/card0
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Output DP-1 has no monitor section
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Output HDMI-1 has no monitor section
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): EDID for output DP-1
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): EDID for output HDMI-1
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Output DP-1 disconnected
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Output HDMI-1 disconnected
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (WW) modeset(0): No outputs definitely connected, trying again...
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Output DP-1 disconnected
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Output HDMI-1 disconnected
> Aug 08 13:11:57 testpcR9 /usr/libexec/gdm-x-session[1567]: (WW) modeset(0): Unable to find connected outputs - setting 1024x768 initial framebuffer

It sees connectors (now?) but no monitors attached, so it pretends like there's a least common denominator monitor.


> Aug 08 13:14:34 testpcR9 kernel: fbcon: amdgpudrmfb (fb0) is primary device
> Aug 08 13:14:34 testpcR9 kernel: Console: switching to colour frame buffer device 480x135
> Aug 08 13:14:34 testpcR9 kernel: amdgpu 0000:08:00.0: [drm] fb0: amdgpudrmfb frame buffer device

Uh, fbcon is just now registered 3 minutes later?

> Aug 08 13:14:34 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): EDID vendor "BNQ", prod id 32816
> Aug 08 13:14:34 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Using EDID range info for horizontal sync
> Aug 08 13:14:34 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Using EDID range info for vertical refresh
> Aug 08 13:14:34 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): Allocate new frame buffer 3840x2160 stride
> Aug 08 13:14:34 testpcR9 /usr/libexec/gdm-x-session[1567]: (II) modeset(0): EDID vendor "BNQ", prod id 32816

And now the monitor shows up.

So it's taking a really long time for things to initialize. Even if the kernel and/or systemd were changed to defer CanGraphical until things were good, GDM would time out and fall back to X.

I think maybe the root problem is the "Cannot find any crtc or sizes" at the top but not sure.

Again, let's wait to see what kernel graphics team says.
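
The gap between the backend failure and the fbcon registration can be measured directly from the two log lines. This is a small sketch that assumes both lines come from the same day and reads only the HH:MM:SS part; the helper names are illustrative.

```python
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2})")


def seconds_of_day(journal_line):
    """Return the time-of-day of a journal line in seconds."""
    match = TIMESTAMP.search(journal_line)
    if match is None:
        raise ValueError("no HH:MM:SS timestamp in line")
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def gap_seconds(earlier_line, later_line):
    """Seconds between two lines, assuming both are from the same day."""
    return seconds_of_day(later_line) - seconds_of_day(earlier_line)


backend_failure = (
    "Aug 08 13:11:57 testpcR9 gnome-shell[1495]: "
    "Failed to create backend: No GPUs with outputs found"
)
fbcon_registered = (
    "Aug 08 13:14:34 testpcR9 kernel: fbcon: amdgpudrmfb (fb0) is primary device"
)
print(gap_seconds(backend_failure, fbcon_registered))  # 157
```

That is 2 minutes 37 seconds between the two events in this journal.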

Comment 8 Michel Dänzer 2023-02-01 08:31:37 UTC
(In reply to Ray Strode [halfline] from comment #7)
> 
> > Aug 08 13:11:53 testpcR9 kernel: amdgpu 0000:08:00.0: [drm] Cannot find any crtc or sizes
> 
> The card can't find a controller? That's weird. 

I think this just means that no connectors were detected as connected.

Anyway, this is about fbcon, it shouldn't directly affect mutter.


> > Aug 08 13:11:57 testpcR9 gnome-shell[1495]: Failed to create backend: No GPUs with outputs found
> 
> But it fails. Maybe because there's no connectors, maybe because of the
> missing crtc error above?

Looking at the mutter code, this message is printed (and mutter bails) even if there are GPUs with connectors, if none of them are detected as connected. Which is consistent with reproduction steps 2 & 3.


> > Aug 08 13:14:34 testpcR9 kernel: fbcon: amdgpudrmfb (fb0) is primary device
> > Aug 08 13:14:34 testpcR9 kernel: Console: switching to colour frame buffer device 480x135
> > Aug 08 13:14:34 testpcR9 kernel: amdgpu 0000:08:00.0: [drm] fb0: amdgpudrmfb frame buffer device
> 
> Uh, fbcon is just now registered 3 minutes later?

This is presumably when a monitor is connected (reproduction step 4).

Comment 9 Jonas Ådahl 2023-02-01 08:58:04 UTC
> Looking at the mutter code, this message is printed (and mutter bails) even if there are GPUs with connectors, if none of them are detected as connected. Which is consistent with reproduction steps 2 & 3.

Indeed, this seems to have changed in https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/525, which arrived with rhel9, without a motivation, so looks like a mistake - mutter should handle launching without any connected monitors.

Comment 10 Jonas Ådahl 2023-02-01 11:29:12 UTC
https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2821 changes back to the old behavior and makes the test case actually test the mode setting/native backend paths.
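
The difference between the two behaviors can be illustrated with a small model. This is not mutter's code: the `Gpu` and `Connector` types and both function names are hypothetical and only mirror the distinction described in comments 8 and 9 and the title of the merge request ("Report that we can have outputs if we have connectors").

```python
from dataclasses import dataclass, field


@dataclass
class Connector:
    name: str
    connected: bool


@dataclass
class Gpu:
    path: str
    connectors: list = field(default_factory=list)


def requires_connected_monitor(gpus):
    """Stricter check: at least one connector must have a monitor attached."""
    return any(c.connected for gpu in gpus for c in gpu.connectors)


def requires_connectors_only(gpus):
    """Relaxed check: a GPU qualifies as soon as it exposes any connector."""
    return any(gpu.connectors for gpu in gpus)


# The situation from the journal: connectors exist, nothing is plugged in.
headless_boot = [
    Gpu("/dev/dri/card0", [Connector("DP-1", False), Connector("HDMI-1", False)])
]
print(requires_connected_monitor(headless_boot))  # False
print(requires_connectors_only(headless_boot))    # True
```

Under the stricter check the backend would refuse to start in the reported scenario; under the relaxed one it would start and wait for a hotplug.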

Comment 11 Jonas Ådahl 2023-02-01 15:45:27 UTC
Created a scratch build: https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=1876905

Mind testing before this is pushed to c9s? I haven't tested on actual hardware.

Comment 12 Peter Kopec 2023-02-01 16:54:17 UTC
I have tried to install the scratch build mutter-40.9-14.el9.x86_64. I don't see "Failed to create backend" in the journal, but Wayland is still not available right after connecting the monitor.

Comment 13 Jonas Ådahl 2023-02-01 16:59:48 UTC
(In reply to Peter Kopec from comment #12)
> I have tried to install the scratch build mutter-40.9-14.el9.x86_64. I don't
> see "Failed to create backend" in the journal, but Wayland is still not
> available right after connecting the monitor.

Can you run with `MUTTER_DEBUG=kms` in /etc/environment and attach a journal?

Either way, I'll try to imitate a similar hardware setup by masking the laptop screen.
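
A sketch of adding the debug variable to an environment file without duplicating the line on repeated runs. It assumes the file uses plain `KEY=VALUE` lines; the helper name `ensure_env_line` is made up, and the demo writes to a temporary file instead of the real /etc/environment, which needs root to edit and a fresh login session or reboot to take effect.

```python
import tempfile
from pathlib import Path


def ensure_env_line(env_path, key, value):
    """Set KEY=VALUE in an environment file, replacing any existing KEY line."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    kept = [line for line in lines if not line.strip().startswith(key + "=")]
    kept.append(f"{key}={value}")
    path.write_text("\n".join(kept) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / "environment"
    env_file.write_text("LANG=en_US.UTF-8\n")
    ensure_env_line(env_file, "MUTTER_DEBUG", "kms")
    ensure_env_line(env_file, "MUTTER_DEBUG", "kms")  # second call changes nothing
    print(env_file.read_text())
```

Removing the line again after collecting the journal keeps the log volume down.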

Comment 14 Peter Kopec 2023-02-01 17:13:37 UTC
Created attachment 1941558 [details]
journal scratch debug

attaching the journal log with debug option

Comment 15 Ray Strode [halfline] 2023-02-01 18:04:17 UTC
(In reply to Michel Dänzer from comment #8)
> Looking at the mutter code, this message is printed (and mutter bails) even
> if there are GPUs with connectors, if none of them are detected as
> connected. Which is consistent with reproduction steps 2 & 3.

Ah okay, thanks for getting to the bottom of this.

Comment 16 Peter Kopec 2023-02-02 14:41:47 UTC
I have noticed that with both the current and the scratch version I can get Wayland, but only when I time connecting the monitor to shortly after GDM shows up, once the boot animation has already finished.

Comment 17 Jonas Ådahl 2023-02-06 18:15:01 UTC
(In reply to Peter Kopec from comment #14)
> Created attachment 1941558 [details]
> journal scratch debug
> 
> attaching the journal log with debug option

Can you try these two instead:

https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=1904127
https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=1904164

Comment 18 Peter Kopec 2023-02-06 19:05:40 UTC
(In reply to Jonas Ådahl from comment #17)

> Can you try these two instead:
> 
> https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=1904127
> https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=1904164

With these packages wayland is available after connecting the monitor.

Comment 20 Peter Kopec 2023-02-15 14:36:06 UTC
With gnome-shell-40.10-10.el9 and mutter-40.9-14.el9, Wayland is available when connecting a monitor after the system boots into GDM.
Tested with Navi 33 and TU116.

Comment 24 errata-xmlrpc 2023-05-09 07:44:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (mutter bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2299

