Bug 1575194

Summary: After F27 upgrade to F28 GDM displays nothing but mouse cursor, keyboard+mouse lockup
Product: [Fedora] Fedora
Reporter: Alex Villacís Lasso <alexvillacislasso>
Component: gnome-shell
Assignee: Ray Strode [halfline] <rstrode>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 28
CC: a9016009, alexl, alex.ploumistos, andrew, andy, chuckr, fmuellner, gerben, jadahl, johnh, john.j5live, mclasen, otaylor, ppywlkiqletw, rhughes, robert, rstrode
Target Milestone: ---
Keywords: Regression
Target Release: ---
Hardware: All
OS: Linux
Last Closed: 2018-10-20 13:22:39 UTC
Type: Bug
Attachments:
- journalctl -b with gdm hang
- Failed attempt to work around issue using permissive mode
- Successful run of GDM after disabling Wayland in /etc/gdm/custom.conf
- renderer/native: Fallback to non-planar API if gbm_bo_get_handle_for_plane fails (alexvillacislasso: review+)

Description Alex Villacís Lasso 2018-05-05 03:42:32 UTC
Created attachment 1431705
journalctl -b with gdm hang

Description of problem:
On at least 2 different systems (one 32-bit, one 64-bit), an upgrade from Fedora 27 to Fedora 28 resulted in a completely broken GDM login. When booting, the Fedora logo fills up as usual, but then, when handing off to GDM, only the mouse cursor appears on top of the Fedora logo. The mouse cursor is frozen: moving the mouse achieves nothing. The display fails to clear to the GDM background, to show the user list, or to show anything else. The keyboard is also unresponsive: Ctrl-Alt-F1 through F12 have no effect, so it is impossible to switch locally to a terminal. As the screen shows only the mouse cursor and no other widgets, I cannot tell whether the keyboard is completely unresponsive, but the CapsLock toggle apparently works. However, the machine is not locked up: I can successfully ssh in from another machine on the network and run "init 3" from the remote shell. This successfully reverts to a text-only login.

The only workaround I have found so far is to use a different login manager. Both LightDM and LXDM work normally.

Version-Release number of selected component (if applicable):
kernel-PAE-core-4.16.5-200.fc27.i686
mesa-dri-drivers-18.0.2-1.fc28.i686
gdm-3.28.1-1.fc28.i686
selinux-policy-targeted-3.14.1-24.fc28.noarch
(Note - I am reporting this from the 32-bit machine, but the issue is NOT 32-bit-only, since my work machine is 64-bit and suffers the same issue.)

How reproducible:
Always


Steps to Reproduce:
1. Working machine with FC27 and GDM login
2. Upgrade to FC28
3. Attempt to boot to GDM login

Actual results:
Nothing but the mouse cursor is displayed on top of the Fedora boot logo.

Expected results:
Standard GDM greeter should appear with responsive mouse and keyboard.

Additional info:

Note the following messages in the attached file:
may 04 21:44:46 karlalex-acer.palosanto.com systemd[662]: selinux: avc:  denied  { status } for auid=n/a uid=42 gid=42 cmdline="/usr/libexec/gdm-wayland-session gnome-session --autostart /usr/share/gdm/greeter/autostart" scontext=system_u:system_r:xdm_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=system permissive=0
may 04 21:44:47 karlalex-acer.palosanto.com systemd[662]: selinux: avc:  denied  { reload } for auid=n/a uid=42 gid=42 cmdline="/usr/libexec/gnome-session-binary --autostart /usr/share/gdm/greeter/autostart" scontext=system_u:system_r:xdm_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=system permissive=0
may 04 21:44:47 karlalex-acer.palosanto.com systemd[662]: selinux: avc:  denied  { reload } for auid=n/a uid=42 gid=42 cmdline="/usr/libexec/gnome-session-binary --autostart /usr/share/gdm/greeter/autostart" scontext=system_u:system_r:xdm_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=system permissive=0
may 04 21:44:47 karlalex-acer.palosanto.com systemd[662]: selinux: avc:  denied  { reload } for auid=n/a uid=42 gid=42 cmdline="/usr/libexec/gnome-session-binary --autostart /usr/share/gdm/greeter/autostart" scontext=system_u:system_r:xdm_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=system permissive=0

However, running "setenforce 0" as root does nothing to fix the issue.

Comment 1 Alex Villacís Lasso 2018-05-05 16:40:39 UTC
This selinux-policy bug looks related to my case: #1559531

Comment 2 Alex Villacís Lasso 2018-05-05 16:43:49 UTC
Sorry, URL is https://bugzilla.redhat.com/show_bug.cgi?id=1559531

I will try booting in permissive mode to see whether it does anything.

Comment 3 Alex Villacís Lasso 2018-05-05 17:26:59 UTC
Permissive mode did *NOT* work around this issue. This is consistent with "setenforce 0" not working either.

Comment 4 Alex Villacís Lasso 2018-05-05 17:28:40 UTC
Created attachment 1432009
Failed attempt to work around issue using permissive mode

This is the journalctl output from the (unsuccessful) attempt to use permissive mode. NOTE: this is on the 64-bit machine. The previous file was from the 32-bit machine experiencing the same issue.

Comment 5 Alex Villacís Lasso 2018-05-05 23:58:06 UTC
Created attachment 1432162
Successful run of GDM after disabling Wayland in /etc/gdm/custom.conf

This is for the 32-bit machine.

This issue is almost certainly a Wayland regression in either GDM or gnome-shell. If I disable Wayland in /etc/gdm/custom.conf, GDM starts correctly by falling back to X11. However, I can then only run X11-based sessions, so I cannot check whether a gnome-shell Wayland session works correctly as a user desktop.
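The workaround used throughout this report is to set WaylandEnable=false in the [daemon] section of /etc/gdm/custom.conf. Below is a minimal Python sketch (not part of GDM) that checks and applies that setting on the file's text. The function names are hypothetical, and it assumes GDM treats a missing or commented-out key as "Wayland enabled". Note that configparser drops comments when it rewrites a file, so editing custom.conf by hand is usually simpler.

```python
import configparser
import io


def wayland_enabled(conf_text: str) -> bool:
    """Return False only if [daemon] WaylandEnable is set to a false value."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # Assumption: a missing or commented-out key means GDM's default (Wayland on).
    return parser.getboolean("daemon", "WaylandEnable", fallback=True)


def disable_wayland(conf_text: str) -> str:
    """Return a copy of custom.conf text with WaylandEnable=false under [daemon]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep CamelCase key names instead of lowercasing them
    parser.read_string(conf_text)
    if not parser.has_section("daemon"):
        parser.add_section("daemon")
    parser.set("daemon", "WaylandEnable", "false")
    out = io.StringIO()
    # Write Key=value without spaces; comments from the input are NOT preserved.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    sample = "[daemon]\n#WaylandEnable=false\n\n[security]\n"
    print(wayland_enabled(sample))                   # True
    print(wayland_enabled(disable_wayland(sample)))  # False
```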

Weston runs correctly on the same machine.

Comment 6 Alex Villacís Lasso 2018-05-08 16:15:12 UTC
I believe now this is a regression in gnome-shell or one of its components.

I enabled GDM autologin with Wayland disabled, checked that it worked, then re-enabled Wayland. A process listing from a remote shell shows that the user desktop boots to completion, with all required processes running; the display, however, simply stops updating. So something in gnome-shell broke Wayland support on my machines.

This is the lspci output from one of the affected machines:

00:02.0 VGA compatible controller: Intel Corporation 82G33/G31 Express Integrated Graphics Controller (rev 10) (prog-if 00 [VGA controller])
	Subsystem: ASRock Incorporation Device 29c2
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at fea80000 (32-bit, non-prefetchable) [size=512K]
	Region 1: I/O ports at cc00 [size=8]
	Region 2: Memory at d0000000 (32-bit, prefetchable) [size=256M]
	Region 3: Memory at fe900000 (32-bit, non-prefetchable) [size=1M]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: i915
	Kernel modules: i915

Comment 7 Alex Villacís Lasso 2018-05-09 01:46:05 UTC
This is the lspci output from another affected machine (Acer Aspire One ZG5):

00:02.0 VGA compatible controller: Intel Corporation Mobile 945GSE Express Integrated Graphics Controller (rev 03) (prog-if 00 [VGA controller])
	Subsystem: Acer Incorporated [ALI] Device 015b
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at 58480000 (32-bit, non-prefetchable) [size=512K]
	Region 1: I/O ports at 60c0 [size=8]
	Region 2: Memory at 40000000 (32-bit, prefetchable) [size=256M]
	Region 3: Memory at 58500000 (32-bit, non-prefetchable) [size=256K]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: i915
	Kernel modules: i915

Comment 8 Alexander Ploumistos 2018-05-09 18:01:38 UTC
I am seeing this on an Aspire One 150. I hadn't tried to run GNOME for a long time, but I had left gdm as the default login manager. I had disabled network autoconfiguration, so I could not connect to it from another system and had to resort to the "systemd.unit=multi-user.target" boot argument.

Off the top of my head I can't remember where I read this, but I do not think Wayland was supposed to work with this graphics chipset. When I start lightdm, I get two options labeled "GNOME", as well as "GNOME on X" and "Gnome Classic". None of them resulted in a Wayland session; they were all X11.

I think it's the same graphics subsystem as yours:

00:02.0 VGA compatible controller: Intel Corporation Mobile 945GSE Express Integrated Graphics Controller (rev 03) (prog-if 00 [VGA controller])
	Subsystem: Acer Incorporated [ALI] Device 015b
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at 78480000 (32-bit, non-prefetchable) [size=512K]
	Region 1: I/O ports at 60c0 [size=8]
	Region 2: Memory at 60000000 (32-bit, prefetchable) [size=256M]
	Region 3: Memory at 78500000 (32-bit, non-prefetchable) [size=256K]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: i915
	Kernel modules: i915

00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)
	Subsystem: Acer Incorporated [ALI] Device 015b
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Region 0: Memory at 78400000 (32-bit, non-prefetchable) [size=512K]
	Capabilities: <access denied>


Anyway, this can be circumvented, but it is a nasty surprise. What's worse, it's impossible to switch to a VT to repair things, and killing the power (the only way out) could lead to data corruption.

Comment 9 Jonas Ådahl 2018-05-09 19:58:36 UTC
Is your mesa up to date? Looking at the log, I see the following entry:

may 04 21:44:55 karlalex-acer.palosanto.com org.gnome.Shell.desktop[745]: Failed to initialize glamor, falling back to sw

This means that Xwayland couldn't initialize glamor, and that would happen if Wayland EGL was not available.

What is your version of mutter and mesa?

Comment 10 Alexander Ploumistos 2018-05-09 20:36:02 UTC
I had the same message:

May 08 22:54:09 org.gnome.Shell.desktop[963]: glamor: EGL version 1.4 (DRI2):
May 08 22:54:09 org.gnome.Shell.desktop[963]: Failed to initialize glamor
May 08 22:54:09 org.gnome.Shell.desktop[963]: Failed to initialize glamor, falling back to sw


I keep updates-testing enabled and I have these:

mesa-dri-drivers-18.0.2-1.fc28.i686
mesa-filesystem-18.0.2-1.fc28.i686
mesa-libEGL-18.0.2-1.fc28.i686
mesa-libgbm-18.0.2-1.fc28.i686
mesa-libGL-18.0.2-1.fc28.i686
mesa-libglapi-18.0.2-1.fc28.i686
mesa-libGLES-18.0.2-1.fc28.i686
mesa-libGLU-9.0.0-14.fc28.i686
mesa-libOpenCL-18.0.2-1.fc28.i686
mesa-libOSMesa-18.0.2-1.fc28.i686
mesa-libxatracker-18.0.2-1.fc28.i686
mesa-vulkan-drivers-18.0.2-1.fc28.i686
mutter-3.28.1-1.fc28.i686

Comment 11 Jonas Ådahl 2018-05-09 20:50:19 UTC
Well, then you should have the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1564210 . The issue still seems to be that both Wayland EGL (as Xwayland couldn't initialize glamor) and KMS are broken, given the error message:

may 04 21:45:27 karlalex-acer.palosanto.com gnome-shell[745]: Failed to create new back buffer handle: No such file or directory.

What kind of hardware is this? Plain intel laptop, or one of those with an Nvidia GPU connected to HDMI?

Comment 12 Alexander Ploumistos 2018-05-09 21:09:26 UTC
(In reply to Jonas Ådahl from comment #11)
> What kind of hardware is this? Plain intel laptop, or one of those with an
> Nvidia GPU connected to HDMI?

It's this thing:
https://www.cnet.com/products/acer-aspire-one-a150/specs/

Actually, when I was grepping for the glamor strings I missed the second line here:

May 08 22:54:09 org.gnome.Shell.desktop[963]: glamor: EGL version 1.4 (DRI2):
May 08 22:54:09 org.gnome.Shell.desktop[963]: Require OpenGL version 2.1 or later.
May 08 22:54:09 org.gnome.Shell.desktop[963]: Failed to initialize glamor
May 08 22:54:09 org.gnome.Shell.desktop[963]: Failed to initialize glamor, falling back to sw

I think the 945GSE supports OpenGL 1.4, so maybe that is the root cause.

Comment 13 Alex Villacís Lasso 2018-05-09 21:42:41 UTC
(In reply to Jonas Ådahl from comment #9)
> Is your mesa up to date? Looking at the log, I see the following entry:
> 
> may 04 21:44:55 karlalex-acer.palosanto.com org.gnome.Shell.desktop[745]:
> Failed to initialize glamor, falling back to sw
> 
> This means that Xwayland couldn't initialize glamor, and that would happen
> if Wayland EGL was not available.
> 
> What is your version of mutter and mesa?

On my 64-bit machine (pure intel chipset, no nvidia, no hdmi anything):
mesa-vulkan-drivers-18.0.2-1.fc28.x86_64
mesa-libEGL-18.0.2-1.fc28.x86_64
mesa-libEGL-devel-18.0.2-1.fc28.i686
mesa-libglapi-18.0.2-1.fc28.i686
mesa-dri-drivers-18.0.2-1.fc28.x86_64
mesa-libGLU-9.0.0-14.fc28.x86_64
mesa-libGLU-9.0.0-14.fc28.i686
mesa-libGL-devel-18.0.2-1.fc28.x86_64
mesa-libEGL-devel-18.0.2-1.fc28.x86_64
mesa-libEGL-18.0.2-1.fc28.i686
mesa-dri-drivers-18.0.2-1.fc28.i686
mesa-libOSMesa-18.0.2-1.fc28.x86_64
mesa-libgbm-18.0.2-1.fc28.i686
mesa-filesystem-18.0.2-1.fc28.x86_64
mesa-libgbm-18.0.2-1.fc28.x86_64
mesa-libxatracker-18.0.2-1.fc28.x86_64
mesa-vdpau-drivers-18.0.2-1.fc28.x86_64
mesa-libGLES-18.0.2-1.fc28.x86_64
mesa-libOSMesa-18.0.2-1.fc28.i686
mesa-libglapi-18.0.2-1.fc28.x86_64
mesa-filesystem-18.0.2-1.fc28.i686
mesa-libOpenCL-18.0.2-1.fc28.x86_64
mesa-libGL-18.0.2-1.fc28.x86_64
mesa-libGL-18.0.2-1.fc28.i686
mutter-3.28.1-1.fc28.x86_64


GLAMOR has never been available on any of my machines:

Mon 2017-06-05 10:00:38 -05 avillacis.palosanto.com gnome-shell[1513]: Failed to apply DRM plane transform 0: Invalid argument
Mon 2017-06-05 10:00:38 -05 avillacis.palosanto.com org.gnome.Shell.desktop[1513]: glamor: EGL version 1.4 (DRI2):
Mon 2017-06-05 10:00:38 -05 avillacis.palosanto.com org.gnome.Shell.desktop[1513]: Failed to initialize glamor
Mon 2017-06-05 10:00:38 -05 avillacis.palosanto.com org.gnome.Shell.desktop[1513]: Failed to initialize glamor, falling back to sw

This did not prevent gnome-shell from working on wayland up to Fedora 27 (GNOME 3.26.x series).

Comment 14 Alex Villacís Lasso 2018-05-09 21:45:03 UTC
...however, the "Failed to create..." string is new, and it first appears just after rebooting into the newly-upgraded Fedora 28:

Thu 2018-05-03 21:43:44 -05 avillacis.palosanto.com realmd[2014]: connected to bus
Thu 2018-05-03 21:43:45 -05 avillacis.palosanto.com gnome-shell[1052]: JS WARNING: [resource:///org/gnome/shell/ui/windowManager.js 1468]: reference to undefined property "MetaW>
Thu 2018-05-03 21:43:45 -05 avillacis.palosanto.com gsd-smartcard[1964]: Got potentially spurious smartcard event error: ffffe0a7.
Thu 2018-05-03 21:43:45 -05 avillacis.palosanto.com gnome-shell[1052]: Failed to create new back buffer handle: No such file or directory
Thu 2018-05-03 21:43:45 -05 avillacis.palosanto.com realmd[2014]: released daemon: startup
Thu 2018-05-03 21:43:45 -05 avillacis.palosanto.com dbus-daemon[700]: [system] Successfully activated service 'org.freedesktop.realmd'
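Comments 13 and 14 separate a long-standing, harmless message (the glamor software fallback, already present on F27) from a new one that only appears after the F28 upgrade. Below is a small Python sketch of that triage, assuming the journal has been exported as plain text (for example with `journalctl -b > boot.log`). The message strings are copied from the logs quoted in this report; the function and result key names are illustrative only, and "likely_this_bug" is a heuristic, not a diagnosis.

```python
import re
from typing import Dict, Iterable, List

# Message texts copied from the journal excerpts quoted in this report.
GLAMOR_FALLBACK = "Failed to initialize glamor, falling back to sw"
BACK_BUFFER_ERROR = re.compile(
    r"Failed to create new back buffer handle: (?P<reason>.+?)\.?$"
)


def triage(lines: Iterable[str]) -> Dict[str, object]:
    """Count benign glamor fallbacks and collect back-buffer error reasons."""
    glamor_fallbacks = 0
    back_buffer_reasons: List[str] = []
    for line in lines:
        if GLAMOR_FALLBACK in line:
            # Seen on F27 as well, so on its own it does not indicate this bug.
            glamor_fallbacks += 1
        match = BACK_BUFFER_ERROR.search(line)
        if match:
            # New after the F28 upgrade: the symptom tracked in this report.
            back_buffer_reasons.append(match.group("reason"))
    return {
        "glamor_fallbacks": glamor_fallbacks,
        "back_buffer_reasons": back_buffer_reasons,
        "likely_this_bug": bool(back_buffer_reasons),
    }
```

Usage: `with open("boot.log") as f: print(triage(f))`.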

Comment 15 Alex Villacís Lasso 2018-05-10 16:19:39 UTC
I should add that GLAMOR has been deliberately disabled for machines like mine, because otherwise software fallbacks make rendering unbearably slow under Wayland: https://bugzilla.redhat.com/show_bug.cgi?id=1173801

Comment 16 Gerben Welter 2018-05-10 19:13:47 UTC
I just upgraded a desktop machine from F27 to F28 which uses an Intel GPU and I experience the exact same behavior. Total lockup when the Fedora logo should transition to GDM. The machine is reachable through ssh. The Intel GPU is the following:

00:02.0 VGA compatible controller: Intel Corporation 82G33/G31 Express Integrated Graphics Controller (rev 02) (prog-if 00 [VGA controller])
        Subsystem: Fujitsu Technology Solutions Device 10fc
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at f0200000 (32-bit, non-prefetchable) [size=512K]
        Region 1: I/O ports at 18d0 [size=8]
        Region 2: Memory at e0000000 (32-bit, prefetchable) [size=256M]
        Region 3: Memory at f0100000 (32-bit, non-prefetchable) [size=1M]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
                Address: 00000000  Data: 0000
        Capabilities: [d0] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: i915

But I also got the glamor messages before upgrading from F27, and it worked perfectly. I do notice that 'accountsservice' keeps crashing as mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1573550. The proposed update only makes accountsservice crash in another way.

So I'm not sure this lockup is related to the graphics side of GNOME.

Comment 17 Alex Villacís Lasso 2018-05-10 19:24:50 UTC
(In reply to Gerben Welter from comment #16)
> So I'm not sure this lockup up is related to graphics side of Gnome.

If disabling Wayland in /etc/gdm/custom.conf lets you reach the GDM login correctly, it is this bug.

Comment 18 Gerben Welter 2018-05-10 20:33:14 UTC
(In reply to Alex Villacís Lasso from comment #17)

> > So I'm not sure this lockup up is related to graphics side of Gnome.
> 
> If disabling Wayland in /etc/gdm/custom.conf allows to reach GDM log in
> correctly, it is this bug.

Ah, yes. Disabling Wayland enables login again. I'm typing this on the affected computer.

Comment 19 Alex Villacís Lasso 2018-05-14 00:58:25 UTC
Still happening after gnome-shell update:
gnome-shell-3.28.2-1.fc28.i686
mutter-3.28.2-1.fc28.i686

Comment 20 Alex Villacís Lasso 2018-05-14 01:01:51 UTC
BTW, the "Failed to create new back buffer handle" message that keeps appearing in the logs comes from mutter, specifically from https://github.com/GNOME/mutter/blob/master/src/backends/native/meta-renderer-native.c .

Comment 21 Charles Henry Rothauser 2018-05-14 19:40:24 UTC
I have the exact same problem on my Dell OptiPlex 755 with an on-board video controller. That is, the upgrade from Fedora 27 to 28 completes, then the boot hangs with no GDM login. I am able to boot to runlevel 3 and run journalctl.

Comment 22 Villy Kruse 2018-05-30 18:15:34 UTC
A few more observations.

The liveCD also is stuck before showing the log-in prompt

As far as I remember, I tried to run GNOME on Wayland on fc28-beta in April and it did work. Only around official release time did it stop working, after a lot of updates were installed.

The hardware is a plain desktop with an Intel GPU, 2 GB of memory and a 64-bit OS. No fancy hardware of any kind.

Comment 23 Alex Villacís Lasso 2018-06-04 20:24:26 UTC
I have found the upstream report for this bug (https://gitlab.gnome.org/GNOME/mutter/issues/127), although it does not have as much information as this one.

Comment 24 Alex Villacís Lasso 2018-06-04 21:24:19 UTC
I have found that (on affected machines) one quick way to replicate the lockup is to run "mutter --wayland" in the KMS console shell:


(mutter:5254): mutter-WARNING **: 16:10:15.305: failed to bind to @/tmp/.X11-unix/X0: Address already in use

(mutter:5254): mutter-WARNING **: 16:10:15.305: failed to bind to @/tmp/.X11-unix/X1: Address already in use
glamor: EGL version 1.4 (DRI2):
Require OpenGL version 2.1 or later.
Failed to initialize glamor
Failed to initialize glamor, falling back to sw

(mutter:5254): mutter-WARNING **: 16:10:15.933: Failed to create new back buffer handle: No such file or directory

Comment 25 John Heidemann 2018-06-08 18:16:17 UTC
Same problem with a mid-2007 Mac Mini. The "use Xorg" workaround works, but this is a pretty serious regression over F27.

Comment 26 Jonas Ådahl 2018-06-08 18:47:18 UTC
Any possibility that anyone who can reproduce this can do some bisecting? For example downgrading to mesa from F27, or building mutter locally (without installing) and bisecting using git bisect.

Comment 27 Alex Villacís Lasso 2018-06-08 19:05:46 UTC
(In reply to Villy Kruse from comment #22)
> A few more observations.
> 
> The liveCD also is stuck before showing the log-in prompt
> 
> As far as I remember, I tried to run gnome wayland on fc28-beta in April and
> it did work.  Only around official release time it stopped working after a
> lot of updates was installed.
> 
> Hardware is plain desktop with Intel GPU 2 Gig memory and 64bit os.  No
> fancy hardware of any kind.

Is there somewhere we can find the Fedora 28 Beta ISO image or, even better, the RPMs (or SRPMs) it contained? The RPMs could then be installed on a current F28 to provide a starting point for a bisection, and to check whether the issue is in mutter alone or in an interaction between mutter and mesa.

Comment 28 Villy Kruse 2018-06-09 08:26:18 UTC
(In reply to Alex Villacís Lasso from comment #27)
> Is there somewhere we can find the Fedora 28 Beta ISO image, or (even
> better) the RPMS (or SRPMS) it contained. The RPMS could be then be
> installed on a current F28 and have a starting point for a bisection, as
> well as check whether the issue is in mutter alone, or in an interaction
> between mutter and mesa.

I can get a login prompt after downgrading to mutter-3.28.0-1.fc28.x86_64.rpm

However, it turns out that we first get a signal 11 crash, and then GDM finally starts in Xorg mode. So we are back to "we have no idea when the problem started".

By the way:  I can boot the LiveCD in qemu/kvm

Comment 29 Alex Villacís Lasso 2018-06-20 16:53:25 UTC
No change (in the 32-bit system) after upgrading to mesa-*-18.0.5 .

However, if I enable wayland and then move or rename /usr/lib[64]/dri/i915_dri.so so it is unavailable when mutter/gnome-shell/gdm starts, then the graphics display falls back into a software-rasterizer mode that is sluggish to update the screen, but it works.

Comment 30 Alex Villacís Lasso 2018-06-24 04:47:14 UTC
At long last, with help from commenters on the upstream bug report (https://gitlab.gnome.org/GNOME/mutter/issues/127), I have been able to bisect this issue.

c0d9b08ef9bf2be865aad9bf1bc74ba24c655d9f is the first bad commit
commit c0d9b08ef9bf2be865aad9bf1bc74ba24c655d9f
Author: Daniel Stone <daniels>
Date:   Thu Aug 3 15:06:08 2017 +0100

    renderer/native: Use modifier-aware GBM API
    
    Newer versions of GBM support buffer modifiers, including multi-plane
    buffers. Use this new API to explicitly pull the information from GBM,
    and feed it to drmModeAddFB2WithModifiers.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=785779

:100644 100644 42348146c5d3469a6fb7455ed3967275334aa27a 7399f741aa9bb407b66d079794107688ba8e1e35 M	configure.ac
:040000 040000 882e8a59c45d8f878876009e6b281aee36f1dace 325e638e1038c23ac0a308277b38ea3cb07c30a5 M	src

This is the git bisect log:
git bisect start
# bad: [41303bc01be873e684f11a3407aa556af2922426] Bump version to 3.28.2
git bisect bad 41303bc01be873e684f11a3407aa556af2922426
# good: [dbd2827ca174c5aff321361c0900c41587bbf1a6] Bump version to 3.27.1
git bisect good dbd2827ca174c5aff321361c0900c41587bbf1a6
# bad: [f8f1bcfa9e6a02c621a194182dd7f8c3abe4331c] backends: Add support for Wacom stylus tertiary-button-action
git bisect bad f8f1bcfa9e6a02c621a194182dd7f8c3abe4331c
# good: [5d3b4f0134bdd84ee30648fb401bad7522d23cf1] wayland/xdg-shell: Fix top-most check when grabbing
git bisect good 5d3b4f0134bdd84ee30648fb401bad7522d23cf1
# good: [bd9a3008014d1063aac75241ff750798cbe5aaee] window: Defer stack placement without a buffer
git bisect good bd9a3008014d1063aac75241ff750798cbe5aaee
# good: [513c278077ce6885de883cf9a677337f16a504be] clutter: Make ClutterText request toggling the input panel
git bisect good 513c278077ce6885de883cf9a677337f16a504be
# bad: [dc37ee27824b7c28f377286b3f9ebba1c56283d8] data: Don't expose horizontal workspace keybindings to Settings
git bisect bad dc37ee27824b7c28f377286b3f9ebba1c56283d8
# good: [d670a1aa78ff67358b483e411528df6ec466727b] crtc/kms: Add parsing for IN_FORMATS property
git bisect good d670a1aa78ff67358b483e411528df6ec466727b
# bad: [c0d9b08ef9bf2be865aad9bf1bc74ba24c655d9f] renderer/native: Use modifier-aware GBM API
git bisect bad c0d9b08ef9bf2be865aad9bf1bc74ba24c655d9f
# good: [d99cd279d2b5434c51e7a45fd11b46c6c83e7843] renderer/native: Use drmModeAddFB2 where available
git bisect good d99cd279d2b5434c51e7a45fd11b46c6c83e7843
# first bad commit: [c0d9b08ef9bf2be865aad9bf1bc74ba24c655d9f] renderer/native: Use modifier-aware GBM API

A word of warning: any checkouts including commit cc4e0071489e739597a64ea2549ee9ab75060531 (renderer/native: Create GBM surfaces with modifiers) crash on me unless I apply commit 1851fa2bd0ca7330079a99b8019920e0a15e842a (renderer-native: Fall back to non-modifier GBM surfaces) on top of it.
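The bisection in Comment 30 is a binary search over the history between a known-good commit (3.27.1) and a known-bad one (3.28.2). The Python sketch below shows the idea on a linear list of commits with a caller-supplied test. It is a simplification: `git bisect` itself walks the commit graph and also supports skipping untestable revisions, which is relevant to the crash warning above. The commit names in the usage example are made up.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Binary-search a linear history for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that once a
    commit is bad every later commit is bad too. Returns the first bad commit
    and the number of intermediate commits that had to be tested.
    """
    if len(commits) < 2:
        raise ValueError("need at least a good and a bad endpoint")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):  # e.g. build, install, reboot, watch for the hang
            hi = mid
        else:
            lo = mid
    return commits[hi], tests


if __name__ == "__main__":
    history = [f"commit{i:03d}" for i in range(200)]  # hypothetical history
    culprit, tests = first_bad(history, lambda c: int(c[6:]) >= 137)
    print(culprit, tests)  # culprit == "commit137"
```

Each test halves the remaining range, which is why a few builds were enough to narrow the 3.27.1 to 3.28.2 range down to a single commit.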

Comment 31 Andrew Haveland-Robinson 2018-06-27 00:55:47 UTC
I just upgraded an HP530 i915 laptop from fc27 to fc28 and had the exact same problem and a long search led me here.

I confirm that uncommenting WaylandEnable=false in /etc/gdm/custom.conf allows a login screen, which is better than a blank screen!

Waiting patiently for the fix to appear in the repos...

Comment 32 Alex Villacís Lasso 2018-07-04 20:06:25 UTC
Created attachment 1456611
renderer/native: Fallback to non-planar API if gbm_bo_get_handle_for_plane fails

I have found the root cause: the multiplanar GBM API introduced by mutter commit c0d9b08ef9bf2be865aad9bf1bc74ba24c655d9f fails on i915 (and maybe other drivers). Due to missing error checks, subsequent code is called with an invalid handle. I have compiled a private RPM that includes the attached patch, and it indeed solves the wayland hang on both GDM and gnome-shell. Please review for inclusion in the next mutter RPM.
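Comment 32 describes the fix: when the per-plane GBM query (`gbm_bo_get_handle_for_plane`) fails, fall back to the older single-plane path instead of passing an invalid handle on to framebuffer creation. The Python model below only illustrates that control flow; it is not mutter's C code. `FakeBo`, its fields, `framebuffer_layout`, and the use of -1 as the failure value are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

INVALID_HANDLE = -1  # sketch assumption: how a failed per-plane lookup is reported


@dataclass
class FakeBo:
    """Hypothetical stand-in for a GBM buffer object."""
    handle: int  # what the single-plane (legacy) query returns
    stride: int
    plane_handles: List[int] = field(default_factory=list)  # per-plane query results
    plane_strides: List[int] = field(default_factory=list)
    plane_offsets: List[int] = field(default_factory=list)


def framebuffer_layout(bo: FakeBo) -> Tuple[List[int], List[int], List[int]]:
    """Return (handles, strides, offsets) to hand to framebuffer creation."""
    if not bo.plane_handles or bo.plane_handles[0] == INVALID_HANDLE:
        # Per-plane API unavailable or failed (as reported on i915 here):
        # use the single-plane values instead of an invalid handle.
        return [bo.handle], [bo.stride], [0]
    return list(bo.plane_handles), list(bo.plane_strides), list(bo.plane_offsets)


if __name__ == "__main__":
    legacy_only = FakeBo(handle=7, stride=4096, plane_handles=[INVALID_HANDLE])
    print(framebuffer_layout(legacy_only))  # ([7], [4096], [0])
```

In the unpatched flow the invalid value would instead be passed along, which is consistent with the back-buffer errors quoted earlier in this report.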

Comment 33 Alex Villacís Lasso 2018-08-10 00:26:49 UTC
A note on progress: the final form of the above fix has been merged into mutter master: https://gitlab.gnome.org/GNOME/mutter/merge_requests/160

Comment 34 Alexander Ploumistos 2018-08-10 00:42:32 UTC
Good job Alex!
Once this gets backported, we might need to spin a new F28 compose.

Comment 35 Andre Klapper 2018-08-16 10:24:42 UTC
As upstream has not provided a .91 mutter release at ftp://ftp.gnome.org/pub/gnome/sources/mutter/3.29/ , does anyone plan to create a scratch build at https://koji.fedoraproject.org/koji/packageinfo?packageID=8870 to give this more testing? (Asking as I have no idea how to do that.)

Comment 36 Alex Villacís Lasso 2018-08-17 21:45:35 UTC
The final form of the patch has been backported to the gnome-3-28 branch of mutter as commit 1276cc97d1e6437c7fbc43fdd5cbcea39f60acee . Not yet tagged as part of an official point release.

Comment 37 Jonas Ådahl 2018-08-18 10:05:07 UTC
A new release on the 3.28 branch, with some other fixes as well, is likely coming soon.

Comment 38 Alex Villacís Lasso 2018-10-14 14:04:37 UTC
Please release a new mutter RPM for Fedora 28 that contains this bugfix (1276cc97d1e6437c7fbc43fdd5cbcea39f60acee in the gnome-3-28 branch). There have been no commits to the gnome-3-28 branch since 8ddbe9d98bb02145fea898a2a85bbb49f2e85f5b, and no tagged point release for mutter-3.28 either.

Comment 39 Alex Villacís Lasso 2018-10-16 16:54:33 UTC
The package mutter-3.28.3-4.fc28 has just been released to the stable F28 repositories. I can confirm that it fixes the lockup problem in at least the Acer Aspire One ZG5 (the 32-bit machine I have access to). I still have to test this fix on my 64-bit machine at home.

Comment 40 Alex Villacís Lasso 2018-10-20 13:22:39 UTC
My home machine (a 64-bit one) no longer locks up after the update to mutter-3.28.3-4. I consider this particular bug to be fixed for me.