Bug 1568627

Summary: GNOME crashes on boot after install of FAW in Fedora-28-20180417.n.0 : "Missing extension for GBM renderer: EGL_KHR_platform_gbm"
Product: [Fedora] Fedora
Reporter: Adam Williamson <awilliam>
Component: gnome-shell
Assignee: Owen Taylor <otaylor>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 28
CC: dustymabe, fmuellner, jlebon, jsmith.fedora, kevin, klember, mboddu, mclasen, miabbott, otaylor, ppisar, walters
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: AcceptedFreezeException
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-19 22:08:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1469207    
Attachments:
tarball containing whole of /var/log after test fails (flags: none)

Description Adam Williamson 2018-04-17 23:49:20 UTC
Starting with Fedora-28-20180417.n.0 (this passed with Fedora-28-20180416.n.0), after installing FAW (Fedora Atomic Workstation) from the DVD installer image, boot of the installed system fails: instead of g-i-s (gnome-initial-setup) or GDM showing up, the "Oh no!" error screen shows up:

https://openqa.fedoraproject.org/tests/224864#step/_graphical_wait_login/8

Will attach the /var/log tarball here. Looking at the logs, this error is the immediate cause of the failure:

Apr 17 14:22:42 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com gnome-shell[1064]: Failed to create backend: Failed to initialize renderer: Missing extension for GBM renderer: EGL_KHR_platform_gbm
Apr 17 14:22:42 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com gnome-session[1048]: gnome-session-binary[1048]: WARNING: App 'org.gnome.Shell.desktop' exited with code 1
Apr 17 14:22:42 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com gnome-session-binary[1048]: Unrecoverable failure in required component org.gnome.Shell.desktop
Apr 17 14:22:42 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com gnome-session-binary[1048]: WARNING: App 'org.gnome.Shell.desktop' exited with code 1
Apr 17 14:22:42 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com gdm-launch-environment][946]: pam_unix(gdm-launch-environment:session): session closed for user gdm

I'm not immediately sure what's causing that, though. Assigning to gnome-shell to start with, but this could well turn out to be mutter or mesa or something. Will CC FAW-interested folks.

Proposing for a freeze exception, as obviously we want the day 0 FAW image to work correctly.
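
The failure signature quoted in the description can be searched for mechanically in a journal dump. The following is only a minimal sketch, not part of the openQA tooling: the regular expression is derived from the single log line quoted above, and the function name `find_missing_egl_extensions` is made up for illustration.

```python
import re

# Pattern built from the gnome-shell line quoted in the description.
# Assumption: other occurrences of the failure use the same wording.
GBM_ERROR = re.compile(
    r"gnome-shell\[(?P<pid>\d+)\]: Failed to create backend: "
    r"Failed to initialize renderer: "
    r"Missing extension for GBM renderer: (?P<ext>\S+)"
)


def find_missing_egl_extensions(lines):
    """Return (pid, extension) pairs for every GBM renderer failure found."""
    hits = []
    for line in lines:
        match = GBM_ERROR.search(line)
        if match:
            hits.append((int(match.group("pid")), match.group("ext")))
    return hits


# Two of the journal lines from the description, copied verbatim.
sample = [
    "Apr 17 14:22:42 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com "
    "gnome-shell[1064]: Failed to create backend: Failed to initialize renderer: "
    "Missing extension for GBM renderer: EGL_KHR_platform_gbm",
    "Apr 17 14:22:42 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com "
    "gnome-session-binary[1048]: Unrecoverable failure in required component "
    "org.gnome.Shell.desktop",
]

print(find_missing_egl_extensions(sample))
```

Run against the messages file from the attached /var/log tarball, a check like this would distinguish this particular failure from other causes of the "Oh no!" screen.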

Comment 1 Adam Williamson 2018-04-17 23:52:38 UTC
Created attachment 1423275
tarball containing whole of /var/log after test fails

Comment 2 Adam Williamson 2018-04-17 23:53:37 UTC
URL of an image to test with, for convenience: https://kojipkgs.fedoraproject.org/compose/branched/Fedora-28-20180417.n.0/compose/AtomicWorkstation/x86_64/iso/Fedora-AtomicWorkstation-ostree-x86_64-28-20180417.n.0.iso

Note that openQA tests in a qemu VM with 'std' graphics (same as 'vga', I think). Results may differ on bare metal or with a different virtual video device, I guess.

Comment 3 Adam Williamson 2018-04-17 23:55:37 UTC
<mclasen> adamw: did mesa-libEGL go missing again ?
<mclasen> there was some glitch where soft deps were interpreted differently in tree compose, or something
<mclasen> https://pagure.io/fedora-comps/pull-request/250
<mclasen> maybe that libsolv change that colin was theorizing about has hit f28 ?
<adamw> who knows!
<adamw> we can apply that comps change to f28 easily enough, though, if that should fix it...

Comment 4 Micah Abbott 2018-04-18 13:05:58 UTC
It does look like mesa-libEGL went missing:

$ rpm-ostree db diff 5344360af3807647a728366dc59ef23aa5100b4b83013316e0f1c9c9d0031103 d3461fff6aa51d6167ce57919368688cc2512f080048a3e0784e4161a208c701
ostree diff commit old: 5344360af3807647a728366dc59ef23aa5100b4b83013316e0f1c9c9d0031103
ostree diff commit new: d3461fff6aa51d6167ce57919368688cc2512f080048a3e0784e4161a208c701
...
Removed:
  authconfig-7.0.1-5.fc28.x86_64
  gnome-themes-standard-3.27.90-1.fc28.x86_64
  mesa-libEGL-18.0.0-4.fc28.x86_64
  mesa-libGL-18.0.0-4.fc28.x86_64
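
For anyone wanting to check future composes for the same regression, the "Removed:" section of that output can be filtered with a few lines of Python. This is a rough sketch that assumes the layout shown above (a section heading followed by indented package lines); `removed_packages` is a hypothetical helper, and the elided "..." part of the output is simply left out of the sample.

```python
def removed_packages(diff_text):
    """Collect the package entries listed under the 'Removed:' heading."""
    removed = []
    in_removed = False
    for line in diff_text.splitlines():
        if line.strip() == "Removed:":
            in_removed = True
            continue
        if in_removed:
            # Entries are indented; anything else ends the section.
            if line.startswith((" ", "\t")) and line.strip():
                removed.append(line.strip())
            else:
                in_removed = False
    return removed


# Output copied from the comment above, minus the elided part.
DIFF = """\
ostree diff commit old: 5344360af3807647a728366dc59ef23aa5100b4b83013316e0f1c9c9d0031103
ostree diff commit new: d3461fff6aa51d6167ce57919368688cc2512f080048a3e0784e4161a208c701
Removed:
  authconfig-7.0.1-5.fc28.x86_64
  gnome-themes-standard-3.27.90-1.fc28.x86_64
  mesa-libEGL-18.0.0-4.fc28.x86_64
  mesa-libGL-18.0.0-4.fc28.x86_64
"""

missing_mesa = [pkg for pkg in removed_packages(DIFF) if pkg.startswith("mesa-")]
print(missing_mesa)
```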

Comment 5 Kalev Lember 2018-04-18 13:22:09 UTC
Should mutter depend on mesa-libEGL and mesa-libGL?

Comment 6 Kalev Lember 2018-04-18 13:43:28 UTC
ajax just did a libglvnd build that adds back the mesa-libGL and mesa-libEGL hard requires.

Comment 7 Fedora Update System 2018-04-18 13:44:58 UTC
libglvnd-1.0.1-0.5.20180327git5baa1e5.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-d48d3c17a2

Comment 8 Dusty Mabe 2018-04-18 14:31:40 UTC
I just opened two PRs (one for rawhide and one for f28) to sync the package list for FAW with comps. Note the rawhide one adds in mesa-libEGL, but the f28 one doesn't. So the comps differ slightly for rawhide vs f28, which might be why we are seeing an issue?

https://pagure.io/workstation-ostree-config/pull-request/83

https://pagure.io/workstation-ostree-config/pull-request/84

Comment 9 Adam Williamson 2018-04-18 15:16:47 UTC
Dusty: that's exactly what the discussion in comment #3 was about, I think. Note that PR mclasen linked to changed only f29, not f28.

Comment 10 Dusty Mabe 2018-04-18 15:25:15 UTC
(In reply to Adam Williamson from comment #9)
> Dusty: that's exactly what the discussion in comment #3 was about, I think.
> Note that PR mclasen linked to changed only f29, not f28.

OK. If someone does decide to create that PR for f28, let me know and I can update comps again.

Comment 11 Adam Williamson 2018-04-18 21:07:12 UTC
So, I managed to join the dots on exactly *why* the change from Requires to Recommends caused mesa-libEGL to go missing from the ostree, I think. Basically, it's because lorax doesn't pull Recommends into install trees, and the FAW ostree is built using the Workstation install tree as its base repo. Full details here: https://bugzilla.redhat.com/show_bug.cgi?id=1569242
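
The effect described in this comment can be illustrated with a toy dependency walk. This is not how lorax or libsolv actually resolve packages; it is a minimal sketch in which the package graph is invented for illustration (only the libglvnd to mesa-libEGL Recommends edge comes from this bug) and `resolve` is a hypothetical helper.

```python
# Toy metadata, NOT real Fedora spec data. Only the weak dependency of
# libglvnd on mesa-libEGL is taken from this bug; the rest is illustrative.
PACKAGES = {
    "gnome-shell": {"requires": ["mutter"], "recommends": []},
    "mutter": {"requires": ["libglvnd"], "recommends": []},
    "libglvnd": {"requires": [], "recommends": ["mesa-libEGL"]},
    "mesa-libEGL": {"requires": [], "recommends": []},
}


def resolve(roots, packages, follow_recommends):
    """Return the set of packages reachable from roots.

    Hard dependencies (requires) are always followed; weak ones
    (recommends) only when follow_recommends is True.
    """
    selected = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in selected:
            continue
        selected.add(name)
        deps = list(packages[name]["requires"])
        if follow_recommends:
            deps += packages[name]["recommends"]
        stack.extend(deps)
    return selected


# A tree built without weak deps loses mesa-libEGL; one built with them keeps it.
print(sorted(resolve(["gnome-shell"], PACKAGES, follow_recommends=False)))
print(sorted(resolve(["gnome-shell"], PACKAGES, follow_recommends=True)))
```

If the install tree is built the first way and then used as the only repo for the ostree compose, the recommended package is not available to pull in later, which matches the symptom seen here.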

Comment 12 Adam Williamson 2018-04-18 23:12:47 UTC
So we kinda have three proposed fixes for this now: the libglvnd update, the clutter update in https://bugzilla.redhat.com/show_bug.cgi?id=1568881 , and the idea of making F28's comps specify mesa-libEGL explicitly like Rawhide's.

We probably don't really need all three of those. What do we think is the most actually-correct combination of the three to do? And should we synchronize whatever we decide on with Rawhide?

Comment 13 Fedora Update System 2018-04-19 08:53:47 UTC
libglvnd-1.0.1-0.5.20180327git5baa1e5.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-d48d3c17a2

Comment 14 Kalev Lember 2018-04-19 10:12:39 UTC
(In reply to Adam Williamson from comment #12)
> So we kinda have three proposed fixes for this now: the libglvnd update, the
> clutter update in https://bugzilla.redhat.com/show_bug.cgi?id=1568881 , and
> the idea of making F28's comps specify mesa-libEGL explicitly like Rawhide's.
> 
> We probably don't really need all three of those. What do we think is the
> most actually-correct combination of the three to do? And should we
> synchronize whatever we decide on with Rawhide?

I'd prefer going down the hard dep route for F28 GA. We did this for F27 and it's just reverting to a known good state. If we do the comps fix, adding mesa-libEGL to Workstation comps, it probably fixes Workstation well enough, but who knows if it fixes all other spins and distro upgrades. I'd sleep better if we have a hard dep for F28 :)

For rawhide in the short term, I'd say let's keep exactly the same thing as we do for F28, whichever it ends up being: if we go for the hard dep, then revert the comps change that already went in.

For rawhide in the longer term, I'd like to go back to libglvnd having a Recommends on mesa-libEGL and figure out the install tree issues so that recommended deps are pulled in. I don't think this is something we should play with during the F28 freeze, though.

Comment 15 Adam Williamson 2018-04-19 15:25:43 UTC
There are still three permutations of "the hard dep route", though: change libglvnd, change clutter, or change both. Which of those do you prefer?

Comment 16 Kalev Lember 2018-04-19 17:03:10 UTC
I think both are needed: the libglvnd update pulls in mesa-libGL, and the clutter update pulls in mesa-dri-drivers.

Comment 17 Adam Williamson 2018-04-19 19:46:45 UTC
OK, in that case, I'm +1 FE for both this and 1568881 . Thanks.

Comment 18 Dusty Mabe 2018-04-19 19:55:31 UTC
+1 FE

Comment 19 Kevin Fenzi 2018-04-19 19:56:11 UTC
+1 FE

Comment 20 Mohan Boddu 2018-04-19 19:57:25 UTC
+1 FE

Comment 21 Adam Williamson 2018-04-19 19:59:51 UTC
That's +4, setting Accepted.

Comment 22 Jared Smith 2018-04-19 20:01:08 UTC
+1 FE

Comment 23 Fedora Update System 2018-04-19 22:08:32 UTC
libglvnd-1.0.1-0.5.20180327git5baa1e5.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

Comment 24 Adam Williamson 2018-04-24 15:12:42 UTC
On https://bugzilla.redhat.com/show_bug.cgi?id=1568881#c23 , Petr Pisar wrote:

"After adding mesa-libGL dependency to libglvnd, gtk clients running against Xvfb crash. Probably because of bug #1568644."

As this bug was actually the bug we treated as being 'for libglvnd', let's move that discussion over here. I'm not yet sure how big of a problem this is for the F28 release; I'm looking into it at the moment.

Comment 25 Adam Williamson 2018-04-24 16:03:58 UTC
If I'm understanding ajax correctly, he thinks #1568644 is only really a problem if mesa-dri-drivers isn't installed. AFAICT it is installed in all the scenarios that are a problem for F28 release: it's in the installer, it's in desktop live images and so on. (In comps, it's in base-x, and all the desktop environment groups include base-x). I tested a VNC install and it worked OK. So unless we're missing something, we don't think there's a release-critical issue there. The issue with buildroots is a real one, but doesn't need to be a release blocker or FE.
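
The comps reasoning above (a package is safe if every relevant environment includes a group that carries it) can be expressed as a small check. This is only a sketch over made-up data: the base-x entry reflects what this comment says, while every other group and environment name is a hypothetical placeholder, as is the helper `environments_missing`.

```python
# Group -> packages. Only the base-x entry reflects the comment above;
# the other groups are hypothetical placeholders.
GROUPS = {
    "base-x": {"mesa-dri-drivers"},
    "example-desktop": {"example-shell"},
    "example-core": {"example-init"},
}

# Environment -> groups it includes (all names hypothetical).
ENVIRONMENTS = {
    "example-desktop-environment": ["base-x", "example-desktop"],
    "example-minimal-environment": ["example-core"],
}


def environments_missing(package, environments, groups):
    """Return the environments in which no included group carries package."""
    missing = []
    for env, group_names in environments.items():
        provided = set()
        for group in group_names:
            provided |= groups[group]
        if package not in provided:
            missing.append(env)
    return sorted(missing)


print(environments_missing("mesa-dri-drivers", ENVIRONMENTS, GROUPS))
```

In this toy data only the non-graphical environment lacks the package, which mirrors the conclusion that environments without base-x (such as buildroots) are the ones that remain exposed.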