Bug 1755733

Summary: Gnome Xorg session stuck immediately after login
Product: [Fedora] Fedora Reporter: Jonathan Haas <jonha87>
Component: xorg-x11-server Assignee: X/OpenGL Maintenance List <xgl-maint>
Status: CLOSED WORKSFORME QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 31 CC: ajax, alciregi, awilliam, bskeggs, caillon+fedoraproject, fmuellner, fzatlouk, gmarr, gnome-sig, jadahl, jglisse, john.j5live, kparal, mrmazda, ofourdan, otaylor, philip.wyett, rhughes, robatino, rstrode, sandmann, sanjay.ankur, xgl-maint
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-01 15:27:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1644939    
Attachments:
Description       Flags
Journal           none
hwinfo --short    none

Description Jonathan Haas 2019-09-26 07:03:06 UTC
Description of problem:

Randomly (about 30 to 50% of the time), after booting the computer and logging into an Xorg Gnome session, the screen stays at its noisy gray state with the mouse cursor visible, but nothing else loads.

Version-Release number of selected component (if applicable):

[jh@jonathan-pc ~]$ rpm -q mutter
mutter-3.34.0-3.fc31.x86_64
[jh@jonathan-pc ~]$ rpm -q gnome-shell
gnome-shell-3.34.0-1.fc31.x86_64

How reproducible:
About 30% of the time

Steps to Reproduce:
1. Boot computer
2. Login to a Gnome-Xorg session
3. Wait

Actual results:

Stuck at gray screen with cursor and nothing else

Expected results:

Successful Login (Gnome UI should load)

Additional info:

After killing Xorg from a tty and trying again, the session crashes immediately on each subsequent login and I'm back at the gdm screen.

Logging into a Wayland session works without problems, even after the Xorg session crashed.

Attaching journal output from today's boot which contains boot, login, being stuck, killing Xorg from command line and I believe two subsequent Xorg login attempts (which crash).

abrt doesn't register any problem, so I can't attach stack traces.

Comment 1 Jonathan Haas 2019-09-26 07:03:56 UTC
Created attachment 1619367 [details]
Journal

Comment 2 Jonathan Haas 2019-09-26 07:18:01 UTC
Proposing as a blocker in case this is a general issue.

All elements of the default panel (or equivalent) configuration in all release-blocking desktops must function correctly in typical use.

Comment 3 Kamil Páral 2019-09-26 11:25:25 UTC
I don't see this. Have you tried to disable all gnome-shell extensions? Also please add output of "lspci -nn | grep VGA".

Comment 4 Jonathan Haas 2019-09-26 11:29:44 UTC
> I don't see this. Have you tried to disable all gnome-shell extensions?

No shell extensions are enabled.

> Also please add output of "lspci -nn | grep VGA".

[jh@jonathan-pc ~]$ lspci -nn | grep VGA
00:02.0 VGA compatible controller [0300]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller [8086:0412] (rev 06)

Comment 5 Jonathan Haas 2019-09-26 11:36:53 UTC
Also, in case this isn't obvious, I have no problems with this in Fedora 30, so I would assume that this is a regression and not just some broken pc.

Comment 6 František Zatloukal 2019-09-27 07:55:56 UTC
Hmm, I am seeing the same behavior on similar GPU when booting with nomodeset in BIOS mode, but I can use GNOME Xorg just fine if it's not with basic video driver. I'd have guessed you have nomodeset there too but you wouldn't be able to use Wayland in such case.

lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)

Comment 7 Jonathan Haas 2019-09-27 08:09:26 UTC
Created attachment 1619932 [details]
hwinfo --short

I haven't set nomodeset or any other special kernel parameters. Attaching hwinfo output just in case, but it's a pretty normal (not super new) Intel system with integrated graphics. Flicker-free boot doesn't seem to be supported by my BIOS for some reason, but that probably can't be related?

Comment 8 František Zatloukal 2019-09-27 08:24:20 UTC
(In reply to J. Haas from comment #7)
> Created attachment 1619932 [details]
> hwinfo --short
> 
> I haven't set nomodeset or any other special kernel parameters. Attaching
> hwinfo output just in case, but it's a pretty normal (not super new) Intel
> system with integrated graphics. Flicker free boot doesn't seem to be
> supported by my bios for some reason, but that probably can't be related?

If I am not mistaken, flicker-free boot is available only for Skylake or newer generations (Core iX-6xxx or higher), so that shouldn't be related.

Comment 9 František Zatloukal 2019-09-27 08:34:48 UTC
Can you try to install xorg-x11-drv-intel and reboot?

Comment 10 Jonathan Haas 2019-09-27 08:38:29 UTC
> Can you try to install xorg-x11-drv-intel and reboot?

That package already is installed.

Comment 11 František Zatloukal 2019-09-27 08:53:22 UTC
(In reply to J. Haas from comment #10)
> > Can you try to install xorg-x11-drv-intel and reboot?
> 
> That package already is installed.

Okay, in that case... can you try to remove that package? It might sound weird, but Intel started using the generic Xorg driver for recent GPU generations, yet apparently stayed with -drv-intel for older ones (or that package could have remained installed by accident if you upgraded).

Removing that package should force the Xorg server to pick the generic driver even for an older-generation GPU, which could work better. If things go south, you might need to reinstall it from a VT or a lower runlevel; just warning you it can break :)
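To confirm which DDX actually drove the screen before and after removing the package, one option is to check the Xorg log. A minimal sketch, assuming the usual per-driver log prefixes (`intel(0):` for xf86-video-intel, `modeset(0):` for the modesetting driver) and assuming a GDM-started rootless Xorg session writes its log to `~/.local/share/xorg/Xorg.0.log` (the path may differ on your system). The function name `active_ddx` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an Xorg log on stdin and report which DDX
# driver initialized screen 0. Assumes the usual per-driver log prefixes:
# "modeset(0):" for the modesetting driver, "intel(0):" for xf86-video-intel.
active_ddx() {
  local log
  log="$(cat)"
  if grep -q 'modeset(0):' <<<"$log"; then
    echo modesetting
  elif grep -q 'intel(0):' <<<"$log"; then
    echo intel
  else
    echo unknown
  fi
}

# Example (log path is an assumption for a GDM-started rootless session):
#   active_ddx < ~/.local/share/xorg/Xorg.0.log
```

inxi's `-G` output (as used later in this thread) reports the loaded driver too, so this is only a quick cross-check.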

Comment 12 Jonathan Haas 2019-09-27 09:06:00 UTC
I've removed that package and did 5 successful reboots and logins into an Xorg session. So that might have fixed it. Can't be sure of course, as it's not always happening. Graphics are looking completely fine and normal, no issues so far.

Yes, that system was originally installed with Fedora 28, I believe, and has been upgraded since. Should something be done to remove that package on upgrade?

Comment 13 František Zatloukal 2019-09-27 13:33:38 UTC
Please, can any of the xorg stack maintainers comment on the situation with xorg-x11-drv-intel? Is it supposed to be installed? I don't have it on my laptop, and it seems to be causing some issues when it is present.

Can the generic Xorg driver also handle older Intel GPUs? If so, shouldn't we consider adding xorg-x11-drv-intel to fedora-obsolete-packages?

Thanks!

Comment 14 Adam Williamson 2019-09-27 17:02:01 UTC
The package has not been retired and is still in the base-x group, so this isn't just an upgrade thing; it is actually still installed on fresh installs as well. Presumably it's still needed for some hardware.

I'm not sure exactly how the detection works here, but however it does - if this adapter (PCI ID 8086:0412) works better with modesetting, perhaps we should switch it.

ajax, wdyt?
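Any per-adapter decision like this keys on the vendor:device pair that `lspci -nn` prints in brackets (8086:0412 for the reporter's GPU in comment 4). A small sketch for pulling that pair out of the output so IDs reported by different testers can be compared; the function name `vga_pci_ids` is hypothetical, and it only looks at lines labelled "VGA compatible controller":

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `lspci -nn` output on stdin and print the
# vendor:device ID of every VGA controller, one per line (e.g. 8086:0412).
# The class code "[0300]" has no colon, so it is never matched.
vga_pci_ids() {
  sed -nE '/VGA compatible controller/ s/.*\[([0-9a-f]{4}:[0-9a-f]{4})\].*/\1/p'
}

# Example on a live system:
#   lspci -nn | vga_pci_ids
```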

Comment 15 Jonathan Haas 2019-09-28 08:11:40 UTC
František, maybe run 

dnf history list xorg-x11-drv-intel

to see why you don't have it on your laptop? Did it get removed accidentally?

But if that package is supposed to be there for everybody, and if that package is indeed causing problems, I would assume more people have problems with this.

Comment 16 František Zatloukal 2019-09-28 09:33:19 UTC
(In reply to J. Haas from comment #15)
> dnf history list xorg-x11-drv-intel
> 

Yeah, looks like I've removed that package. Anyway, it doesn't make much difference on newer hardware (my laptop GPU is Kaby Lake gen), because even if the package is installed, it's not used on Skylake (SKL) or anything newer, if I remember correctly.

Comment 17 Adam Williamson 2019-09-28 19:10:49 UTC
J. - so far all we know for sure is that *your* system has a problem starting X with xorg-x11-drv-intel. We don't have enough data to know if it's a general issue or not. We'd need to test a lot more adapters to be sure. It is possible that just the adapter you have has an issue.

I have a couple of systems with Intel graphics I'll test on soon.

Comment 18 Geoffrey Marr 2019-10-01 00:50:06 UTC
Discussed during the 2019-09-30 blocker review meeting: [0]

The decision to delay the classification of this as a blocker bug was made as we can only be sure one system has a problem here; we need to test more systems to assess blocker and FE status.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2019-09-30/f31-blocker-review.2019-09-30-16.00.txt

Comment 19 Adam Williamson 2019-10-01 01:06:50 UTC
BTW, another question here: does KDE behave the same, if you test with a KDE live image, like https://kojipkgs.fedoraproject.org/compose/branched/Fedora-31-20190930.n.1/compose/Spins/x86_64/iso/Fedora-KDE-Live-x86_64-31-20190930.n.1.iso ?

Comment 20 Felix Miata 2019-10-01 06:56:15 UTC
Seems like this may be what happened to me on a fresh NET minimal install, to which on first boot I added IceWM and Plasma and removed openbox. The first two Plasma sessions went OK, using startx from multi-user, but the third and subsequent sessions from graphical/lightdm never finished opening: they only painted the desktop background and mouse pointer, and Ctrl-Alt-BS or Ctrl-Alt-Fn was needed to escape, even after isolating back to multi-user and trying startx /usr/bin/startkde. Trying to restart produced a 90s wait on a running stop job. After a reboot into multi-user, startx /usr/bin/startkde worked as expected. After another reboot into graphical/lightdm, and on every boot since, logging into a Plasma session behaves as expected.

# rpm -qa | grep intel
xorg-x11-drv-intel-2.99.917-43.20180618.fc31.x86_64
# inxi -V | head -n1
inxi 3.0.36-00 (2019-08-14)
# inxi -GxxSMza
System:    Host: gx780.ij.net Kernel: 5.3.1-300.fc31.x86_64 x86_64 bits: 64 compiler: gcc v: 9.2.1
           parameters: ro root=/dev/sda## ipv6.disable=1 net.ifnames=0 audit=0 plymouth.enable=0 noresume mitigations=auto
           consoleblank=0 selinux=0 vga=791 video=1024x768@60 video=1440x900@60 3
           Desktop: KDE Plasma 5.16.4 tk: Qt 5.12.4 wm: kwin_x11 dm: LightDM Distro: Fedora release 31 (Thirty One)
Machine:   Type: Desktop System: Dell product: OptiPlex 780 v: N/A serial: <filter> Chassis: type: 15 serial: <filter>
           Mobo: Dell model: 03NVJ6 v: A01 serial: <filter> BIOS: Dell v: A15 date: 08/06/2013
Graphics:  Device-1: Intel 4 Series Integrated Graphics vendor: Dell driver: i915 v: kernel bus ID: 00:02.0 chip ID: 8086:2e12
           Display: server: Fedora Project X.org 1.20.5 driver: modesetting unloaded: fbdev,vesa compositor: kwin_x11
           resolution: 1920x1200~60Hz
           OpenGL: renderer: Mesa DRI Intel Q45/Q43 v: 2.1 Mesa 19.2.0 direct render: Yes

FWIW, all my Intel Graphics PCs, all multiboot, that are supported by the modesetting DDX are using it: Eaglelake, Haswell and Kaby Lake. The upstream xf86-video-intel DDX hasn't had an official release in over 4 years. IMO Intel's driver writers have obviously been focused on the modesetting DDX since that time.

Comment 21 Alessio 2019-10-01 07:26:12 UTC
Answering the call for testing by adamwill on the test mailing list.

F31 fully updated installation.

Toshiba Portege R930
00:02.0 VGA compatible controller [0300]: Intel Corporation 3rd Gen Core processor Graphics Controller [8086:0166] (rev 09)
Successful login


Dell XPS 13 9360
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 620 [8086:5916] (rev 02)
Successful login

Comment 22 Jonathan Haas 2019-10-01 07:49:34 UTC
I've tried to reproduce this to get a backtrace now that Bug 1748145 is fixed, but I couldn't reproduce it with latest updates applied (and obviously xorg-x11-drv-intel reinstalled). 

I also noticed that my system might not have been up to date when I originally reported this, as offline updates were silently not working (Bug 1751103).

I suppose this problem might have been fixed somewhere between the Beta release and now, but I'm not sure, so I'll leave it up to you whether to close this issue or do further testing.

I will definitely report back if it happens again.

Comment 23 Ankur Sinha (FranciscoD) 2019-10-01 11:34:57 UTC
Hello,

Up-to-date F31 Workstation install here. It seems to work fine. I tried with my account and with a new user account as well:

$ lspci -nn | grep VGA
00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 620 (Whiskey Lake) [8086:3ea0]

Comment 24 Adam Williamson 2019-10-01 15:27:45 UTC
Thanks everyone!

Given no-one else hit the problem and it also seems to be working for J. Haas now, let's close this for now, of course re-open if the problem shows up again.

Comment 25 Felix Miata 2019-10-01 16:05:27 UTC
I did a minimal net installation, this time on Kaby Lake, to which I added IceWM & Plasma. It worked fine through several boots and Plasma sessions, then began hanging after the session start splash disappeared. IceWM still worked, which I started and stopped several times, after which Plasma started working again.

This section of the journal looks like it might be relevant:
Oct 01 11:22:03 ab250 systemd[752]: Started dbus-:1.2-org.kde.kded5.
Oct 01 11:22:03 ab250 systemd[1]: Started dbus-:1.2-org.kde.powerdevil.discretegpuhelper.
Oct 01 11:22:03 ab250 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dbus-:1.2-org.kde.powerdevil.discretegpuhelper@3 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 01 11:22:03 ab250 systemd[1]: Started dbus-:1.2-org.kde.powerdevil.backlighthelper.
Oct 01 11:22:03 ab250 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dbus-:1.2-org.kde.powerdevil.backlighthelper@3 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 01 11:22:03 ab250 backlighthelper[25021]: powerdevil: no kernel backlight interface found
Oct 01 11:22:12 ab250 systemd[1]: dbus-:1.2-org.kde.powerdevil.discretegpuhelper: Succeeded.
Oct 01 11:22:12 ab250 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dbus-:1.2-org.kde.powerdevil.discretegpuhelper@3 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 01 11:22:13 ab250 systemd[1]: dbus-:1.2-org.kde.powerdevil.backlighthelper: Succeeded.
Oct 01 11:22:13 ab250 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dbus-:1.2-org.kde.powerdevil.backlighthelper@3 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 01 11:22:52 ab250 kactivitymanagerd[24979]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.")
Oct 01 11:23:17 ab250 kded5[24987]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.")
Oct 01 11:25:49 ab250 systemd[752]: Started dbus-:1.2-org.kde.KScreen.

# hwinfo --monitor
15: None 00.0: 10002 LCD Monitor
  [Created at monitor.125]
  Unique ID: rdCR.3MIvMvDbu8E
  Parent ID: _Znp.1+PvSl89UWD
  Hardware Class: monitor
  Model: "NEC EA243WM"
  Vendor: NEC "NEC"
  Device: eisa 0x6865 "EA243WM"
  Serial ID: "1Z101367NA"
  Resolution: 640x480@60Hz
  Resolution: 800x600@56Hz
  Resolution: 800x600@60Hz
  Resolution: 1024x768@60Hz
  Resolution: 1280x960@60Hz
  Resolution: 1280x1024@60Hz
  Resolution: 1280x720@60Hz
  Resolution: 1920x1200@60Hz
  Size: 519x324 mm
  Year of Manufacture: 2011
  Week of Manufacture: 48
  Detailed Timings #0:
     Resolution: 1920x1200
     Horizontal: 1920 1968 2000 2080 (+48 +80 +160) -hsync
       Vertical: 1200 1203 1209 1235 (+3 +9 +35) +vsync
    Frequencies: 154.00 MHz, 74.04 kHz, 59.95 Hz
  Driver Info #0:
    Max. Resolution: 1920x1200
    Vert. Sync Range: 56-61 Hz
    Hor. Sync Range: 31-77 kHz
    Bandwidth: 154 MHz
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #12 (VGA compatible controller)
# rpm -qa | grep mesa
mesa-libglapi-19.2.0-1.fc31.x86_64
mesa-libGL-19.2.0-1.fc31.x86_64
mesa-libgbm-19.2.0-1.fc31.x86_64
mesa-libEGL-19.2.0-1.fc31.x86_64
# inxi -GxxSMza
System:    Host: ab250 Kernel: 5.3.1-300.fc31.x86_64 x86_64 bits: 64 compiler: gcc v: 9.2.1
           parameters: BOOT_IMAGE=/boot/vmlinuz root=LABEL=m12p15f31 noresume ipv6.disable=1 net.ifnames=0 mitigations=auto
           consoleblank=0 video=1024x768@60 video=1400x900@60 3 selinux=0
           Desktop: IceWM 1.6.1 wm: kwin_x11 dm: startx Distro: Fedora release 31 (Thirty One)
Machine:   Type: Desktop Mobo: ASUSTeK model: PRIME B250M-C v: Rev X.0x serial: <filter> UEFI: American Megatrends v: 1402
           date: 11/16/2018
Graphics:  Device-1: Intel HD Graphics 630 vendor: ASUSTeK driver: i915 v: kernel bus ID: 00:02.0 chip ID: 8086:5912
           Display: server: Fedora Project X.org 1.20.5 driver: modesetting alternate: fbdev,vesa compositor: kwin_x11
           resolution: 1920x1200~60Hz
           OpenGL: renderer: N/A v: N/A direct render: N/A