Bug 1635747

Summary: Switching users in GNOME session is starting new X servers for the user, and a user logout is making it unusable.
Product: Red Hat Enterprise Linux 7 Reporter: Ray Strode [halfline] <rstrode>
Component: xorg-x11-server Assignee: Adam Jackson <ajax>
Status: CLOSED ERRATA QA Contact: Desktop QE <desktop-qa-list>
Severity: high Docs Contact:
Priority: high    
Version: 7.4 CC: ajax, amike, ayadav, bgollahe, chorn, csoriano, desktop-qa-list, jkoten, jsolomon, mkrajnak, rstrode, salmy, toneata, tpelka, yoguma, yzheng
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: xorg-x11-server-1.20.1-5.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1489977
: 1637651 1640918 Environment:
Last Closed: 2019-08-06 12:41:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1489977, 1636460    
Bug Blocks: 1571842, 1588877, 1597339, 1607454, 1632807, 1637651, 1640918, 1707454    
Attachments:
Description Flags
backtrace none
Xorg.1.log none

Description Ray Strode [halfline] 2018-10-03 15:09:38 UTC
+++ This bug was initially created as a clone of Bug #1489977 +++
--- Additional comment from amit yadav on 2018-09-27 13:29:45 EDT ---

(In reply to Tomas Pelka from comment #17)
> (In reply to Ray Strode [halfline] from comment #16)
> > this works fine for me with:
> > 
> > xorg-x11-server-Xorg-1.20.1-2.el7
> > and
> > gdm-3.28.2-9.el7
> > 
> > can you retry with at least those versions? You might be hitting bug 1632807
> 
> Note that we need the feedback asap, I need to decide what to do with this
> bz tomorrow.

Sorry for the delay in responding. I was on leave, so I couldn't update the bug.

I just tried it on my test system with the latest packages. The issue is still reproducible.

Journal logs and Xorg.0.log files are attached.

time: Sep 27 13:28

--- Additional comment from Ray Strode [halfline] on 2018-09-28 06:34:30 EDT ---

hmm, i just noticed you're using qxl, let me see if i can reproduce with that.  you're getting permission denied, not EINVAL, so it may be a different issue. hopefully i can reproduce when i get to work

--- Additional comment from Ray Strode [halfline] on 2018-09-28 17:00:13 EDT ---

i was able to reproduce with qxl, still investigating. i won't have an update until late monday probably.

--- Additional comment from Ray Strode [halfline] on 2018-10-01 16:58:10 EDT ---

i hit a number of problems today that prevented me from being able to fully investigate this issue.

The problem seems to be a kernel issue (drm master ownership, potentially related to atomic modesetting patches, see bug 1632807).

I hope to have a clearer picture tomorrow, once my kernel build finishes and i can add some debugging calls to the drm module.

--- Additional comment from Ray Strode [halfline] on 2018-10-03 10:35 EDT ---

So, this doesn't seem to be a duplicate of bug 1632807 after all.

I instrumented the kernel to log when master was being dropped and set. We were clearly failing to drop master in the existing X server before calling set master in the switched X server. Master ownership matters because it is required for setting the screen resolution and displaying contents on screen.

The problem is that when the X server is killed, it does a VT switch back to the VT it was started from.  Normally, switching to another VT leads to master getting dropped, but in this termination case it doesn't, since the event loop has finished before the VT switch is initiated.
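To make the failure mode concrete, here is a toy model of master ownership. It is not kernel code: the struct, the function names, and the specific error values are all assumptions made for illustration (the -EACCES is chosen only to mirror the "permission denied" seen earlier in this report). It shows why a second X server cannot take master while the first one still holds it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only: one "device" with at most one master at a time. */
struct toy_device {
    int master; /* id of the client holding master, or -1 for none */
};

/* Hypothetical helper: take master if nobody else holds it. */
int toy_set_master(struct toy_device *dev, int client)
{
    assert(dev != NULL);
    if (dev->master != -1 && dev->master != client)
        return -EACCES; /* assumed error code for this sketch */
    dev->master = client;
    return 0;
}

/* Hypothetical helper: give master up; only the current holder may do so. */
int toy_drop_master(struct toy_device *dev, int client)
{
    assert(dev != NULL);
    if (dev->master != client)
        return -EINVAL; /* assumed error code for this sketch */
    dev->master = -1;
    return 0;
}
```

In this model, if client 1 exits without ever calling toy_drop_master, every later toy_set_master from client 2 fails, which is the shape of the problem described above.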

Most drivers work around this problem by having this sort of code in their CloseScreen hook:

if (pScrn->LeaveVT)
     pScrn->LeaveVT (...);

This LeaveVT call simulates the leave VT event that would come in if the event loop were still active.
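A self-contained sketch of that workaround, using made-up types rather than the real xserver structures (toy_screen and every function name here are hypothetical, and the real hooks take different arguments), might look like this. It contrasts a close hook that skips LeaveVT with one that calls it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver's per-screen state. */
struct toy_screen {
    bool has_master;                       /* does this server hold master? */
    bool closed;                           /* has the screen been torn down? */
    void (*leave_vt)(struct toy_screen *); /* leave-VT hook, may be NULL */
};

/* Leaving the VT is where master gets dropped in this model. */
void toy_leave_vt(struct toy_screen *scr)
{
    scr->has_master = false;
}

/* A close hook without the workaround: master is never dropped. */
void toy_close_screen_plain(struct toy_screen *scr)
{
    assert(scr != NULL);
    scr->closed = true;
}

/* A close hook with the workaround: simulate the leave-VT event first. */
void toy_close_screen_with_leave(struct toy_screen *scr)
{
    assert(scr != NULL);
    if (scr->leave_vt)
        scr->leave_vt(scr);
    scr->closed = true;
}
```

With toy_close_screen_plain the screen ends up closed but still holding master; with toy_close_screen_with_leave it ends up closed with master released.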

I say most drivers, but not all.  The QXL driver fails to implement this workaround.  The above patch fixes the issue for me.

Doing a quick search, though, the vmware driver also fails to call LeaveVT from CloseScreen, so I think, maybe, a better approach would be to call the LeaveVT hook from somewhere more generic like xf86CrtcCloseScreen or maybe a new xf86 ddx CloseScreen hook.

--- Additional comment from Ray Strode [halfline] on 2018-10-03 10:38:19 EDT ---

btw, i think we should move this bug back to VERIFIED, and clone for the qxl issue, since it really is separate, so I'll do that shortly.

Comment 1 Ray Strode [halfline] 2018-10-03 15:17:34 UTC
so one problem with attachment 1490121 is that LeaveVT is called after CloseScreen.  This isn't necessarily a "safe" thing to do, since CloseScreen can potentially change the LeaveVT handler.  It would be better to call it before CloseScreen.
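The ordering concern can be illustrated with a small stand-alone sketch. Everything in it is hypothetical (demo_screen and the demo_* functions are not xserver API), and it assumes one particular way a close hook might change the handler, namely by clearing it during teardown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a screen with replaceable hooks. */
struct demo_screen {
    bool has_master;
    void (*leave_vt)(struct demo_screen *);
    void (*close_screen)(struct demo_screen *);
};

/* Leaving the VT drops master in this model. */
void demo_leave_vt(struct demo_screen *scr)
{
    scr->has_master = false;
}

/* Assumed behaviour for this sketch: closing clears the leave-VT hook. */
void demo_close_screen(struct demo_screen *scr)
{
    scr->leave_vt = NULL;
}

/* LeaveVT after CloseScreen: the hook may already be gone or replaced. */
void demo_shutdown_leave_after(struct demo_screen *scr)
{
    assert(scr != NULL && scr->close_screen != NULL);
    scr->close_screen(scr);
    if (scr->leave_vt)
        scr->leave_vt(scr);
}

/* LeaveVT before CloseScreen: the original hook is still in place. */
void demo_shutdown_leave_first(struct demo_screen *scr)
{
    assert(scr != NULL && scr->close_screen != NULL);
    if (scr->leave_vt)
        scr->leave_vt(scr);
    scr->close_screen(scr);
}
```

Under that assumption, demo_shutdown_leave_after never reaches the handler and master stays held, while demo_shutdown_leave_first releases it.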

Still, I've discussed this problem with Adam somewhat, and he believes this can be fixed centrally in the xserver, so attachment 1490121 probably won't be needed, regardless.

Comment 5 Jiri Koten 2018-10-17 14:57:47 UTC
Created attachment 1494891
backtrace

Comment 6 Jiri Koten 2018-10-17 14:58:52 UTC
Created attachment 1494892
Xorg.1.log

Comment 7 Jiri Koten 2018-10-17 15:01:28 UTC
Pkgs tested:

xorg-x11-drv-qxl-0.1.5-4.el7.1
xorg-x11-server-Xorg-1.20.1-5.el7
kernel-3.10.0-957.el7
gdm-3.28.2-10.el7

Comment 11 errata-xmlrpc 2019-08-06 12:41:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2079