Bug 1469755 - gdm restarts X without waiting for previous generation to terminate
Status: ON_QA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: gdm
Version: 7.4
Hardware: x86_64 Linux
Priority: high  Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Ray Strode [halfline]
QA Contact: Desktop QE
Keywords: Regression, ZStream
Depends On:
Blocks: 1438583 1462319 1522983 1470340
Reported: 2017-07-11 15:03 EDT by Greg Hughes
Modified: 2017-12-06 17:24 EST
CC List: 12 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, due to a race condition in logout handling, the GNOME Display Manager (GDM) in some cases started one X server while another was shutting down. Consequently, the second X server failed to start. With this update, GDM waits for the first X server to fully quit before GDM starts the second X server, which prevents the described problem from occurring.
Story Points: ---
Clone Of:
Clones: 1470340
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
explicitly kill and wait for X server (3.16 KB, patch)
2017-07-12 11:05 EDT, Ray Strode [halfline]
Full GDM debug log (316.77 KB, text/plain)
2017-07-12 12:05 EDT, Greg Hughes


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3235851 None None None 2017-11-08 13:31 EST

Description Greg Hughes 2017-07-11 15:03:28 EDT
Description of problem:

gdm restarts the X server after user logout without waiting for the current generation of the X server to terminate.  This is mostly OK: the previous generation normally terminates within a short window that is covered by the new server's start-up time and its retry loop for creating the .X0-lock file.  If, however, the old X server takes a couple of seconds to terminate, the new instance fails to create the lock file and aborts.  The desktop is unusable when this occurs.
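
To illustrate why a slow exit defeats a bounded retry loop, here is a minimal sketch in C. It is not the X server's actual locking code; the function name `create_lock_with_retry` and its parameters are hypothetical and exist only for this example. It tries to create a lock file exclusively and gives up once the attempts are exhausted, which is the point at which a server in this situation would abort.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper for illustration only (not Xorg code).
 * Tries to create `path` exclusively, retrying up to `attempts` times
 * with `delay_sec` seconds between tries.
 * Returns 0 if the lock was created, -1 if the file still existed after
 * the last attempt or another error occurred. */
static int
create_lock_with_retry (const char *path, int attempts, unsigned int delay_sec)
{
        for (int i = 0; i < attempts; i++) {
                /* O_EXCL makes creation fail with EEXIST if the file exists. */
                int fd = open (path, O_WRONLY | O_CREAT | O_EXCL, 0444);

                if (fd >= 0) {
                        char buf[32];
                        int len = snprintf (buf, sizeof buf, "%10ld\n",
                                            (long) getpid ());
                        ssize_t written = write (fd, buf, (size_t) len);

                        close (fd);
                        if (written != len) {
                                unlink (path);
                                return -1;
                        }
                        return 0;
                }

                if (errno != EEXIST)
                        return -1;

                /* Someone else still holds the lock; wait before retrying. */
                if (i + 1 < attempts)
                        sleep (delay_sec);
        }

        return -1;
}
```

If the previous holder removes its lock file only after `attempts * delay_sec` seconds have elapsed, every attempt sees `EEXIST` and the caller has nothing left to do but fail.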

In the log below, note that the old server (PID 1532) is stopped after starting the new instance (PID 2681).  PID 2681 aborts because PID 1532 takes a couple of seconds to exit (closing the HP RG extension in this instance).

Debug logging from gdm:

Jul 10 13:50:41 localhost gdm: GdmDisplay: prepare display
Jul 10 13:50:41 localhost gdm: GdmLocalDisplayFactory: display status changed: 1
Jul 10 13:50:41 localhost gdm: GdmServer: Starting X server process: /usr/bin/X :0 -background none -noreset -audit 4 -verbose -logverbose 7 -core -auth /run/gdm/auth-for-gdm-KuTpY2/database -seat seat0 -nolisten tcp vt1
Jul 10 13:50:41 localhost gdm: GdmServer: Opening logfile for server /var/log/gdm/:0.log
Jul 10 13:50:41 localhost polkitd[727]: Unregistered Authentication Agent for unix-session:1 (system bus name :1.52, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Jul 10 13:50:41 localhost gdm: GdmServer: Started X server process 2681 - waiting for READY
Jul 10 13:50:41 localhost gdm: GdmDisplay: Started X server
Jul 10 13:50:41 localhost gdm: GdmDisplay: Disposing display
Jul 10 13:50:41 localhost gdm: GdmLocalDisplayFactory: Display 0x5601aa308300 disposed
Jul 10 13:50:41 localhost gdm: GdmServer: Stopping server
Jul 10 13:50:41 localhost gdm: GdmCommon: sending signal 15 to process 1532
Jul 10 13:50:41 localhost gdm: GdmServer: Waiting on process 1532
Jul 10 13:50:41 localhost abrt-hook-ccpp: Process 2681 (Xorg) of user 0 killed by SIGABRT - dumping core
J

Version-Release number of selected component (if applicable):

gdm-3.22.3-11.el7.x86_64

How reproducible:

Log in / Log out of the desktop using an X server modified to wait for a second before exiting.

Steps to Reproduce:
1. Modify X server to delay exit by 1-2 seconds
2. Log in 
3. Log out

Actual results:

The new instance of the X server crashes with SIGABRT because it cannot create the .X0-lock file.

Expected results:

X restarts and greeter is presented.

Additional info:
Comment 2 Greg Hughes 2017-07-11 17:09:39 EDT
This has behaved as expected in previous releases up to and including 7.3 (gdm 3.14.2).
Comment 4 Ray Strode [halfline] 2017-07-12 08:40:40 EDT
Hi,

Thanks for the troubleshooting and analysis. It's very helpful. I'll try to reproduce, too, but in the meantime, can you attach the full log from comment 0? Your snippet stops right at the point where the second X server is started, but it would be interesting to see the bits of log leading up to the decision to start it.
Comment 5 Ray Strode [halfline] 2017-07-12 11:05 EDT
Created attachment 1297044 [details]
explicitly kill and wait for X server

I was able to reproduce.  The problem is, indeed, that we don't wait for the X server to shut down.  This patch fixes it by explicitly terminating the X server and then waiting on the process.
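
The general shape of "terminate, then wait" can be sketched in C as follows. This is not the code from the attached patch; `stop_child_and_wait` is a hypothetical helper written for this illustration, and it assumes the X server is a direct child of the calling process so that `waitpid` can reap it.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper for illustration only (not the gdm patch).
 * Sends SIGTERM to a child process and blocks until it has been reaped.
 * Returns 0 once the child is gone, -1 on error. */
static int
stop_child_and_wait (pid_t pid)
{
        int status;

        /* Refuse pid values that would signal init or a process group. */
        if (pid <= 1)
                return -1;

        if (kill (pid, SIGTERM) < 0)
                return -1;

        /* Block until the child has exited; retry if interrupted. */
        while (waitpid (pid, &status, 0) < 0) {
                if (errno == EINTR)
                        continue;
                return -1;
        }

        return 0;
}
```

A caller would start the next X server only after this function returns 0, so the old server has released its lock file before the new one looks for it.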
Comment 6 Jeff Burrell 2017-07-12 11:19:00 EDT
Ray,

Is there runway left to have this patch added/accepted into the 7.4 GA?

Jeff
Comment 7 Greg Hughes 2017-07-12 12:05 EDT
Created attachment 1297112 [details]
Full GDM debug log

Full debug log, from which the snippet in the original report originated.  I realize it no longer seems to be required.  Included for completeness.
Comment 14 Ray Strode [halfline] 2017-07-12 16:20:49 EDT
Jeff, a fix that addresses this issue is unlikely to make 7.4 GA, though we should be able to provide an asynchronous update through the 7.4 Z-Stream. We're tentatively targeting either an asynchronous update released the same day as the GA release, or possibly an asynchronous update released a little later, in the first batch of Z-Stream updates following release (batch 1).
Comment 15 Jeff Burrell 2017-07-12 17:15:16 EDT
Thanks Ray.  I assumed that what you describe would be the situation.  I just needed to know for sure, because the timing of the z-stream release is critical to our Remote Graphics Software (RGS) product.  Without the fix, we'll have to tell our RGS RHEL 7 customers that they cannot use RGS with 7.4.

Having this fix in the day-1 z-stream would definitely be preferable.  Is there a way for HPI to influence the decision to include it sooner rather than later?

Jeff
Comment 16 Tomas Pelka 2017-07-13 02:08:36 EDT
(In reply to Jeff Burrell from comment #15)
> Thanks Ray.  I assumed that what you describe would be the situation.  I
> just needed to know for sure, because the timing of the z-stream release is
> critical to our Remote Graphics Software(RGS) product.  Without the fix,
> we'll have to tell our RGS RHEL 7 customers that they could not use RGS with
> 7.4.
> 
> Having this fix in the day-1 z-stream would definitely be preferable.  Is
> there a way for HPI to influence the decision to include it sooner rather
> than later?
> 
> Jeff

The erratum is a 0-day one, so it will be released on the same day as RHEL 7.4 GA. Since Ray has prepared the build, I believe we (QE) can start immediately.
Comment 18 Jeff Burrell 2017-07-13 10:34:57 EDT
That's great.  Thanks Tomas!  We would volunteer to test the errata as soon as it's available as well.  Let me know...
Comment 19 Tomas Pelka 2017-07-13 11:01:43 EDT
That would actually be great. Joe, would you mind providing builds to HP?
