Description of problem:
gdm restarts the X server after user logout without waiting for the current generation of the X server to terminate. This is mostly OK: the previous generation normally terminates within a short window that is covered by the new server's startup time and its retry loop for creating the .X0-lock file. If, however, the old X server takes a couple of seconds to terminate, the new instance fails to create the lock file and aborts. The desktop is unusable when this occurs.
In the log below, note that the old server (PID 1532) is stopped after starting the new instance (PID 2681). PID 2681 aborts because PID 1532 takes a couple of seconds to exit (closing the HP RG extension in this instance).
Debug logging from gdm:
Jul 10 13:50:41 localhost gdm: GdmDisplay: prepare display
Jul 10 13:50:41 localhost gdm: GdmLocalDisplayFactory: display status changed: 1
Jul 10 13:50:41 localhost gdm: GdmServer: Starting X server process: /usr/bin/X :0 -background none -noreset -audit 4 -verbose -logverbose 7 -core -auth /run/gdm/auth-for-gdm-KuTpY2/database -seat seat0 -nolisten tcp vt1
Jul 10 13:50:41 localhost gdm: GdmServer: Opening logfile for server /var/log/gdm/:0.log
Jul 10 13:50:41 localhost polkitd: Unregistered Authentication Agent for unix-session:1 (system bus name :1.52, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Jul 10 13:50:41 localhost gdm: GdmServer: Started X server process 2681 - waiting for READY
Jul 10 13:50:41 localhost gdm: GdmDisplay: Started X server
Jul 10 13:50:41 localhost gdm: GdmDisplay: Disposing display
Jul 10 13:50:41 localhost gdm: GdmLocalDisplayFactory: Display 0x5601aa308300 disposed
Jul 10 13:50:41 localhost gdm: GdmServer: Stopping server
Jul 10 13:50:41 localhost gdm: GdmCommon: sending signal 15 to process 1532
Jul 10 13:50:41 localhost gdm: GdmServer: Waiting on process 1532
Jul 10 13:50:41 localhost abrt-hook-ccpp: Process 2681 (Xorg) of user 0 killed by SIGABRT - dumping core
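The race described above can be illustrated with a minimal sketch. This is not the X server's actual locking code; `acquire_lock`, its parameters, and the retry counts are hypothetical and only model "create the lock file exclusively, retry a few times, then give up". If the previous owner still holds the lock when the retries run out, the new instance fails, which matches what happens to PID 2681 in the log.

```python
import os
import time


def acquire_lock(path, retries=3, delay=0.1):
    """Try to create `path` exclusively, retrying a few times.

    Returns True if the lock file was created, False if it still
    existed after the last attempt (the old owner had not exited yet).
    Hypothetical helper; the real server's retry policy may differ.
    """
    for attempt in range(retries):
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            # Lock is still held: wait briefly unless this was the last try.
            if attempt < retries - 1:
                time.sleep(delay)
            continue
        with os.fdopen(fd, "w") as lock:
            lock.write(f"{os.getpid()}\n")
        return True
    return False
```

With a retry window shorter than the old server's shutdown time, `acquire_lock` returns False even though the lock would have become free a moment later.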
Version-Release number of selected component (if applicable):
Log in / Log out of the desktop using an X server modified to wait for a second before exiting.
Steps to Reproduce:
1. Modify X server to delay exit by 1-2 seconds
2. Log in
3. Log out
New instance of the X server crashes with SIGABRT because it cannot create the .X0-lock file.
X restarts and greeter is presented.
This has behaved as expected in previous releases up to and including 7.3 (gdm 3.14.2).
Thanks for the troubleshooting and analysis. It's very helpful. I'll try to reproduce, too, but in the meantime, can you attach the full log from comment 0? Your snippet stops right at the point where the second X server is started, but it would be interesting to see the bits of log leading up to the decision to start it.
Created attachment 1297044 [details]
explicitly kill and wait for X server
I was able to reproduce. The problem is, indeed, that we don't wait for the X server to shut down. This patch fixes it by explicitly terminating the X server and then waiting on the process.
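The idea of "terminate, then wait" can be sketched as follows. This is not the attached patch (gdm is written in C); it is a small POSIX-only Python illustration, and `SLOW_SERVER`, `start_server`, and `stop_server_and_wait` are hypothetical names. The child process stands in for an X server modified to take about one second to exit after SIGTERM, as in the reproduction steps.

```python
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for a slow X server: on SIGTERM it takes
# one second to clean up before exiting.
SLOW_SERVER = """
import signal, sys, time
def on_term(signum, frame):
    time.sleep(1)
    sys.exit(0)
signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def start_server():
    """Start the stand-in server and wait until it reports readiness."""
    proc = subprocess.Popen(
        [sys.executable, "-c", SLOW_SERVER],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # blocks until the SIGTERM handler is installed
    return proc


def stop_server_and_wait(proc):
    """Send SIGTERM, then block until the process has really exited."""
    proc.send_signal(signal.SIGTERM)
    return proc.wait()  # only after this is it safe to start a successor
```

Starting the next server only after `stop_server_and_wait` returns means the old process has already exited, so anything it released on exit (such as a lock file) is no longer contended.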
Is there runway left to have this patch added/accepted into the 7.4 GA?
Created attachment 1297112 [details]
Full GDM debug log
Full debug log, from which the snippet in the original report originated. I realize it no longer seems to be required. Included for completeness.
Jeff, a fix that addresses this issue is unlikely to make 7.4 GA, though we should be able to provide an asynchronous update through the 7.4 Z-Stream. We're tentatively targeting either an asynchronous update released the same day as the GA release, or possibly an asynchronous update released a little later, in the first batch of Z-Stream updates following release (batch 1).
Thanks Ray. I assumed that what you describe would be the situation. I just needed to know for sure, because the timing of the z-stream release is critical to our Remote Graphics Software (RGS) product. Without the fix, we'll have to tell our RGS RHEL 7 customers that they cannot use RGS with 7.4.
Having this fix in the day-1 z-stream would definitely be preferable. Is there a way for HPI to influence the decision to include it sooner rather than later?
(In reply to Jeff Burrell from comment #15)
> Thanks Ray. I assumed that what you describe would be the situation. I
> just needed to know for sure, because the timing of the z-stream release is
> critical to our Remote Graphics Software(RGS) product. Without the fix,
> we'll have to tell our RGS RHEL 7 customers that they could not use RGS with
> 7.4.
> Having this fix in the day-1 z-stream would definitely be preferable. Is
> there a way for HPI to influence the decision to include it sooner rather
> than later?
The erratum is a 0-day one, so it will be released on the same day as RHEL 7.4 GA. As Ray has prepared the build, I believe we (QE) can start immediately.
That's great. Thanks Tomas! We would volunteer to test the erratum as soon as it's available as well. Let me know...
That would actually be great. Joe, would you mind providing builds to HP?
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.