1469755 – gdm restarts X without waiting for previous generation to terminate

Summary: gdm restarts X without waiting for previous generation to terminate

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	gdm
Sub Component:
Version:	7.4
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Ray Strode [halfline]
QA Contact:	Desktop QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1438583 1462319 1470340 1522983

Reported:	2017-07-11 19:03 UTC by Greg Hughes
Modified:	2020-12-14 09:05 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Previously, due to a race condition in logout handling, the GNOME Display Manager (GDM) in some cases started one X server while another was shutting down. Consequently, the second X server failed to start. With this update, GDM waits for the first X server to fully quit before GDM starts the second X server, which prevents the described problem from occurring.
Clone Of:
Clones:	1470340
Environment:
Last Closed:	2018-04-10 12:57:41 UTC
Target Upstream Version:
Embargoed:

Attachments
explicitly kill and wait for X server (3.16 KB, patch) 2017-07-12 15:05 UTC, Ray Strode [halfline]	no flags
Full GDM debug log (316.77 KB, text/plain) 2017-07-12 16:05 UTC, Greg Hughes	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	3235851	0	None	None	None	2017-11-08 18:31:03 UTC
Red Hat Product Errata	RHBA-2018:0770	0	None	None	None	2018-04-10 12:59:16 UTC

Description Greg Hughes 2017-07-11 19:03:28 UTC

Description of problem:

gdm restarts the X server after user logout without waiting for the current generation of the X server to terminate.  This is mostly OK: the previous generation normally terminates within a short window that is covered by the new server's start-up time and its retry loop for creating the .X0-lock file.  If, however, the old X server takes a couple of seconds to terminate, the new instance fails to create the lock file and aborts.  The desktop is unusable when this occurs.

In the log below, note that the old server (PID 1532) is stopped after starting the new instance (PID 2681).  PID 2681 aborts because PID 1532 takes a couple of seconds to exit (closing the HP RG extension in this instance).
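The lock-file race described above can be modelled roughly as follows. This is a simplified sketch, not the X server's actual locking code: the function name `acquire_lock`, the retry count, and the delay are hypothetical values chosen for illustration, and the lock is placed in a temporary directory rather than the real location. It shows that a bounded retry loop on an exclusive create fails if the previous holder outlives the retry window, and succeeds once the holder has released the file.

```python
import os
import tempfile
import time


def acquire_lock(path, retries=3, delay=0.05):
    """Try to create `path` exclusively, retrying a bounded number of times.

    Hypothetical model of a start-up retry loop; the real server's retry
    count and timing are not taken from the bug report.
    """
    for _ in range(retries):
        try:
            # O_EXCL makes the create fail if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            time.sleep(delay)
            continue
        with os.fdopen(fd, "w") as f:
            f.write(f"{os.getpid()}\n")
        return True
    return False


def demo():
    lock = os.path.join(tempfile.mkdtemp(), ".X0-lock")
    # The "old server" takes the lock and is slow to release it.
    assert acquire_lock(lock)
    # The "new server" starts too early and exhausts its retries.
    started_while_held = acquire_lock(lock)
    # The old server finally exits and removes its lock file.
    os.remove(lock)
    # A server started only after that point gets the lock immediately.
    started_after_release = acquire_lock(lock)
    os.remove(lock)
    return started_while_held, started_after_release
```

Calling `demo()` returns one flag per start attempt: the first is the attempt made while the lock was still held, the second the attempt made after it was released.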

Debug logging from gdm:

Jul 10 13:50:41 localhost gdm: GdmDisplay: prepare display
Jul 10 13:50:41 localhost gdm: GdmLocalDisplayFactory: display status changed: 1
Jul 10 13:50:41 localhost gdm: GdmServer: Starting X server process: /usr/bin/X :0 -background none -noreset -audit 4 -verbose -logverbose 7 -core -auth /run/gdm/auth-for-gdm-KuTpY2/database -seat seat0 -nolisten tcp vt1
Jul 10 13:50:41 localhost gdm: GdmServer: Opening logfile for server /var/log/gdm/:0.log
Jul 10 13:50:41 localhost polkitd[727]: Unregistered Authentication Agent for unix-session:1 (system bus name :1.52, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Jul 10 13:50:41 localhost gdm: GdmServer: Started X server process 2681 - waiting for READY
Jul 10 13:50:41 localhost gdm: GdmDisplay: Started X server
Jul 10 13:50:41 localhost gdm: GdmDisplay: Disposing display
Jul 10 13:50:41 localhost gdm: GdmLocalDisplayFactory: Display 0x5601aa308300 disposed
Jul 10 13:50:41 localhost gdm: GdmServer: Stopping server
Jul 10 13:50:41 localhost gdm: GdmCommon: sending signal 15 to process 1532
Jul 10 13:50:41 localhost gdm: GdmServer: Waiting on process 1532
Jul 10 13:50:41 localhost abrt-hook-ccpp: Process 2681 (Xorg) of user 0 killed by SIGABRT - dumping core
J
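The ordering problem visible in this excerpt can also be checked mechanically. The sketch below is only an illustration: the helper name `find_overlaps` is hypothetical, and the two message patterns are copied from the log lines above, so it assumes other GDM builds word these messages the same way. It reports every case where a server was started while an older one had not yet been signalled.

```python
import re

# Patterns taken from the debug log excerpt in this report.
STARTED = re.compile(r"Started X server process (\d+)")
SIGNALLED = re.compile(r"sending signal 15 to process (\d+)")


def find_overlaps(lines):
    """Return (new_pid, old_pid) pairs where new_pid was already started
    when old_pid was sent SIGTERM."""
    running = []   # PIDs started and not yet signalled, in log order
    overlaps = []
    for line in lines:
        match = STARTED.search(line)
        if match:
            running.append(int(match.group(1)))
            continue
        match = SIGNALLED.search(line)
        if match:
            old = int(match.group(1))
            # Any other server still tracked as running overlaps with `old`.
            overlaps.extend((new, old) for new in running if new != old)
            if old in running:
                running.remove(old)
    return overlaps
```

For the two relevant lines of the excerpt (start of PID 2681, then signal 15 to PID 1532) it reports the pair (2681, 1532); for a log in which each server is signalled before its successor is started, it reports nothing.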

Version-Release number of selected component (if applicable):

gdm-3.22.3-11.el7.x86_64

How reproducible:

Log in / Log out of the desktop using an X server modified to wait for a second before exiting.

Steps to Reproduce:
1. Modify X server to delay exit by 1-2 seconds
2. Log in 
3. Log out

Actual results:

New instance of the X server crashes with SIGABRT because it cannot create the .X0-lock file.

Expected results:

X restarts and greeter is presented.

Additional info:

Comment 2 Greg Hughes 2017-07-11 21:09:39 UTC

This has behaved as expected in previous releases up to and including 7.3 (gdm 3.14.2).

Comment 4 Ray Strode [halfline] 2017-07-12 12:40:40 UTC

Hi,

Thanks for the troubleshooting and analysis. It's very helpful. I'll try to reproduce, too, but in the meantime, can you attach the full log from comment 0? Your snippet stops right at the point where the second X server is started, but it would be interesting to see the bits of log leading up to the decision to start it.

Comment 5 Ray Strode [halfline] 2017-07-12 15:05:38 UTC

Created attachment 1297044 [details]
explicitly kill and wait for X server

I was able to reproduce.  The problem is, indeed, that we don't wait for the X server to shut down.  This patch fixes it by explicitly terminating the X server and then waiting on the process.
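The general pattern of terminating a child and then waiting on it can be sketched as below. This is not the attached patch (GDM is written in C and the patch contents are not reproduced here); it is a minimal POSIX-only illustration in which `SLOW_CHILD` is a hypothetical stand-in for a server that takes a while to exit after SIGTERM, and `stop_and_wait` is a hypothetical helper name.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a slow-to-exit server: on SIGTERM it lingers
# for half a second before exiting.
SLOW_CHILD = """
import signal, sys, time

def on_term(signum, frame):
    time.sleep(0.5)
    sys.exit(0)

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def stop_and_wait(proc, timeout=10.0):
    """Send SIGTERM and block until the process has actually exited."""
    proc.terminate()                   # SIGTERM on POSIX
    return proc.wait(timeout=timeout)  # raises TimeoutExpired if it never exits


def demo():
    old = subprocess.Popen(
        [sys.executable, "-c", SLOW_CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    old.stdout.readline()  # wait until the child has installed its handler
    start = time.monotonic()
    exit_code = stop_and_wait(old)
    elapsed = time.monotonic() - start
    old.stdout.close()
    # Only at this point would it be safe to start the next generation.
    return exit_code, elapsed
```

Because `stop_and_wait` returns only after the child is gone, a successor started afterwards cannot collide with resources the old process still holds, at the cost of blocking for as long as the old process takes to exit.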

Comment 6 Jeff Burrell 2017-07-12 15:19:00 UTC

Ray,

Is there runway left to have this patch added/accepted into the 7.4 GA?

Jeff

Comment 7 Greg Hughes 2017-07-12 16:05:53 UTC

Created attachment 1297112 [details]
Full GDM debug log

Full debug log, from which the snippet in the original report originated.  I realize it no longer seems to be required.  Included for completeness.

Comment 14 Ray Strode [halfline] 2017-07-12 20:20:49 UTC

Jeff, a fix that addresses this issue is unlikely to make 7.4 GA, though we should be able to provide an asynchronous update through the 7.4 Z-Stream. We're tentatively targeting either an asynchronous update released the same day as the GA release, or possibly an asynchronous update released a little later, in the first batch of Z-Stream updates following release (batch 1).

Comment 15 Jeff Burrell 2017-07-12 21:15:16 UTC

Thanks Ray.  I assumed that what you describe would be the situation.  I just needed to know for sure, because the timing of the z-stream release is critical to our Remote Graphics Software (RGS) product.  Without the fix, we'll have to tell our RGS RHEL 7 customers that they cannot use RGS with 7.4.

Having this fix in the day-1 z-stream would definitely be preferable.  Is there a way for HPI to influence the decision to include it sooner rather than later?

Jeff

Comment 16 Tomas Pelka 2017-07-13 06:08:36 UTC

(In reply to Jeff Burrell from comment #15)
> Thanks Ray.  I assumed that what you describe would be the situation.  I
> just needed to know for sure, because the timing of the z-stream release is
> critical to our Remote Graphics Software(RGS) product.  Without the fix,
> we'll have to tell our RGS RHEL 7 customers that they could not use RGS with
> 7.4.
> 
> Having this fix in the day-1 z-stream would definitely be preferable.  Is
> there a way for HPI to influence the decision to include it sooner rather
> than later?
> 
> Jeff

The erratum is a 0-day one, so it will be released on the same day as RHEL 7.4 GA. As Ray has prepared the build, I believe we (QE) can start immediately.

Comment 18 Jeff Burrell 2017-07-13 14:34:57 UTC

That's great.  Thanks Tomas!  We would volunteer to test the errata as soon as it's available as well.  Let me know...

Comment 19 Tomas Pelka 2017-07-13 15:01:43 UTC

That would actually be great. Joe, would you mind providing builds to HP?

Comment 24 errata-xmlrpc 2018-04-10 12:57:41 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0770
