Previously, due to a race condition in logout handling, the GNOME Display Manager (GDM) in some cases started one X server while another was shutting down. Consequently, the second X server failed to start. With this update, GDM waits for the first X server to fully quit before GDM starts the second X server, which prevents the described problem from occurring.
Description of problem:
gdm restarts the X server after user logout without waiting for the current generation of the X server to terminate. This is mostly OK: the previous generation is terminated within a short window that is normally covered by the startup time and the retry loop creating the .X0-lock file. If, however, the old X server takes a couple of seconds to terminate, the new instance fails to create the lock file and aborts. The desktop is unusable when this occurs.
In the log below, note that the old server (PID 1532) is stopped after starting the new instance (PID 2681). PID 2681 aborts because PID 1532 takes a couple of seconds to exit (closing the HP RG extension in this instance).
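To illustrate why a slow-exiting old server makes the new one fail, here is a minimal sketch of exclusive lock-file creation with a bounded retry loop. It is a simplified stand-in, not the X server's actual locking code: the function name `try_create_lock`, the retry count, and the delay are all hypothetical values chosen for illustration.

```python
import os
import time


def try_create_lock(path, retries=3, delay=0.05):
    """Try to create `path` exclusively, retrying a bounded number of times.

    Returns True if the lock file was created, False if it still existed
    after all retries (the case in which the new server would give up).
    """
    for _ in range(retries):
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        except FileExistsError:
            # The previous holder has not removed its lock yet; wait and retry.
            time.sleep(delay)
            continue
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(f"{os.getpid():>10}\n")
        return True
    return False
```

If the old process holds the lock for longer than `retries * delay`, the call returns False. This corresponds to the failure mode described above, where the new instance aborts.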
Debug logging from gdm:
Jul 10 13:50:41 localhost gdm: GdmDisplay: prepare display
Jul 10 13:50:41 localhost gdm: GdmLocalDisplayFactory: display status changed: 1
Jul 10 13:50:41 localhost gdm: GdmServer: Starting X server process: /usr/bin/X :0 -background none -noreset -audit 4 -verbose -logverbose 7 -core -auth /run/gdm/auth-for-gdm-KuTpY2/database -seat seat0 -nolisten tcp vt1
Jul 10 13:50:41 localhost gdm: GdmServer: Opening logfile for server /var/log/gdm/:0.log
Jul 10 13:50:41 localhost polkitd[727]: Unregistered Authentication Agent for unix-session:1 (system bus name :1.52, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Jul 10 13:50:41 localhost gdm: GdmServer: Started X server process 2681 - waiting for READY
Jul 10 13:50:41 localhost gdm: GdmDisplay: Started X server
Jul 10 13:50:41 localhost gdm: GdmDisplay: Disposing display
Jul 10 13:50:41 localhost gdm: GdmLocalDisplayFactory: Display 0x5601aa308300 disposed
Jul 10 13:50:41 localhost gdm: GdmServer: Stopping server
Jul 10 13:50:41 localhost gdm: GdmCommon: sending signal 15 to process 1532
Jul 10 13:50:41 localhost gdm: GdmServer: Waiting on process 1532
Jul 10 13:50:41 localhost abrt-hook-ccpp: Process 2681 (Xorg) of user 0 killed by SIGABRT - dumping core
J
Version-Release number of selected component (if applicable):
gdm-3.22.3-11.el7.x86_64
How reproducible:
Log in / Log out of the desktop using an X server modified to wait for a second before exiting.
Steps to Reproduce:
1. Modify X server to delay exit by 1-2 seconds
2. Log in
3. Log out
Actual results:
New instance of the X server crashes with SIGABRT because it cannot create the .X0-lock file.
Expected results:
X restarts and greeter is presented.
Additional info:
This has behaved as expected in previous releases up to and including 7.3 (gdm 3.14.2).
Comment 4 Ray Strode [halfline]
2017-07-12 12:40:40 UTC
Hi,
Thanks for the troubleshooting and analysis. It's very helpful. I'll try to reproduce, too, but in the meantime, can you attach the full log from comment 0? Your snippet stops right at the point where the second X server is started, but it would be interesting to see the bits of log leading up to the decision to start it.
Comment 5 Ray Strode [halfline]
2017-07-12 15:05:38 UTC
Created attachment 1297044
explicitly kill and wait for X server
I was able to reproduce. The problem is, indeed, that we don't wait for the X server to shut down. This patch fixes it by explicitly terminating the X server and then waiting on the process.
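The terminate-then-wait pattern described here can be sketched as follows. This is not the attached GDM patch (GDM is written in C). It is a small Python illustration in which a child process that delays its exit on SIGTERM stands in for a slow X server. The names `SLOW_CHILD`, `start_server`, and `stop_server_and_wait`, and the 0.5 second delay, are hypothetical.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a slow X server: on SIGTERM it lingers
# for half a second before exiting.
SLOW_CHILD = (
    "import signal, sys, time\n"
    "def on_term(signum, frame):\n"
    "    time.sleep(0.5)\n"
    "    sys.exit(0)\n"
    "signal.signal(signal.SIGTERM, on_term)\n"
    "print('ready', flush=True)\n"
    "while True:\n"
    "    time.sleep(0.1)\n"
)


def start_server():
    """Start the stand-in server and block until it reports readiness."""
    proc = subprocess.Popen(
        [sys.executable, "-c", SLOW_CHILD], stdout=subprocess.PIPE
    )
    proc.stdout.readline()  # wait for the 'ready' line
    return proc


def stop_server_and_wait(proc, timeout=10.0):
    """Send SIGTERM, then block until the process has actually exited."""
    proc.send_signal(signal.SIGTERM)
    exit_code = proc.wait(timeout=timeout)
    proc.stdout.close()
    return exit_code
```

A caller would invoke `stop_server_and_wait(old)` and only then call `start_server()` for the replacement, so the two processes never overlap.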
Created attachment 1297112
Full GDM debug log
Full debug log, from which the snippet in the original report originated. I realize it no longer seems to be required. Included for completeness.
Comment 14 Ray Strode [halfline]
2017-07-12 20:20:49 UTC
Jeff, a fix that addresses this issue is unlikely to make 7.4 GA, though we should be able to provide an asynchronous update through the 7.4 Z-Stream. We're tentatively targeting either an asynchronous update released the same day as the GA release, or possibly an asynchronous update released a little later, in the first batch of Z-Stream updates following release (batch 1).
Thanks Ray. I assumed that what you describe would be the situation. I just needed to know for sure, because the timing of the z-stream release is critical to our Remote Graphics Software(RGS) product. Without the fix, we'll have to tell our RGS RHEL 7 customers that they could not use RGS with 7.4.
Having this fix in the day-1 z-stream would definitely be preferable. Is there a way for HPI to influence the decision to include it sooner rather than later?
Jeff
(In reply to Jeff Burrell from comment #15)
> Thanks Ray. I assumed that what you describe would be the situation. I
> just needed to know for sure, because the timing of the z-stream release is
> critical to our Remote Graphics Software(RGS) product. Without the fix,
> we'll have to tell our RGS RHEL 7 customers that they could not use RGS with
> 7.4.
>
> Having this fix in the day-1 z-stream would definitely be preferable. Is
> there a way for HPI to influence the decision to include it sooner rather
> than later?
>
> Jeff
The erratum is a 0-day one, so it will be released on the same day as RHEL 7.4 GA. As Ray has prepared the build, I believe we (QE) can start immediately.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2018:0770