Bug 1671183

Summary: Legacy GUI element causes X server crash
Product: Red Hat Enterprise Linux 7
Reporter: Louis Wust <louis.wust>
Component: xorg-x11-server
Assignee: Adam Jackson <ajax>
Status: CLOSED WONTFIX
QA Contact: Desktop QE <desktop-qa-list>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.6
CC: csoriano, dbasant, dereks, jsolomon, mailbox, mboisver, tgc, tpelka, vanhoof
Target Milestone: rc
Keywords: OtherQA, Patch, Regression, Reopened
Target Release: 7.8
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-11-11 21:52:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1725628, 1738389
Attachments:
Description: X server patch which fixes the problem (written and tested by people who aren't X server devs)
Flags: none

Description Louis Wust 2019-01-31 02:03:58 UTC
Created attachment 1525205 [details]
X server patch which fixes the problem — written and tested by people who aren't X server devs.

We recently upgraded from RHEL 7.5 to 7.6 and have begun seeing X server crashes. These appear to be caused by a bug in the X server, as described in the linked external bug tracker issue. The problem is also being tracked in the CentOS bug tracker (see the other link).

The problem affects the xorg-x11-server-Xorg-1.20.1-5.2 package. However, our locally-built version does not suffer from the same problem, thanks to the patch posted to the X Server GitLab issue by Sokov V.M. I think it is fair to claim that neither I nor Sokov V.M. understands all of the possible implications of this patch, *but it works*!

How reproducible:
The problem occurs 100% of the time, but the procedure is very specific to our location. Other people have encountered the same problem (see the linked bug reports) using other procedures, but this is the only one that works 100% reliably for us.

Steps to Reproduce:
1. Log into RHEL 7.6 graphically — both GNOME and KDE are affected equally, so it doesn't matter which is chosen
2. Launch the X2Go remote desktop client (the latest version from EPEL)
3. Connect to a particular SuSE 12.3 server and launch IceWM
4. Run ncview: an application which uses a legacy X11 toolkit (X Athena Widgets or Xaw) rather than a more modern toolkit like GTK+ or Qt
5. Click a certain button in the ncview application

At this point, the entire X11 server crashes, dropping the user to a login screen. Interestingly, due to the session-resume capabilities of X2Go, the ncview application and IceWM desktop continue to run on the remote machine. Obviously, the expectation is that the application, remote DE, and local DE would all have remained alive and functional when the button was pressed.

Can you help me and the others affected by this to understand the problem and the apparent solution — i.e., the patch? And is this something that Red Hat can help to move upstream?

Comment 2 Louis Wust 2019-01-31 02:07:16 UTC
I'm having a bit of trouble using Bugzilla to add the link to the X Server issue on freedesktop.org's GitLab, so here it is:

https://gitlab.freedesktop.org/xorg/xserver/issues/623

Comment 5 Adam Jackson 2019-05-02 20:46:38 UTC
I don't think I can take that patch as-is, I think it ends up being an ABI break. Can you
try this build instead?

https://people.redhat.com/ajackson/1671183/

The above build is against 7.6, but the content (other than one of the patches) is currently
queued for 7.7. If you would prefer to test just this change in isolation, extract
0001-composite-Fix-some-backing-store-bugs-1677719.patch from the linked SRPM and apply it
to your own build.

Comment 6 Tomas Pelka 2019-06-26 09:08:01 UTC
(In reply to Adam Jackson from comment #5)
> I don't think I can take that patch as-is, I think it ends up being an ABI
> break. Can you
> try this build instead?
> 
> https://people.redhat.com/ajackson/1671183/
> 
> The above build is against 7.6, but the content (other than one of the
> patches) is currently
> queued for 7.7. If you would prefer to test just this change in isolation,
> extract
> 0001-composite-Fix-some-backing-store-bugs-1677719.patch from the linked
> SRPM and apply it
> to your own build.

Any progress here?

Comment 8 Louis Wust 2019-06-26 22:33:29 UTC
Thank you for the suggestion and sorry for the delayed response. I have gotten a chance to apply Adam Jackson's patch to our RHEL 7.6 systems. I did this twice: once by rebuilding our existing xorg-x11-server-1.20.1-5.3.el7 (RHEL 7.6) packages with the 0001-composite-Fix-some-backing-store-bugs-1677719.patch, and once by replacing those with the supplied xorg-x11-server-1.20.4-5.el7_6 packages (containing RHEL 7.7 content).

Both tests produced the same results. The bad news is that my test case still crashes the X server, whereas Sokov V.M.'s ABI-breaking patch fixes this problem. The good news is that this crash occurs with a different stacktrace, which suggests that your fixes may have worked for one part of the problem. Here is the new stacktrace (with line #s derived from xorg-x11-server-debuginfo-1.20.4-5.el7_6.x86_64.rpm):

#0  0x00007fd1233592c7 in raise () at /lib64/libc.so.6
#1  0x00007fd12335a9b8 in abort () at /lib64/libc.so.6
#2  0x000055769ea7fe5a in OsAbort () at utils.c:1351
#3  0x000055769ea859f3 in AbortServer () at log.c:879
#4  0x000055769ea8683d in FatalError (f=f@entry=0x55769eab6c90 "Caught signal %d (%s). Server aborting\n") at log.c:1017
#5  0x000055769ea7d0c9 in OsSigHandler (signo=11, sip=<optimized out>, unused=<optimized out>) at osinit.c:156
#6  0x00007fd1236ff5d0 in <signal handler called> () at /lib64/libpthread.so.0
#7  0x000055769ea6a7d3 in miComputeClips (pParent=pParent@entry=0x5576a21ca910, pScreen=pScreen@entry=0x5576a08a5650, universe=universe@entry=0x7ffe815af350, kind=kind@entry=VTUnmap, exposed=exposed@entry=0x7ffe815af370) at mivaltree.c:291
#8  0x000055769ea6b267 in miValidateTree (pParent=0x5576a1ed2880, pChild=<optimized out>, kind=VTUnmap) at mivaltree.c:687
#9  0x000055769e9502f2 in UnmapWindow (pWin=0x5576a21ca910, fromConfigure=fromConfigure@entry=0) at window.c:2881
#10 0x000055769e91e9a4 in ProcUnmapWindow (client=<optimized out>) at dispatch.c:879
#11 0x000055769e92444b in Dispatch () at dispatch.c:478
#12 0x000055769e92849a in dix_main (argc=18, argv=0x7ffe815af5d8, envp=<optimized out>) at main.c:276
#13 0x00007fd123345495 in __libc_start_main () at /lib64/libc.so.6
#14 0x000055769e91258e in _start ()

mivaltree.c:291 looks like this:

dx = pParent->drawable.x - pParent->valdata->before.oldAbsCorner.x;

Of particular interest is the fact that this code accesses an instance of the _Validate union (pParent->valdata->before). I believe that I correctly attributed the root cause of the original issue [1] to inconsistent handling of the _Validate union: code was accessing the ->after "side" of the union despite data having been stored in the ->before "side." Indeed, Sokov V.M.'s approach is an extreme solution to this problem. Could this be another example of this problem?
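
To illustrate the kind of hazard described above, here is a minimal sketch of a union with "before" and "after" sides, guarded by a discriminant so that a reader cannot interpret one side's bytes as the other. It is an assumption-laden illustration, not the X server's code: TaggedValidate, ValidateSide, store_before, store_after, and compute_dx are hypothetical names, and the field layout is simplified. Only the before.oldAbsCorner.x access mirrors the line quoted from mivaltree.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the server's _Validate union.
 * The real structure has different fields and carries no tag. */
typedef struct { short x, y; } Point;

typedef enum { SIDE_BEFORE, SIDE_AFTER } ValidateSide;

typedef struct {
    ValidateSide side;  /* discriminant: which member was stored last */
    union {
        struct { Point oldAbsCorner; bool resized; } before;
        struct { long exposedArea; } after;
    } u;
} TaggedValidate;

/* Store "before" data and record that this side is now active. */
static void store_before(TaggedValidate *v, short x, short y)
{
    v->side = SIDE_BEFORE;
    v->u.before.oldAbsCorner.x = x;
    v->u.before.oldAbsCorner.y = y;
    v->u.before.resized = false;
}

/* Store "after" data; this overwrites the bytes used by "before". */
static void store_after(TaggedValidate *v, long area)
{
    v->side = SIDE_AFTER;
    v->u.after.exposedArea = area;
}

/* Analogue of the quoted line: dx = drawable.x - before.oldAbsCorner.x.
 * Returns false instead of reading when the pointer is NULL or the
 * union currently holds "after" data. */
static bool compute_dx(const TaggedValidate *v, short drawable_x, int *dx)
{
    if (v == NULL || v->side != SIDE_BEFORE)
        return false;
    *dx = drawable_x - v->u.before.oldAbsCorner.x;
    return true;
}
```

With an explicit tag, a caller that reaches compute_dx after store_after gets a refusal rather than a value derived from unrelated bytes. Whether the crash at mivaltree.c:291 comes from a mismatch of this kind or from an invalid valdata pointer is not established in this report.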

I am certainly willing to test additional builds of this. I will try to be quicker to reply in the future.

[1] https://gitlab.freedesktop.org/xorg/xserver/issues/623#note_104763

Comment 9 Adam Jackson 2019-08-05 23:48:28 UTC
I've uploaded a test build for 7.7 here:

https://people.redhat.com/public_html/1671183/

Please test and report any problems.

Comment 12 Fredy Paquet 2019-10-05 10:12:17 UTC
Hello Adam, 

Your link shows me an empty page at people.redhat.com
How can we access the test build?

--
I'm also struggling with vinagre, connecting via RDP.
I think it's based on freerdp-libs.
It might be related to this bug here.

Comment 13 Adam Jackson 2019-10-10 14:45:28 UTC
(In reply to Fredy Paquet from comment #12)
> Hello Adam, 
> 
> Your link shows me an empty page at people.redhat.com
> How can we access the test build?

Apologies, that was a typo on my part:

https://people.redhat.com/ajackson/1671183/

Comment 14 Carlos Soriano 2019-10-24 14:23:50 UTC
Hi Louis, did you have a chance to verify the fix? Our deadlines to include the fix in our next release are approaching; unless we get verification that it is working fine, we will unfortunately have to drop it.

Comment 15 Louis Wust 2019-11-06 03:52:45 UTC
Thank you for your further updates, and again sorry for the delayed response.

I have not been able to test ajackson's actual "1.20.4-8.jx1.el7_7" packages as of yet, and I will not be able to do so immediately. The system to which I most readily have access uses the Nvidia proprietary drivers, and I believe that these packages were conflicting with them in some way. What I can say is that I have asked a coworker to attempt to reproduce the issue on the original system, using the same steps that I used.

While waiting for this result, I will also point out that I recompiled our locally-maintained X server packages with the 0001-composite-Fix-some-backing-store-bugs-1677719.patch included and with Sokov V.M.'s patch (attached to the original ticket) dropped. Unfortunately, the problem was reintroduced in this configuration.

I can provide additional tracebacks or other information once I have heard from my coworker.

Comment 16 Carlos Soriano 2019-12-06 10:31:06 UTC
Hi Louis,

Thanks for the information. For now we will hold off on introducing the fix in the release, as we would need verification that it works. If in the future you are able to test the package we provided, simply let us know and we will try to introduce the fix in a later release.

Thanks

Comment 20 RHEL Program Management 2019-12-06 10:36:10 UTC
Product Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 22 Carlos Soriano 2020-04-20 13:40:48 UTC
Hi Louis, did you have a chance to test the build Adam provided? Thanks.

Comment 25 Chris Williams 2020-11-11 21:52:45 UTC
Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7.
From initial triage, it does not appear that the remaining Bugzillas meet the inclusion criteria for Maintenance Phase 2, so they will now be closed.

From the RHEL life cycle page:
https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase
"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7, Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."

If this BZ was closed in error and meets the above criteria, please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes:
https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook  

Feature Requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.  

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7

Comment 26 Red Hat Bugzilla 2023-09-14 04:46:01 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days