Bug 1046410

Summary: X APPLICATION CRASHES AFTER SOME TIME ON RHEL6 X86 WITH LIBXCB 1.8.1
Product: Red Hat Enterprise Linux 6 Reporter: Luan Jianhai <jianhai.luan>
Component: libxcb Assignee: Olivier Fourdan <ofourdan>
Status: CLOSED ERRATA QA Contact: Desktop QE <desktop-qa-list>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.5 CC: ashishks, dave.kinsell, jherrman, joe.jin, ofourdan, thatsafunnyname, tlavigne, tpelka
Target Milestone: rc   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: libxcb-1.9.1-3.el6 Doc Type: Bug Fix
Doc Text:
On 32-bit architectures, an X11 protocol client was under certain circumstances disconnected after processing a large number of X11 requests. With this update, the libxcb library exposes the request sequence number as a 64-bit integer so that libX11 can make use of 64-bit sequence numbers even on 32-bit systems. As a result, the described failure of the X11 client no longer occurs.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-07-22 07:02:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Attachments:
Description Flags
Patch of fixing libX11 uint_64 none

Description Luan Jianhai 2013-12-25 06:56:25 UTC
Description of problem:
  An X application crashes after many hours of drawing.
The customer has an application that makes many XDrawString and XDrawLine calls.
After several hours the application exits with an XIOError.

ANALYSIS AND RESEARCH
---------------------
The XIOError is raised in libX11, in the function _XReply (file xcb_io.c).
_XReply does not get a response from xcb_wait_for_reply.

libxcb 1.5 is fine, libxcb 1.8.1 is not.
Bisecting libxcb points to this commit:

 commit ed37b087519ecb9e74412e4df8f8a217ab6d12a9
 Author: Jamey Sharp <jamey@minilop.net>
 Date:   Sat Oct 9 17:13:45 2010 -0700

   xcb_in: Use 64-bit sequence numbers internally everywhere.
  
   Widen sequence numbers on entry to those public APIs that still take
   32-bit sequence numbers.
  
   Signed-off-by: Jamey Sharp <jamey@minilop.net>

Reverting it on top of 1.8.1 helps.

After adding traces to libxcb, the customer found that the last request numbers
passed to xcb_wait_for_reply were 4294900463 and 4294965487 (two calls in the
while loop of the _XReply function) and, half a second later, 63215 (after
which XIOError is called).
The widen_request value is also 63215; I would have expected 63215+2^32.
It therefore seems that the request is not widened correctly.
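
A minimal sketch of the widening step, for illustration only. The helper name widen_seq and the parameter last_sent are hypothetical, and this is not claimed to be the libxcb code. It assumes that a 32-bit sequence number is widened by borrowing the upper 32 bits of a 64-bit count of requests sent so far. Under that assumption, if the 64-bit count has not itself crossed 2^32 when the widening happens, 63215 stays 63215:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not the libxcb implementation): widen a 32-bit
 * sequence number to 64 bits, assuming it names the most recent request
 * at or before `last_sent`, the 64-bit count of requests sent so far. */
static uint64_t widen_seq(uint64_t last_sent, uint32_t seq32)
{
    /* Before the first wrap, a sequence number cannot exceed the count. */
    assert(last_sent >= (UINT64_C(1) << 32) || seq32 <= last_sent);

    /* Borrow the upper 32 bits from the reference counter. */
    uint64_t wide = (last_sent & UINT64_C(0xffffffff00000000)) | seq32;

    /* A value beyond the reference belongs to the previous 2^32 window. */
    if (wide > last_sent)
        wide -= UINT64_C(1) << 32;

    return wide;
}
```

With the values from the traces, widen_seq(4294965487, 63215) returns 63215, while widen_seq(2^32 + 63215, 63215) returns 63215+2^32 (4295030511). The result is only as good as the 64-bit reference it is widened against.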

The commit above also changed the comparisons in poll_for_reply from
XCB_SEQUENCE_COMPARE_32 to XCB_SEQUENCE_COMPARE.
Maybe the widening never worked correctly but this was never observed, because
only the lower 32 bits were compared.
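
To illustrate why a 32-bit comparison could mask a bad widening, here is a small sketch. The helpers seq_lt_32 and seq_lt_64 are hypothetical stand-ins written for this note; they are not the XCB_SEQUENCE_COMPARE_32 and XCB_SEQUENCE_COMPARE macros themselves, only an assumption about the general idea of comparing sequence numbers by signed difference:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helpers with hypothetical names; not the libxcb macros. */

/* Wraparound-tolerant "a < b" that looks at the low 32 bits only.
 * The unsigned-to-signed cast assumes two's complement conversion. */
static int seq_lt_32(uint64_t a, uint64_t b)
{
    return (int32_t)((uint32_t)a - (uint32_t)b) < 0;
}

/* Wraparound-tolerant "a < b" on the full 64-bit values. */
static int seq_lt_64(uint64_t a, uint64_t b)
{
    return (int64_t)(a - b) < 0;
}

/* Compare the value from the trace with the value that was expected. */
static void compare_demo(void)
{
    uint64_t expected = (UINT64_C(1) << 32) + 63215; /* 63215+2^32 */
    uint64_t observed = 63215;                       /* value seen in the trace */

    /* Low 32 bits are identical, so neither sorts before the other. */
    assert(!seq_lt_32(observed, expected));
    assert(!seq_lt_32(expected, observed));

    /* On 64 bits the observed value looks 2^32 requests in the past. */
    assert(seq_lt_64(observed, expected));
}
```

In this sketch the two values are indistinguishable to the 32-bit comparison, but the 64-bit comparison places 63215 a full 2^32 requests behind 63215+2^32, which would be consistent with the failure only appearing once 64-bit comparisons were introduced.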

The bug has also been filed on freedesktop.org:
  https://bugs.freedesktop.org/show_bug.cgi?id=71338

Version-Release number of selected component (if applicable):
  libX11-1.5.0


How reproducible:

Steps to Reproduce:
1. Download the test case from https://bugs.freedesktop.org/attachment.cgi?id=88996
2. Compile the test case
3. Run it

Actual results:
  ERROR Received a X IO error on display=8073008.
  backtrace() returned 10 addresses
  ./xdrawnew() [0x8048813]
  ./xdrawnew() [0x80488b4]
   /usr/lib/libX11.so.6(_XIOError+0x57) [0xf76217b7]
   /usr/lib/libX11.so.6(_XReply+0x3d5) [0xf7620075]
   /usr/lib/libX11.so.6(+0x3c36f) [0xf762236f]
   /usr/lib/libX11.so.6(+0x3c485) [0xf7622485]
   /usr/lib/libX11.so.6(XNoOp+0x5a) [0xf760ddca]
   /xdrawnew() [0x8048ac5]
   /lib/libc.so.6(__libc_start_main+0xe6) [0xf7465d26]
  ./xdrawnew() [0x8048761]

Expected results:
  No Error 

Additional info:
  - bug: .......... https://bugs.freedesktop.org/show_bug.cgi?id=71338
  - testcase: ..... https://bugs.freedesktop.org/attachment.cgi?id=88996
  - proposed patch: https://bugs.freedesktop.org/attachment.cgi?id=89001
  - discussion: ... http://lists.x.org/archives/xorg-devel/2013-October/038370.html

Comment 1 Luan Jianhai 2013-12-25 07:04:23 UTC
Created attachment 841403 [details]
Patch of fixing libX11 uint_64

Comment 2 Luan Jianhai 2013-12-25 07:08:34 UTC
  As discussed on xorg-devel, the attached patch should fix the issue. Could you merge the patch into the latest RHEL?

Comment 5 Luan Jianhai 2014-06-17 08:08:59 UTC
Could you give me a response on this issue? If the patch doesn't fix the issue, do you have any advice on how to proceed?

Comment 11 dave.kinsell 2015-03-02 21:56:04 UTC
Defect still present in RHEL6u6 and RHEL7u0 (libxcb 1.9-5).  Also occurs on apps compiled as 32 bits on 64 bit systems.  Can see the failure in less than 5 minutes of run time, under 'best case' conditions.

Comment 13 Olivier Fourdan 2015-03-26 15:34:16 UTC
(In reply to dave.kinsell from comment #11)
> Defect still present in RHEL6u6 and RHEL7u0 (libxcb 1.9-5).  Also occurs on
> apps compiled as 32 bits on 64 bit systems.  Can see the failure in less
> than 5 minutes of run time, under 'best case' conditions.

A patch to address this issue is currently under review upstream.

Note that X IO errors can have multiple causes, including bugs in the program itself. Reaching the failure in less than 5 minutes sounds surprising, since it means you reach the 32-bit sequence number limit in less than 5 minutes. To give you an idea, it takes me roughly 5 hours to reach that limit in a VM using the reproducer program, which draws a line continuously.

Comment 14 dave.kinsell 2015-03-26 16:36:48 UTC
Thank you Olivier, so nice to hear this may get a patch from upstream.

The 5 minutes to failure is done with rapid XNoOp() calls, as discussed in https://bugs.freedesktop.org/show_bug.cgi?id=71338

I used my own counter to make sure it was making 2^32 calls before failing.  With a realistic program that we support, it can fail in about 28 hours.

I wanted to clarify this happens with any 32 bit executable, not just on 32 bit systems, because the number of people affected is much larger.
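
As a rough sanity check on the durations mentioned in this thread (5 minutes, 5 hours, 28 hours), the average request rate needed to issue 2^32 requests can be estimated as below. The helper rate_to_wrap is hypothetical and the figures are plain arithmetic, not measurements:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average requests per second needed to issue
 * 2^32 requests within `seconds` seconds. */
static double rate_to_wrap(double seconds)
{
    assert(seconds > 0.0);
    return (double)(UINT64_C(1) << 32) / seconds;
}
```

For example, rate_to_wrap(300.0) is about 14.3 million requests per second (5 minutes), rate_to_wrap(18000.0) is about 239 thousand per second (5 hours), and rate_to_wrap(100800.0) is about 42.6 thousand per second (28 hours).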

Comment 15 Olivier Fourdan 2015-03-27 10:21:18 UTC
(In reply to dave.kinsell from comment #14)
> The 5 minutes to failure is done with rapid XNoOp() calls, as discussed in
> https://bugs.freedesktop.org/show_bug.cgi?id=71338
> 
> I used my own counter to make sure it was making 2^32 calls before failing. 
> With a realistic program that we support, it can fail in about 28 hours.

OK, thanks for clarifying.
 
> I wanted to clarify this happens with any 32 bit executable, not just on 32
> bit systems, because the number of people affected is much larger.

Yes, correct: 32-bit apps link to 32-bit libs and are therefore equally affected, even on a 64-bit system. This is why I also cloned these bugs for el7.

Comment 20 errata-xmlrpc 2015-07-22 07:02:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1358.html