Bug 1046410 - X APPLICATION CRASHES AFTER SOME TIME ON RHEL6 X86 WITH LIBXCB 1.8.1
Summary: X APPLICATION CRASHES AFTER SOME TIME ON RHEL6 X86 WITH LIBXCB 1.8.1
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libxcb
Version: 6.5
Hardware: i686
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Olivier Fourdan
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-12-25 06:56 UTC by Luan Jianhai
Modified: 2019-07-11 07:49 UTC (History)

Fixed In Version: libxcb-1.9.1-3.el6
Doc Type: Bug Fix
Doc Text:
On 32-bit architectures, an X11 protocol client could, under certain circumstances, be disconnected after processing a large number of X11 requests. With this update, the libxcb library exposes the request sequence number as a 64-bit integer so that libX11 can use 64-bit sequence numbers even on 32-bit systems. As a result, the described failure of the X11 client no longer occurs.
Clone Of:
Environment:
Last Closed: 2015-07-22 07:02:22 UTC


Attachments
Patch of fixing libX11 uint_64 (969 bytes, patch)
2013-12-25 07:04 UTC, Luan Jianhai


Links
System: Red Hat Product Errata
ID: RHBA-2015:1358
Priority: normal
Status: SHIPPED_LIVE
Summary: libxcb and libX11 bug fix update
Last Updated: 2015-07-20 17:59:26 UTC

Description Luan Jianhai 2013-12-25 06:56:25 UTC
Description of problem:
  An X application crashes after many hours of drawing.
The customer has an application that makes a large number of XDrawString and XDrawLine calls.
After several hours the application exits with an XIOError.

ANALYSIS AND RESEARCH
---------------------
The XIOError is raised in libX11, in the function _XReply (file xcb_io.c).
_XReply does not get a response from xcb_wait_for_reply.

libxcb 1.5 is fine, libxcb 1.8.1 is not.
Bisecting libxcb points to this commit:

 commit ed37b087519ecb9e74412e4df8f8a217ab6d12a9
 Author: Jamey Sharp <jamey@minilop.net>
 Date:   Sat Oct 9 17:13:45 2010 -0700

   xcb_in: Use 64-bit sequence numbers internally everywhere.
  
   Widen sequence numbers on entry to those public APIs that still take
   32-bit sequence numbers.
  
   Signed-off-by: Jamey Sharp <jamey@minilop.net>

Reverting it on top of 1.8.1 helps.

After adding traces to libxcb, the customer found that the last request numbers
used for xcb_wait_for_reply were 4294900463 and 4294965487 (two calls in the
while loop of the _XReply function) and, half a second later, 63215 (then
XIOError is called).
The widen_request is also 63215; I would have expected 63215+2^32.
Therefore it seems that the request is not correctly widened.
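
The widening arithmetic described above can be sketched as follows. This is a minimal illustration, not libxcb's actual code: the helper name widen_seq and the idea of taking the upper 32 bits from a 64-bit "last request sent" counter are assumptions made for the example. The report does not establish why the upper bits were missing; the second assertion only shows that a counter whose upper half is zero would produce the 63215 seen in the traces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not libxcb's actual code): extend a 32-bit sequence
 * number to 64 bits, using the 64-bit number of the last request sent. */
static uint64_t widen_seq(uint64_t last_sent, uint32_t narrow)
{
    const uint64_t epoch = UINT64_C(1) << 32;

    /* Borrow the upper 32 bits from the last request sent. */
    uint64_t wide = (last_sent & UINT64_C(0xffffffff00000000)) | narrow;

    /* A request being waited on cannot be newer than the last one sent, so
     * a larger value must belong to the previous block of 2^32 requests. */
    if (wide > last_sent && wide >= epoch)
        wide -= epoch;
    return wide;
}

/* Values taken from the traces quoted in this report. */
static void check_widen_examples(void)
{
    const uint64_t epoch = UINT64_C(1) << 32;

    /* Counter has passed 2^32: 63215 widens to 63215 + 2^32 (expected). */
    assert(widen_seq(epoch + 63300, 63215) == epoch + 63215);

    /* Counter's upper half is zero: 63215 stays 63215 (observed). */
    assert(widen_seq(63300, 63215) == 63215);

    /* Just before the wrap, 4294965487 widens to itself. */
    assert(widen_seq(UINT64_C(4294965500), UINT32_C(4294965487))
           == UINT64_C(4294965487));
}
```

Calling check_widen_examples() from a test program exercises the three cases; the counter values 63300 and 4294965500 are arbitrary choices slightly above the traced request numbers.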

The commit above also changed the comparisons in poll_for_reply from
XCB_SEQUENCE_COMPARE_32 to XCB_SEQUENCE_COMPARE.
Maybe the widening never worked correctly, but this was never observed because
only the lower 32 bits were compared.
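
To illustrate why a comparison on the lower 32 bits could hide a widening problem, here is a small sketch. The two functions are hypothetical stand-ins written for this example and are not the libxcb macros named above; they only show the general technique of wraparound-tolerant comparison at two widths, using the request numbers from the traces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; these are not libxcb's macros. */

/* Ordering on the low 32 bits only: a is "before" b when the modular
 * difference, read as a signed 32-bit value, is negative. */
static int seq_before_32(uint64_t a, uint64_t b)
{
    return (int32_t)((uint32_t)a - (uint32_t)b) < 0;
}

/* The same test on the full 64-bit values. */
static int seq_before_64(uint64_t a, uint64_t b)
{
    return (int64_t)(a - b) < 0;
}

static void check_compare_examples(void)
{
    const uint64_t before_wrap = UINT64_C(4294965487);     /* from the traces */
    const uint64_t unwidened   = UINT64_C(63215);          /* from the traces */
    const uint64_t widened     = (UINT64_C(1) << 32) + 63215;

    /* Low 32 bits only: 63215 is seen as later than 4294965487. */
    assert(!seq_before_32(unwidened, before_wrap));

    /* Full 64 bits, unwidened value: 63215 now looks much older. */
    assert(seq_before_64(unwidened, before_wrap));

    /* Full 64 bits, correctly widened value: the ordering is right again. */
    assert(!seq_before_64(widened, before_wrap));
}
```

With the 32-bit test an unwidened 63215 still sorts after 4294965487, so the missing upper bits go unnoticed; with the 64-bit test the same value appears to lie about 2^32 requests in the past.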

The bug has also been reported on freedesktop.org:
https://bugs.freedesktop.org/show_bug.cgi?id=71338

Version-Release number of selected component (if applicable):
  libX11-1.5.0


How reproducible:

Steps to Reproduce:
1. Download the test case from https://bugs.freedesktop.org/attachment.cgi?id=88996
2. Compile the testcase
3. Run it

Actual results:
  ERROR Received a X IO error on display=8073008.
  backtrace() returned 10 addresses
  ./xdrawnew() [0x8048813]
  ./xdrawnew() [0x80488b4]
  /usr/lib/libX11.so.6(_XIOError+0x57) [0xf76217b7]
  /usr/lib/libX11.so.6(_XReply+0x3d5) [0xf7620075]
  /usr/lib/libX11.so.6(+0x3c36f) [0xf762236f]
  /usr/lib/libX11.so.6(+0x3c485) [0xf7622485]
  /usr/lib/libX11.so.6(XNoOp+0x5a) [0xf760ddca]
  ./xdrawnew() [0x8048ac5]
  /lib/libc.so.6(__libc_start_main+0xe6) [0xf7465d26]
  ./xdrawnew() [0x8048761]

Expected results:
  No Error 

Additional info:
  - bug: .......... https://bugs.freedesktop.org/show_bug.cgi?id=71338
  - testcase: ..... https://bugs.freedesktop.org/attachment.cgi?id=88996
  - proposed patch: https://bugs.freedesktop.org/attachment.cgi?id=89001
  - discussion: ... http://lists.x.org/archives/xorg-devel/2013-October/038370.html

Comment 1 Luan Jianhai 2013-12-25 07:04:23 UTC
Created attachment 841403
Patch of fixing libX11 uint_64

Comment 2 Luan Jianhai 2013-12-25 07:08:34 UTC
  According to the xorg-devel discussion, the attached patch should fix the issue. Could you merge the patch into the latest RHEL?

Comment 5 Luan Jianhai 2014-06-17 08:08:59 UTC
Could you give me a response on this issue? If the patch does not fix the issue, do you have any advice on how to proceed?

Comment 11 dave.kinsell 2015-03-02 21:56:04 UTC
Defect still present in RHEL6u6 and RHEL7u0 (libxcb 1.9-5).  Also occurs on apps compiled as 32 bits on 64 bit systems.  Can see the failure in less than 5 minutes of run time, under 'best case' conditions.

Comment 13 Olivier Fourdan 2015-03-26 15:34:16 UTC
(In reply to dave.kinsell from comment #11)
> Defect still present in RHEL6u6 and RHEL7u0 (libxcb 1.9-5).  Also occurs on
> apps compiled as 32 bits on 64 bit systems.  Can see the failure in less
> than 5 minutes of run time, under 'best case' conditions.

A patch to address this issue is currently under review upstream.

Note that X IO errors can have multiple causes, including bugs in the program itself. Reaching the failure in less than 5 minutes sounds surprising: it means you reach the 32-bit sequence number limit in less than 5 minutes. To give you an idea, it takes me roughly 5 hours to reach that limit in a VM using the reproducer program, which draws a line continuously.

Comment 14 dave.kinsell 2015-03-26 16:36:48 UTC
Thank you Olivier, so nice to hear this may get a patch from upstream.

The 5 minutes to failure is done with rapid XNoOp() calls, as discussed in https://bugs.freedesktop.org/show_bug.cgi?id=71338

I used my own counter to make sure it was making 2^32 calls before failing.  With a realistic program that we support, it can fail in about 28 hours.

I wanted to clarify this happens with any 32 bit executable, not just on 32 bit systems, because the number of people affected is much larger.
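
As a rough check on the run times quoted in this thread (under 5 minutes with rapid XNoOp() calls, roughly 5 hours with the line-drawing reproducer, about 28 hours with a realistic program), the sketch below computes the average request rate each would imply. It assumes the failure occurs after exactly 2^32 requests and that the rate is constant; the function name is made up for the example.

```c
#include <assert.h>

/* Hypothetical helper: average requests per second needed to issue
 * 2^32 requests within the given number of seconds. */
static double requests_per_second(double seconds)
{
    const double wrap = 4294967296.0; /* 2^32 */
    return wrap / seconds;
}

static void check_rate_examples(void)
{
    double five_minutes = requests_per_second(5.0 * 60.0);
    double five_hours   = requests_per_second(5.0 * 3600.0);
    double day_plus     = requests_per_second(28.0 * 3600.0);

    /* 5 minutes: about 14.3 million requests per second. */
    assert(five_minutes > 14.3e6 && five_minutes < 14.4e6);

    /* 5 hours: about 239 thousand requests per second. */
    assert(five_hours > 238.0e3 && five_hours < 239.0e3);

    /* 28 hours: about 42.6 thousand requests per second. */
    assert(day_plus > 42.6e3 && day_plus < 42.7e3);
}
```

The spread of roughly two orders of magnitude between these rates fits the explanation that the time to failure depends only on how quickly the program issues requests.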

Comment 15 Olivier Fourdan 2015-03-27 10:21:18 UTC
(In reply to dave.kinsell from comment #14)
> The 5 minutes to failure is done with rapid XNoOp() calls, as discussed in
> https://bugs.freedesktop.org/show_bug.cgi?id=71338
> 
> I used my own counter to make sure it was making 2^32 calls before failing. 
> With a realistic program that we support, it can fail in about 28 hours.

OK, thanks for clarifying.
 
> I wanted to clarify this happens with any 32 bit executable, not just on 32
> bit systems, because the number of people affected is much larger.

Yes, correct: 32-bit apps link to 32-bit libs and are therefore equally affected, even on a 64-bit system. This is why I also cloned these bugs for el7.

Comment 20 errata-xmlrpc 2015-07-22 07:02:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1358.html

