Bug 193238

Summary:	"xterm -e echo test" on Xvfb sometimes hangs
Product:	Red Hat Enterprise Linux 4	Reporter:	Göran Uddeborg <goeran>
Component:	xterm	Assignee:	Miroslav Lichvar <mlichvar>
Status:	CLOSED ERRATA	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4.0	CC:	dkovalsk, tao
Target Milestone:	---	Keywords:	Reopened
Target Release:	---
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:	RHBA-2007-0040	Doc Type:	Bug Fix
Last Closed:	2007-02-08 19:59:23 UTC	Type:	---
Bug Blocks:	198694

Description Göran Uddeborg 2006-05-26 14:40:13 UTC

Description of problem:
In some scripts we run commands like "xterm -display ... -e echo test" to see if
an Xvfb server is active.  This should either succeed or fail pretty quickly. 
Occasionally it hangs instead.

Version-Release number of selected component (if applicable):
xorg-x11-Xvfb-6.8.2-1.EL.13.25.1
xterm-192-1

How reproducible:
Most of the time the command works as expected.  But if run in a loop, it is
pretty easy to get it to hang within a few thousand tries.

Steps to Reproduce:
1. On one host start Xvfb:

Xvfb -ac :99

2. On a different host, run this loop:

for i in `seq 1 10000`
do  if [[ $i = *00 ]]
    then echo $i
    fi
    xterm -display david:99 -e echo test
done

Actual results:
After a while there will be no more numbers echoed.  An xterm will be hanging.

Additional info:
I have only been able to reproduce this with xterm and Xvfb running on different
hosts.

I have not been able to reproduce the problem on real displays, only Xvfb ones.
For that reason I chose xorg-x11(-Xvfb) as the component.  But it might as well
be a bug in xterm.

Some basic debugging of the hanging xterm doesn't tell me that much:

fougamou> pstack 8511
#0  0x004ba7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x005a769e in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
#2  0x005398af in _L_mutex_lock_40 () from /lib/tls/libc.so.6
#3  0x00000000 in ?? ()
fougamou> strace -p 8511
Process 8511 attached - interrupt to quit
futex(0x5fa820, FUTEX_WAIT, 2, NULL

Comment 1 Mike A. Harris 2006-05-29 16:14:24 UTC

If xterm is hanging, then I would assume the bug is in xterm itself.

Reassigning...

Comment 3 Jason Vas Dias 2006-05-30 20:46:40 UTC

Yes, I too have been able to reproduce this problem, by running 
# Xvfb -ac :0
on RHEL-4-U3 i386 uniprocessor 'hostA', with no other X server running, 
and then on x86_64 multiprocessor RHEL-4-U3 'hostB', running this command:
# for((i=0;i<10000;i++)) do echo -n $i; xterm -display hostA:0 -e echo 'hello from hostB'; tput hpa 0; echo -n ${i//[0-9]/_}; tput hpa 0; done;

Eventually, the numbers emitted by the command above stop changing (an xterm
process has hung).

Generating a core dump of the hung xterm process shows it is in
'nonblocking_wait()':
#0  0x0000003e045d16ab in __lll_mutex_lock_wait () from /lib64/tls/libc.so.6
#1  0x0000003e04730620 in __malloc_initialize_hook () from /lib64/tls/libc.so.6
#2  0x000000000041c140 in nonblocking_wait () at ./main.c:4424
#3  0x0000003e0456d2c1 in posix_memalign () from /lib64/tls/libc.so.6
#4  0x0000007fbffff6f0 in ?? ()
#5  0x0000000000661380 in ?? ()
#6  0x0000003e0a43cb30 in XtSetWMColormapWindows () from /usr/X11R6/lib64/libXt.so.6
#7  0x000000000041c140 in nonblocking_wait () at ./main.c:4424
#8  0x0000000014000001 in ?? ()
#9  0x0000007fbfffeb80 in ?? ()
#10 0x0000000000000050 in ?? ()
#11 0x0000000000000031 in ?? ()
#12 0x0000003e14000000 in ?? ()
#13 <signal handler called>

It appears that the nonblocking_wait() function can be re-entered because the
'reapchild()' SIGCHLD signal handler calls Cleanup(0) directly.  This has since
been fixed in later xterm versions: reapchild() only sets an integer flag,
which is then checked, and Cleanup() is called outside of the signal handler.
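
For readers unfamiliar with the pattern, here is a minimal sketch in C of
deferring work out of a SIGCHLD handler.  It is not the actual xterm patch: the
names child_exited, on_sigchld, cleanup and run_deferred_cleanup_demo are
hypothetical, and cleanup() is only a stand-in for xterm's Cleanup().  The
assumption is that the handler must restrict itself to async-signal-safe work
(here, setting a flag), while anything that may take libc locks, such as
malloc(), runs later in normal program context.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler, read by the main flow (hypothetical name). */
static volatile sig_atomic_t child_exited = 0;

/* Counts cleanup runs; exists only to make the sketch observable. */
static int cleanup_calls = 0;

/* The handler only records that SIGCHLD arrived, then returns. */
static void on_sigchld(int signo)
{
    (void)signo;
    child_exited = 1;
}

/* Stand-in for Cleanup(): may use malloc/free because it is never
 * called from signal context in this sketch. */
static void cleanup(void)
{
    void *scratch = malloc(64);
    free(scratch);
    cleanup_calls++;
}

/* Forks a short-lived child, waits for SIGCHLD, then cleans up outside
 * the handler.  Returns the number of cleanup runs, or -1 on error. */
int run_deferred_cleanup_demo(void)
{
    struct sigaction sa, old_sa;
    sigset_t block_mask, old_mask, wait_mask;
    pid_t pid, reaped;

    child_exited = 0;
    cleanup_calls = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old_sa) != 0)
        return -1;

    /* Block SIGCHLD so the flag test and the wait cannot race. */
    sigemptyset(&block_mask);
    sigaddset(&block_mask, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block_mask, &old_mask) != 0)
        return -1;
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    pid = fork();
    if (pid == 0)
        _exit(0);                       /* child: exit immediately */

    if (pid > 0) {
        while (!child_exited)
            sigsuspend(&wait_mask);     /* atomically unblock and wait */

        do {
            reaped = waitpid(pid, NULL, 0);
        } while (reaped < 0 && errno == EINTR);
        assert(reaped == pid);

        cleanup();                      /* normal context, not the handler */
    }

    /* Restore the caller's signal mask and SIGCHLD disposition. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return pid > 0 ? cleanup_calls : -1;
}
```

Calling run_deferred_cleanup_demo() from a main() should return 1 when the
child is reaped and the cleanup runs once.  The design point is that the
handler itself never calls into code that might already hold a lock in the
interrupted thread.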

This fix has been backported in xterm-192-6.EL4, available from:
    http://people.redhat.com/~jvdias/xterm/RHEL-4

With this version, the 10000-iteration loop completes OK.
Please try out this version and let me know of any issues, or please confirm
that it fixes the problem for you also - then I'll release it on the RHEL-4
Fastrack channel - thanks!

Comment 4 Göran Uddeborg 2006-05-31 08:10:13 UTC

I've tried it now, and I can confirm that it fixes the problem.

Comment 6 RHEL Program Management 2006-10-09 22:06:20 UTC

The component this request has been filed against is not planned for inclusion
in the next update. The decision is based on weighting the priority and number
of requests for a component as well as the impact on the Red Hat Enterprise
Linux user-base: other components are considered having higher priority and the
number of changes we intend to include in update cycles is limited.

Comment 7 RHEL Program Management 2006-10-09 22:15:55 UTC

Product Management has reviewed and declined this request.  You may appeal this
decision by reopening this request.

Comment 8 Göran Uddeborg 2006-10-11 08:40:10 UTC

I'm slightly surprised, since you have had a working fix available for four
months (comment 3).  It's there, why not use it?

Still, we have found ways to work around this bug, so I won't appeal.

Comment 16 Red Hat Bugzilla 2007-02-08 19:59:23 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0040.html