Description of problem:
In some scripts we run commands like "xterm -display ... -e echo test" to see if an Xvfb server is active. This should either succeed or fail pretty quickly. Occasionally it hangs instead.

Version-Release number of selected component (if applicable):
xorg-x11-Xvfb-6.8.2-1.EL.13.25.1
xterm-192-1

How reproducible:
Most of the time the command works as expected. But if it is run in a loop, it is pretty easy to get it to hang within a few thousand tries.

Steps to Reproduce:
1. On one host, start Xvfb:
     Xvfb -ac :99
2. On a different host, run this loop:
     for i in `seq 1 10000`
     do
       if [[ $i = *00 ]]
       then
         echo $i
       fi
       xterm -display david:99 -e echo test
     done

Actual results:
After a while, no more numbers are echoed. An xterm is hanging.

Additional info:
I have only been able to reproduce this with xterm and Xvfb running on different hosts. I have not been able to reproduce the problem on real displays, only on Xvfb ones. For that reason I chose xorg-x11(-Xvfb) as the component, but it might just as well be a bug in xterm.

Some basic debugging of the hanging xterm doesn't tell me that much:

fougamou> pstack 8511
#0  0x004ba7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x005a769e in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
#2  0x005398af in _L_mutex_lock_40 () from /lib/tls/libc.so.6
#3  0x00000000 in ?? ()
fougamou> strace -p 8511
Process 8511 attached - interrupt to quit
futex(0x5fa820, FUTEX_WAIT, 2, NULL
If xterm is hanging, then I would assume the bug is in xterm itself. Reassigning...
Yes, I too have been able to reproduce this problem, by running

  # Xvfb -ac :0

on RHEL-4-U3 i386 uniprocessor 'hostA', with no other X server running, and then running this command on x86_64 multiprocessor RHEL-4-U3 'hostB':

  # for((i=0;i<10000;i++)) do echo -n $i; xterm -display hostA:0 -e echo 'hello from hostB'; tput hpa 0; echo -n ${i//[0-9]/_}; tput hpa 0; done;

Eventually, the numbers emitted by the command above stop changing (an xterm process has hung). A core dump of the hung xterm process shows it is in 'nonblocking_wait()':

#0  0x0000003e045d16ab in __lll_mutex_lock_wait () from /lib64/tls/libc.so.6
#1  0x0000003e04730620 in __malloc_initialize_hook () from /lib64/tls/libc.so.6
#2  0x000000000041c140 in nonblocking_wait () at ./main.c:4424
#3  0x0000003e0456d2c1 in posix_memalign () from /lib64/tls/libc.so.6
#4  0x0000007fbffff6f0 in ?? ()
#5  0x0000000000661380 in ?? ()
#6  0x0000003e0a43cb30 in XtSetWMColormapWindows () from /usr/X11R6/lib64/libXt.so.6
#7  0x000000000041c140 in nonblocking_wait () at ./main.c:4424
#8  0x0000000014000001 in ?? ()
#9  0x0000007fbfffeb80 in ?? ()
#10 0x0000000000000050 in ?? ()
#11 0x0000000000000031 in ?? ()
#12 0x0000003e14000000 in ?? ()
#13 <signal handler called>

It appears that the nonblocking_wait() function can be re-entered because the 'reapchild()' SIGCHLD signal handler called Cleanup(0) directly. This has since been fixed in later xterm versions: reapchild() now only sets an integer, which is then checked, and Cleanup() is called outside of the signal handler.

This fix has been backported in xterm-192-6.EL4, available from:
  http://people.redhat.com/~jvdias/xterm/RHEL-4

With this version, the 10000-iteration loop completes OK.

Please try out this version and let me know of any issues, or confirm that it fixes the problem for you as well. I'll then release it on the RHEL-4 Fastrack channel. Thanks!
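For readers unfamiliar with the pattern described above, here is a minimal sketch of a SIGCHLD handler that only sets a flag, with the real cleanup done afterwards in normal program context. This is not the xterm source: child_exited, on_sigchld, cleanup_child and run_child_and_reap are hypothetical names chosen for illustration, and cleanup_child merely stands in for whatever Cleanup() does.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag: written by the handler, read by the main code. */
static volatile sig_atomic_t child_exited = 0;

/* The handler only records that SIGCHLD arrived. It calls nothing that
 * might take a lock (malloc, stdio, Xlib/Xt), so it cannot deadlock
 * against code it happened to interrupt. */
static void on_sigchld(int signo)
{
    (void)signo;
    child_exited = 1;
}

/* Stand-in for the real cleanup. It runs in normal context, not in the
 * handler. Returns the child's exit status, or -1 on error. */
static int cleanup_child(pid_t pid)
{
    int status = 0;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Fork a child that exits with `code`, wait until the handler has set
 * the flag, then clean up outside the handler. */
int run_child_and_reap(int code)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    pid_t pid;

    child_exited = 0;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, NULL) < 0)
        return -1;

    /* Block SIGCHLD so it cannot slip in between the flag test and
     * sigsuspend() below. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(code);            /* child: exit immediately */

    /* Wait with SIGCHLD unblocked; sigsuspend() swaps the mask and
     * sleeps atomically, then restores the blocking mask on return. */
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);
    while (!child_exited)
        sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old, NULL);

    /* The flag is set, so it is now safe to do the real work. */
    return cleanup_child(pid);
}
```

Calling run_child_and_reap(7) from a main() should return the child's exit status, 7. The key design point is that nothing lock-taking ever runs inside on_sigchld(), which is what a handler calling Cleanup(0) directly could not guarantee.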
I've tried it now, and I can confirm that it fixes the problem.
The component this request has been filed against is not planned for inclusion in the next update. The decision is based on weighing the priority and number of requests for a component as well as the impact on the Red Hat Enterprise Linux user base: other components are considered to have higher priority, and the number of changes we intend to include in update cycles is limited.
Product Management has reviewed and declined this request. You may appeal this decision by reopening this request.
I'm slightly surprised, since you have had a working fix available for four months (comment 3). It's there, so why not use it? Still, we have found ways to work around this bug, so I won't appeal.
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0040.html