Bug 240424

Summary: xen-vncfb segfault in clientInput calling FD_SET(-1, &rfds)
Product: [Fedora] Fedora
Reporter: Richard W.M. Jones <rjones>
Component: xen
Assignee: Markus Armbruster <armbru>
Severity: medium
Priority: medium
Version: rawhide
CC: katzj, xen-maint
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-05-17 14:13:19 EDT
Attachments:
  Core dump (attachment 154918)
  Core dump (attachment 154928)
  Binary file matching the above coredump with symbols (attachment 154929), flags: none

Description Richard W.M. Jones 2007-05-17 10:05:24 EDT
Description of problem:

xen-vncfb segfaults under very heavy load.

Core was generated by `/usr/lib64/xen/bin/xen-vncfb --unused --listen
-k en-us --domid 514 -'.
Program terminated with signal 7, Bus error.
#0  0x0000000000407111 in clientInput (data=<value optimized out>)
    at main.c:504
504             FD_SET(sock, &rfds);
(gdb) bt
#0  0x0000000000407111 in clientInput (data=<value optimized out>)
    at main.c:504
#1  0x00000030310061b5 in start_thread () from /lib64/libpthread.so.0
#2  0x00000030304d043d in clone () from /lib64/libc.so.6
(gdb) print sock
$1 = -1
(gdb) print &rfds
$2 = (fd_set *) 0x41e01c30

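Note that sock is -1.  Calling FD_SET() with a negative descriptor is undefined
behavior (the macro indexes outside the fd_set bitmap), hence the crash.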
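
For illustration only, a minimal sketch of a range check in front of FD_SET().
The helper name fd_set_checked is hypothetical and not part of xen-vncfb. A
check like this would only hide the symptom seen in the backtrace; it does not
explain why sock became -1 while the thread was still running.

```c
#include <sys/select.h>

/*
 * Hypothetical helper (an assumption, not xen-vncfb code): add fd to the
 * set only if it lies in the range FD_SET() supports.
 * Returns 0 on success, -1 if fd was rejected.
 */
static int fd_set_checked(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;      /* FD_SET() here would be undefined behavior */
    FD_SET(fd, set);
    return 0;
}
```

A caller such as clientInput could treat a -1 return as "client is gone" and
leave its loop instead of calling select().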

At the same time that this happened, the kernel printed:
xen-vncfb[5823] trap stack segment rip:407111 rsp:41e01b30 error:0

Version-Release number of selected component (if applicable):

xen-3.1.0-0.rc7.1.fc7 + patch to fix bug 240009

How reproducible:

This bug appears intermittently (it is very rare).

Steps to Reproduce:

1. Run Xen stress tests with 8 guests.

Actual results:

xen-vncfb segfaults.

Expected results:

xen-vncfb should not crash.

Additional info:

A xen-vncfb binary which matches this core dump can be found in bug 240012.
Comment 1 Richard W.M. Jones 2007-05-17 10:05:25 EDT
Created attachment 154918
Core dump
Comment 2 Mark McLoughlin 2007-05-17 10:38:21 EDT
Suggest something like this untested patch:

Comment 3 Richard W.M. Jones 2007-05-17 12:32:29 EDT
Created attachment 154928
Core dump

I applied http://www.gnome.org/~markmc/code/xen-libvncserver-threading.patch
and now get this slightly different core dump (binary to follow).
Comment 4 Richard W.M. Jones 2007-05-17 12:33:28 EDT
Created attachment 154929
Binary file matching the above coredump with symbols

This binary is compiled with -O0 -g so it is a lot easier to follow what's
going on.
Comment 5 Markus Armbruster 2007-05-17 14:13:19 EDT
The original crash and the crash unmasked by the fix in comment #2 appear to
share a root cause with bug 240012: when the client goes away, the threads
doing work for it terminate in a confused and badly coordinated manner, which
works only with lucky timing.  Let's use bug 240012 to track this.  Marking
this one as a duplicate.

*** This bug has been marked as a duplicate of 240012 ***
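
To make the "badly coordinated termination" in comment 5 concrete, here is a
rough sketch of one common way to coordinate it: the owning thread wakes the
input thread through a pipe, joins it, and only then closes and invalidates
the socket. This is not the fix applied in bug 240012, and struct client,
client_input, client_start and client_stop are all hypothetical names. It
assumes the descriptors are below FD_SETSIZE.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical per-client state; not taken from xen-vncfb. */
struct client {
    int sock;           /* client socket, owned by the starting thread */
    int wake[2];        /* pipe: writing to wake[1] requests shutdown */
    pthread_t input;    /* input thread handle */
};

static void *client_input(void *data)
{
    struct client *cl = data;
    int maxfd = cl->sock > cl->wake[0] ? cl->sock : cl->wake[0];

    for (;;) {
        fd_set rfds;
        char buf[256];

        FD_ZERO(&rfds);
        FD_SET(cl->sock, &rfds);        /* sock stays valid until joined */
        FD_SET(cl->wake[0], &rfds);
        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (FD_ISSET(cl->wake[0], &rfds))
            break;                      /* shutdown requested */
        if (read(cl->sock, buf, sizeof buf) <= 0)
            break;                      /* client went away */
    }
    return NULL;
}

static int client_start(struct client *cl, int sock)
{
    cl->sock = sock;
    if (pipe(cl->wake) < 0)
        return -1;
    if (pthread_create(&cl->input, NULL, client_input, cl) != 0) {
        close(cl->wake[0]);
        close(cl->wake[1]);
        return -1;
    }
    return 0;
}

/* Wake the thread, wait for it, and only then invalidate the descriptors. */
static int client_stop(struct client *cl)
{
    if (write(cl->wake[1], "x", 1) != 1)
        return -1;
    if (pthread_join(cl->input, NULL) != 0)
        return -1;
    close(cl->wake[0]);
    close(cl->wake[1]);
    close(cl->sock);
    cl->sock = -1;
    return 0;
}
```

Because sock is set to -1 only after pthread_join() returns, the input thread
can never observe the invalid value, so the outcome no longer depends on
timing.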