Bug 240012
Description
Richard W.M. Jones
2007-05-14 11:52:37 UTC
This segfault is _not_ coincident with the "lost console" when starting an install from virt-manager.

rip 000000000040c7f4 seems to be sraSpanInsertBefore+4

  40c7f0: 48 8b 46 08    mov 0x8(%rsi),%rax
  40c7f4: 48 89 37       mov %rsi,(%rdi)
  40c7f7: 48 89 47 08    mov %rax,0x8(%rdi)
  40c7fb: 48 8b 46 08    mov 0x8(%rsi),%rax
  40c7ff: 48 89 7e 08    mov %rdi,0x8(%rsi)
  40c803: 48 89 38       mov %rdi,(%rax)
  40c806: c3             retq

  static void sraSpanInsertBefore(sraSpan *newspan, sraSpan *before) {
    newspan->_next = before;
    newspan->_prev = before->_prev;
    before->_prev->_next = newspan;
    before->_prev = newspan;
  }

newspan must have been null.

Before I investigate all possible callers, let's try to capture a core dump. Richard, could you take care of that?
Created attachment 154832 [details]
Core dump
This is the core dump. I'm going to upload the corresponding binary next - it
is slightly different because I have recompiled xen on this machine (to apply
another patch).
Created attachment 154833 [details]
Binary of xen-vncfb matching preceding coredump.
This is the binary which matches the preceding coredump.
Created attachment 154848 [details]
Coredump with symbols
Created attachment 154849 [details]
Binary file matching the above coredump with symbols
I'm sorry, but the binary in attachment 154849 is stripped. Can I have one with
symbols?
Created attachment 154900 [details]
Binary file matching the above coredump with symbols (2nd version)
Sorry about that. Try this one.
Created attachment 154903 [details]
Another core dump
This core dump has a slightly different stack trace, although it's in the same
general area of the code.
Created attachment 154904 [details]
Another core dump
And another core dump, again with a slightly different stack trace,
although in the same area of the code.
Created attachment 154905 [details]
Another core dump
And another one - again with a different stack trace from the others.
Suggest something like this untested patch:
http://www.gnome.org/~markmc/code/xen-libvncserver-threading.patch
*** Bug 240424 has been marked as a duplicate of this bug. ***
Markus suggests running xen-vncfb under valgrind to gain more information.
Created attachment 156605 [details]
Reference counting fix
Patch proposed by Mark McLoughlin. It makes the refcounting match the life
range of cl.
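For readers unfamiliar with the idea, here is a minimal sketch of reference counting tied to an object's lifetime. It is not the attached patch. The struct and the functions client_new(), client_ref() and client_unref() are hypothetical names made up for illustration; the real rfbClientRec and its locking look different.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical client record, reduced to the fields the sketch needs. */
struct client {
    pthread_mutex_t lock;
    int refcount;           /* number of users that may still touch cl */
};

/* Allocate a client; the creator holds the first reference. */
static struct client *client_new(void)
{
    struct client *cl = malloc(sizeof *cl);

    if (cl == NULL)
        return NULL;
    pthread_mutex_init(&cl->lock, NULL);
    cl->refcount = 1;
    return cl;
}

/* Take a reference before handing cl to another thread or code path. */
static void client_ref(struct client *cl)
{
    pthread_mutex_lock(&cl->lock);
    cl->refcount++;
    pthread_mutex_unlock(&cl->lock);
}

/* Drop a reference. Returns 1 if this call freed cl, 0 otherwise. */
static int client_unref(struct client *cl)
{
    int remaining;

    pthread_mutex_lock(&cl->lock);
    remaining = --cl->refcount;
    pthread_mutex_unlock(&cl->lock);

    if (remaining > 0)
        return 0;           /* someone else still uses cl */
    pthread_mutex_destroy(&cl->lock);
    free(cl);               /* last user gone, so cl's life ends here */
    return 1;
}
```

The point of the sketch is only that cl is freed by whichever user drops the last reference, so no thread can be left holding a pointer to freed memory as long as every user takes a reference first.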
Created attachment 156608 [details]
Don't continue blindly when the socket is closed
Patch proposed by Mark McLoughlin. Avoids passing invalid file descriptor to
FD_SET() etc.
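As an illustration of the kind of check involved (not the patch itself), a guard around FD_SET() might look like the sketch below. The helper name fd_set_checked() is hypothetical. POSIX leaves FD_SET() undefined for descriptors below 0 or at or above FD_SETSIZE, which is why a closed socket stored as -1 must not reach it.

```c
#include <sys/select.h>

/* Hypothetical helper, not part of LibVNCServer: add fd to set only if it
 * is a descriptor FD_SET() can legally handle. Tracks the highest fd added
 * in *maxfd for the later select() call. Returns 1 if added, 0 if skipped. */
static int fd_set_checked(int fd, fd_set *set, int *maxfd)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return 0;           /* closed (-1) or out of range: skip it */
    FD_SET(fd, set);
    if (fd > *maxfd)
        *maxfd = fd;
    return 1;
}
```

A caller would treat a return value of 0 as "the socket is gone" and stop working on that client instead of calling select() on it.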
This is actually a whole family of bugs, and the patches I just attached fix just two of them. There may be more. The root cause they share is that when the client goes away, the threads doing work for it terminate in a confused and badly coordinated manner, which only works with lucky timing.
Created attachment 160117 [details]
Avoid double cleanup
The client iterator (protected by rfbClientListMutex) skips entries
with sock<0. But rfbClientConnectionGone() neglects to reset
cl->sock. This leads to double-cleanup, with disastrous results.
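The interaction described above can be sketched in a few lines. This is a simplified model, not the LibVNCServer code: struct client, client_gone() and next_open() are hypothetical stand-ins for rfbClientRec, rfbClientConnectionGone() and the iterator, and the cleanups counter stands in for the real teardown work.

```c
#include <stddef.h>

/* Hypothetical, simplified client record. */
struct client {
    int sock;               /* < 0 means "closed"; the iterator skips it */
    int cleanups;           /* counts how often teardown really ran */
    struct client *next;
};

/* Tear the client down at most once. Resetting sock marks the entry as
 * closed, so a second call (or the iterator) will not touch it again. */
static void client_gone(struct client *cl)
{
    if (cl->sock < 0)
        return;             /* already cleaned up */
    cl->sock = -1;          /* the reset the comment says was missing */
    cl->cleanups++;         /* stands in for freeing buffers etc. */
}

/* Return the first open client at or after cl, or NULL if none is left. */
static struct client *next_open(struct client *cl)
{
    while (cl != NULL && cl->sock < 0)
        cl = cl->next;
    return cl;
}
```

Without the `cl->sock = -1` line, next_open() would keep handing out the dead entry and a second client_gone() call would repeat the teardown. In the threaded case the check and the reset would also have to happen under the list mutex.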
Created attachment 160331 [details]
Fix rfbClientIterator
rfbClientIterator is swarming with bugs:
* Whoever added rfbClientListMutex didn't know what he was doing.
* Reference counting is broken.
* The iterator normally skips closed entries (those with sock<0). But
rfbClientConnectionGone() neglects to reset cl->sock.
* Closed entries are *not* skipped when LIBVNCSERVER_HAVE_LIBPTHREAD
is undefined.
Created attachment 160332 [details]
Avoid double-cleanup
Both clientInput() and rfbScreenCleanup() call
rfbClientConnectionGone(). This works only if clientInput() wins the
race with a sufficient margin to take the client off the list before
rfbScreenCleanup() sees it. Otherwise, rfbClientConnectionGone() is
called twice, with potentially disastrous results.
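One common way to make such a race harmless is to let the two paths compete for a flag under a mutex, so that only the winner performs the teardown. The sketch below shows that general technique under stated assumptions. It is not necessarily what the attached patch does, and struct client, claim_cleanup() and connection_gone_once() are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical client record, reduced to what the sketch needs. */
struct client {
    pthread_mutex_t lock;
    bool gone;              /* set by whichever caller claims the cleanup */
    int cleanups;           /* counts how often teardown really ran */
};

/* Returns true for exactly one caller, however many threads race here. */
static bool claim_cleanup(struct client *cl)
{
    bool first;

    pthread_mutex_lock(&cl->lock);
    first = !cl->gone;
    cl->gone = true;
    pthread_mutex_unlock(&cl->lock);
    return first;
}

/* Stand-in for the real teardown; safe to call from both code paths. */
static void connection_gone_once(struct client *cl)
{
    if (!claim_cleanup(cl))
        return;             /* the other path already did the work */
    cl->cleanups++;         /* only the claiming thread gets here */
}
```

With this shape it no longer matters whether the input thread or the screen cleanup gets there first. The loser simply returns. The record itself must still stay allocated until both paths are done with it.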
Rawhide no longer uses the LibVNCServer codebase at all, so closing this bug.
F-7 has had an erratum pushed to fix the races:

* Sun Sep 23 2007 Daniel P. Berrange <berrange> - 3.1.0-3.fc7
- Fix race conditions in LibVNCServer on client disconnect