Bug 240012 - xen-vncfb segfault
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: xen
Version: rawhide
Hardware: x86_64 Linux
Priority: medium Severity: medium
Assigned To: Markus Armbruster
Duplicates: 240424
Depends On:
Blocks: 245188
Reported: 2007-05-14 07:52 EDT by Richard W.M. Jones
Modified: 2007-11-30 17:12 EST

Doc Type: Bug Fix
Last Closed: 2007-09-24 19:29:30 EDT
Attachments
154832: Core dump (26.50 KB, application/x-bzip2), 2007-05-16 11:21 EDT, Richard W.M. Jones
154833: Binary of xen-vncfb matching preceding coredump. (88.32 KB, application/x-bzip2), 2007-05-16 11:22 EDT, Richard W.M. Jones
154848: Coredump with symbols (29.25 KB, application/x-bzip2), 2007-05-16 12:52 EDT, Richard W.M. Jones
154849: Binary file matching the above coredump with symbols (80.54 KB, application/x-bzip2), 2007-05-16 12:52 EDT, Richard W.M. Jones
154900: Binary file matching the above coredump with symbols (2nd version) (161.93 KB, application/x-bzip2), 2007-05-17 05:31 EDT, Richard W.M. Jones
154903: Another core dump (29.13 KB, application/x-bzip2), 2007-05-17 05:57 EDT, Richard W.M. Jones
154904: Another core dump (27.73 KB, application/x-bzip2), 2007-05-17 06:35 EDT, Richard W.M. Jones
154905: Another core dump (29.14 KB, application/x-bzip2), 2007-05-17 07:01 EDT, Richard W.M. Jones
156605: Reference counting fix (807 bytes, patch), 2007-06-08 15:35 EDT, Markus Armbruster
156608: Don't continue blindly when the socket is closed (885 bytes, patch), 2007-06-08 15:46 EDT, Markus Armbruster
160117: Avoid double cleanup (1.02 KB, patch), 2007-07-27 10:15 EDT, Markus Armbruster
160331: Fix rfbClientIterator (2.52 KB, patch), 2007-07-31 11:06 EDT, Markus Armbruster
160332: Avoid double-cleanup (1.04 KB, patch), 2007-07-31 11:08 EDT, Markus Armbruster

Description Richard W.M. Jones 2007-05-14 07:52:37 EDT
Description of problem:

The following lines were found in dmesg after I had been aggressively starting
and stopping guests and installing new guests for a period of about 1 hour
(using virt-manager).

# dmesg|grep segfault
xen-vncfb[14705]: segfault at 0000000000000000 rip 000000000040c7f4 rsp
0000000042802c48 error 6
xen-vncfb[13053]: segfault at 0000000000000000 rip 000000000040c7f4 rsp
0000000042802ca8 error 6

There seemed to be no visible indication from virt-manager, _except_ that
sometimes when you hit the Install->Finish button in virt-manager, the console
did not appear even though the guest had started installing.  This might be
related to the segfaults above, but I am not sure (further testing ongoing).

Version-Release number of selected component (if applicable):

# rpm -qf /usr/lib64/xen/bin/xen-vncfb
xen-3.1.0-0.rc7.1.fc7
# uname -a
Linux lambda 2.6.20-2925.8.fc7xen #1 SMP Thu May 10 17:47:43 EDT 2007 x86_64
x86_64 x86_64 GNU/Linux

Other parts of Fedora 7 up to date as of this morning.

How reproducible:

Further testing of this is ongoing to see if it is related to the console not
coming up when starting to install a guest.

Comment 1 Richard W.M. Jones 2007-05-14 08:36:31 EDT
This segfault is _not_ coincident with the "lost console" when starting an
install from virt-manager.
Comment 2 Markus Armbruster 2007-05-14 18:19:16 EDT
rip 000000000040c7f4 seems to be sraSpanInsertBefore+4

  40c7f0:       48 8b 46 08             mov    0x8(%rsi),%rax
  40c7f4:       48 89 37                mov    %rsi,(%rdi)
  40c7f7:       48 89 47 08             mov    %rax,0x8(%rdi)
  40c7fb:       48 8b 46 08             mov    0x8(%rsi),%rax
  40c7ff:       48 89 7e 08             mov    %rdi,0x8(%rsi)
  40c803:       48 89 38                mov    %rdi,(%rax)
  40c806:       c3                      retq   

static void
sraSpanInsertBefore(sraSpan *newspan, sraSpan *before) {
  newspan->_next = before;
  newspan->_prev = before->_prev;
  before->_prev->_next = newspan;
  before->_prev = newspan;
}

newspan must have been null.  Before I investigate all possible callers, let's
try to capture a core dump.  Richard, could you take care of that?
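
For readers following the disassembly, here is a minimal sketch of why the
fault points at newspan.  The store at 40c7f4, mov %rsi,(%rdi), writes the
second argument (before) to offset 0 of the first argument (newspan), and the
reported fault address is 0.  The struct layout below is an assumption inferred
from the 0x0 and 0x8 offsets in the listing (the real sraSpan may carry more
fields), and sraSpanInsertBeforeChecked is a hypothetical debugging aid, not
part of LibVNCServer.

```c
#include <stddef.h>

/* Assumed layout: only the two link fields implied by the disassembly. */
typedef struct sraSpan {
  struct sraSpan *_next;   /* offset 0x0: target of the faulting store */
  struct sraSpan *_prev;   /* offset 0x8 on x86_64 */
} sraSpan;

static void
sraSpanInsertBefore(sraSpan *newspan, sraSpan *before) {
  newspan->_next = before;          /* mov %rsi,(%rdi): faults if newspan == NULL */
  newspan->_prev = before->_prev;
  before->_prev->_next = newspan;
  before->_prev = newspan;
}

/* Hypothetical guard for debugging: report bad arguments instead of crashing. */
static int
sraSpanInsertBeforeChecked(sraSpan *newspan, sraSpan *before) {
  if (newspan == NULL || before == NULL || before->_prev == NULL)
    return -1;
  sraSpanInsertBefore(newspan, before);
  return 0;
}
```

A guard like this would only hide the symptom; the interesting question remains
which caller passes a null span.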
Comment 3 Richard W.M. Jones 2007-05-16 11:21:21 EDT
Created attachment 154832 [details]
Core dump

This is the core dump.  I'm going to upload the corresponding binary next - it
is slightly different because I have recompiled xen on this machine (to apply
another patch).
Comment 4 Richard W.M. Jones 2007-05-16 11:22:07 EDT
Created attachment 154833 [details]
Binary of xen-vncfb matching preceding coredump.

This is the binary which matches the preceding coredump.
Comment 5 Richard W.M. Jones 2007-05-16 12:52:10 EDT
Created attachment 154848 [details]
Coredump with symbols
Comment 6 Richard W.M. Jones 2007-05-16 12:52:47 EDT
Created attachment 154849 [details]
Binary file matching the above coredump with symbols
Comment 7 Markus Armbruster 2007-05-16 15:39:55 EDT
I'm sorry, but the binary in attachment 154849 is stripped.  Can I have one with
symbols?
Comment 8 Richard W.M. Jones 2007-05-17 05:31:11 EDT
Created attachment 154900 [details]
Binary file matching the above coredump with symbols (2nd version)

Sorry about that.  Try this one.
Comment 9 Richard W.M. Jones 2007-05-17 05:57:11 EDT
Created attachment 154903 [details]
Another core dump

This core dump has a slightly different stack trace, although it's in the same
general area of the code.
Comment 10 Richard W.M. Jones 2007-05-17 06:35:34 EDT
Created attachment 154904 [details]
Another core dump

And another core dump, with yet again a slightly different stack trace,
although the same area of the code.
Comment 11 Richard W.M. Jones 2007-05-17 07:01:44 EDT
Created attachment 154905 [details]
Another core dump

And another one - again with a different stack trace from the others.
Comment 12 Mark McLoughlin 2007-05-17 10:38:40 EDT
Suggest something like this untested patch:

  http://www.gnome.org/~markmc/code/xen-libvncserver-threading.patch
Comment 13 Markus Armbruster 2007-05-17 14:13:23 EDT
*** Bug 240424 has been marked as a duplicate of this bug. ***
Comment 14 Richard W.M. Jones 2007-05-18 08:56:53 EDT
Markus suggests running xen-vncfb under valgrind to gain more information.
Comment 15 Markus Armbruster 2007-06-08 15:35:14 EDT
Created attachment 156605 [details]
Reference counting fix

Patch proposed by Mark McLoughlin.  It makes the refcounting match the life
range of cl.
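
For illustration only (this is not the attached patch): one way to keep a
client record alive for exactly as long as some thread uses it is to take a
reference before the first use of cl, drop it after the last, and have the
teardown path wait for the count to reach zero.  All names below are
hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical client record; names are illustrative, not LibVNCServer's. */
typedef struct client {
  pthread_mutex_t ref_mutex;
  pthread_cond_t  ref_zero;   /* signalled when refs drops to 0 */
  int             refs;
} client;

static void client_init(client *cl) {
  pthread_mutex_init(&cl->ref_mutex, NULL);
  pthread_cond_init(&cl->ref_zero, NULL);
  cl->refs = 0;
}

/* Take a reference before the first use of cl. */
static void client_ref(client *cl) {
  pthread_mutex_lock(&cl->ref_mutex);
  cl->refs++;
  pthread_mutex_unlock(&cl->ref_mutex);
}

/* Drop the reference after the last use of cl. */
static void client_unref(client *cl) {
  pthread_mutex_lock(&cl->ref_mutex);
  assert(cl->refs > 0);
  if (--cl->refs == 0)
    pthread_cond_broadcast(&cl->ref_zero);
  pthread_mutex_unlock(&cl->ref_mutex);
}

/* Called by the thread that tears cl down: block until nobody uses it. */
static void client_wait_unused(client *cl) {
  pthread_mutex_lock(&cl->ref_mutex);
  while (cl->refs > 0)
    pthread_cond_wait(&cl->ref_zero, &cl->ref_mutex);
  pthread_mutex_unlock(&cl->ref_mutex);
}
```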
Comment 16 Markus Armbruster 2007-06-08 15:46:52 EDT
Created attachment 156608 [details]
Don't continue blindly when the socket is closed

Patch proposed by Mark McLoughlin.  Avoids passing invalid file descriptor to
FD_SET() etc.
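
A minimal sketch of the kind of guard described above, not the attached patch
itself: check the descriptor before it reaches FD_SET(), since POSIX leaves
FD_SET() undefined for a negative fd or one that is FD_SETSIZE or larger.  The
helper name safe_fd_set and its maxfd parameter are assumptions made for
illustration.

```c
#include <sys/select.h>

/* Hypothetical helper: add fd to set only if select() can legally handle it.
   Returns 0 on success, -1 if the socket is closed or out of range. */
static int
safe_fd_set(int fd, fd_set *set, int *maxfd) {
  if (fd < 0 || fd >= FD_SETSIZE)
    return -1;            /* FD_SET() with such an fd is undefined behaviour */
  FD_SET(fd, set);
  if (fd > *maxfd)
    *maxfd = fd;          /* track the highest fd for select()'s first argument */
  return 0;
}
```

A caller that gets -1 back should stop servicing that client rather than go on
to select().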
Comment 17 Markus Armbruster 2007-06-08 15:49:56 EDT
This is actually a whole family of bugs, and the patches I just attached fix
just two of them.  There may be more.

The root cause they share is that when the client goes away, the threads doing
work for it terminate in a confused and badly coordinated manner, which only
works with lucky timing. 
Comment 18 Markus Armbruster 2007-07-27 10:15:17 EDT
Created attachment 160117 [details]
Avoid double cleanup

The client iterator (protected by rfbClientListMutex) skips entries
with sock<0.  But rfbClientConnectionGone() neglects to reset
cl->sock.  This leads to double-cleanup, with disastrous results.
Comment 19 Markus Armbruster 2007-07-31 11:06:26 EDT
Created attachment 160331 [details]
Fix rfbClientIterator

rfbClientIterator is swarming with bugs:
* Whoever added rfbClientListMutex didn't know what he was doing.
* Reference counting is broken.
* The iterator normally skips closed entries (those with sock<0).  But
  rfbClientConnectionGone() neglects to reset cl->sock.
* Closed entries are *not* skipped when LIBVNCSERVER_HAVE_LIBPTHREAD
  is undefined.
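
The intended iterator behaviour from the list above can be sketched as follows,
assuming a simplified singly linked client list.  The types and function names
are hypothetical; the point is only that the skip test must not depend on a
build option and that the connection-gone path must set sock to a negative
value.

```c
#include <stddef.h>

/* Hypothetical, simplified client list; not the LibVNCServer structures. */
typedef struct client {
  int sock;                 /* < 0 means the connection is closed */
  struct client *next;
} client;

/* Return the first open client at or after cl, skipping closed entries.
   The check is unconditional, with or without thread support. */
static client *
next_open_client(client *cl) {
  while (cl != NULL && cl->sock < 0)
    cl = cl->next;
  return cl;
}

/* Mark the client closed so that later iterations skip it. */
static void
connection_gone(client *cl) {
  /* ... release per-client resources here ... */
  cl->sock = -1;
}
```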
Comment 20 Markus Armbruster 2007-07-31 11:08:39 EDT
Created attachment 160332 [details]
Avoid double-cleanup

Both clientInput() and rfbScreenCleanup() call
rfbClientConnectionGone().  This works only if clientInput() wins the
race with a sufficient margin to take the client off the list before
rfbScreenCleanup() sees it.  Otherwise, rfbClientConnectionGone() is
called twice, with potentially disastrous results.
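
One common way to make such a cleanup function tolerate two callers is to claim
the cleanup under a mutex, so that whoever arrives second returns without
touching the client.  This is a sketch of that general pattern under assumed
names (client, connection_gone_once), not the attached patch.

```c
#include <pthread.h>

/* Hypothetical client record, reduced to what the pattern needs. */
typedef struct client {
  pthread_mutex_t lock;
  int gone;                 /* set once cleanup has been claimed */
  int cleanups;             /* counts real cleanups, for illustration */
} client;

/* Safe to call from both the input thread and the screen cleanup path:
   only the first caller performs the teardown.  Returns 1 if this call
   did the cleanup, 0 if it had already been claimed. */
static int
connection_gone_once(client *cl) {
  int first;

  pthread_mutex_lock(&cl->lock);
  first = !cl->gone;
  cl->gone = 1;
  pthread_mutex_unlock(&cl->lock);

  if (!first)
    return 0;               /* someone else already cleaned up */
  cl->cleanups++;           /* stand-in for closing the socket, freeing buffers */
  return 1;
}
```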
Comment 21 Daniel Berrange 2007-09-24 19:29:30 EDT
Rawhide no longer uses the LibVNCServer codebase at all, so closing this bug.
F-7 has had an erratum pushed to fix the races:

* Sun Sep 23 2007 Daniel P. Berrange <berrange@redhat.com> - 3.1.0-3.fc7
- Fix race conditions in LibVNCServer on client disconnect
