Bug 245188 - xen-vncfb segfault
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.0
Hardware: All
OS: Linux
Priority: high
Severity: medium
Assigned To: Markus Armbruster
Depends On: 240012
Reported: 2007-06-21 11:12 EDT by Markus Armbruster
Modified: 2007-11-30 17:07 EST
Fixed In Version: RHEA-2007-0635
Doc Type: Bug Fix
Last Closed: 2007-11-07 12:10:49 EST
Description Markus Armbruster 2007-06-21 11:12:49 EDT
+++ This bug was initially created as a clone of Bug #240012 +++

Description of problem:

The following lines were found in dmesg after I had been aggressively starting
and stopping guests and installing new guests for a period of about 1 hour
(using virt-manager).

# dmesg|grep segfault
xen-vncfb[14705]: segfault at 0000000000000000 rip 000000000040c7f4 rsp 0000000042802c48 error 6
xen-vncfb[13053]: segfault at 0000000000000000 rip 000000000040c7f4 rsp 0000000042802ca8 error 6

There was no visible indication of a problem in virt-manager, _except_ that
sometimes, after hitting the Install->Finish button in virt-manager, the console
did not appear even though the guest had started installing.  This might be
related to the segfaults above, but I am not sure (further testing ongoing).

Version-Release number of selected component (if applicable):

# rpm -qf /usr/lib64/xen/bin/xen-vncfb
xen-3.1.0-0.rc7.1.fc7
# uname -a
Linux lambda 2.6.20-2925.8.fc7xen #1 SMP Thu May 10 17:47:43 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

Other parts of Fedora 7 up to date as of this morning.

How reproducible:

Further testing of this is ongoing to see if it is related to the console not
coming up when starting to install a guest.

Additional info:

-- Additional comment from rjones@redhat.com on 2007-05-14 08:36 EST --
This segfault is _not_ coincident with the "lost console" when starting an
install from virt-manager.

-- Additional comment from armbru@redhat.com on 2007-05-14 18:19 EST --
rip 000000000040c7f4 seems to be sraSpanInsertBefore+4

  40c7f0:       48 8b 46 08             mov    0x8(%rsi),%rax
  40c7f4:       48 89 37                mov    %rsi,(%rdi)
  40c7f7:       48 89 47 08             mov    %rax,0x8(%rdi)
  40c7fb:       48 8b 46 08             mov    0x8(%rsi),%rax
  40c7ff:       48 89 7e 08             mov    %rdi,0x8(%rsi)
  40c803:       48 89 38                mov    %rdi,(%rax)
  40c806:       c3                      retq   

static void
sraSpanInsertBefore(sraSpan *newspan, sraSpan *before) {
  newspan->_next = before;
  newspan->_prev = before->_prev;
  before->_prev->_next = newspan;
  before->_prev = newspan;
}

newspan must have been null.  Before I investigate all possible callers, let's
try to capture a core dump.  Richard, could you take care of that?
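
A note on the analysis above: the faulting instruction at 40c7f4 stores %rsi through %rdi, i.e. it is the write to newspan->_next, which is consistent with a null first argument. The following is only an illustrative sketch, not the libvncserver code or any proposed fix: `span` and `span_insert_before_checked` are hypothetical names, and the struct keeps only the two link fields that the disassembly touches. A guard like this would turn the crash into a reportable error, but it would not address whatever hands the function a null pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the list node. Only the link fields are kept:
 * _next at offset 0 and _prev at offset 8 on x86_64, as in the disassembly. */
typedef struct span {
  struct span *_next;
  struct span *_prev;
} span;

/* Guarded variant of the insert: returns 0 on success, -1 if any pointer
 * the insert would dereference is null. */
static int span_insert_before_checked(span *newspan, span *before) {
  if (newspan == NULL || before == NULL || before->_prev == NULL)
    return -1;
  newspan->_next = before;
  newspan->_prev = before->_prev;
  before->_prev->_next = newspan;
  before->_prev = newspan;
  return 0;
}
```

Called with a null newspan, this returns -1 instead of faulting on the first store.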

-- Additional comment from rjones@redhat.com on 2007-05-16 11:21 EST --
Created an attachment (id=154832)
Core dump

This is the core dump.  I'm going to upload the corresponding binary next - it
is slightly different because I have recompiled xen on this machine (to apply
another patch).

-- Additional comment from rjones@redhat.com on 2007-05-16 11:22 EST --
Created an attachment (id=154833)
Binary of xen-vncfb matching preceding coredump.

This is the binary which matches the preceding coredump.

-- Additional comment from rjones@redhat.com on 2007-05-16 12:52 EST --
Created an attachment (id=154848)
Coredump with symbols


-- Additional comment from rjones@redhat.com on 2007-05-16 12:52 EST --
Created an attachment (id=154849)
Binary file matching the above coredump with symbols


-- Additional comment from armbru@redhat.com on 2007-05-16 15:39 EST --
I'm sorry, but the binary in attachment 154849 is stripped.  Can I have one with
symbols?

-- Additional comment from rjones@redhat.com on 2007-05-17 05:31 EST --
Created an attachment (id=154900)
Binary file matching the above coredump with symbols (2nd version)

Sorry about that.  Try this one.

-- Additional comment from rjones@redhat.com on 2007-05-17 05:57 EST --
Created an attachment (id=154903)
Another core dump

This core dump has a slightly different stack trace, although it's in the same
general area of the code.

-- Additional comment from rjones@redhat.com on 2007-05-17 06:35 EST --
Created an attachment (id=154904)
Another core dump

And another core dump, with yet again a slightly different stack trace,
although the same area of the code.

-- Additional comment from rjones@redhat.com on 2007-05-17 07:01 EST --
Created an attachment (id=154905)
Another core dump

And another one - again with a different stack trace from the others.

-- Additional comment from markmc@redhat.com on 2007-05-17 10:38 EST --
Suggest something like this untested patch:

  http://www.gnome.org/~markmc/code/xen-libvncserver-threading.patch

-- Additional comment from armbru@redhat.com on 2007-05-17 14:13 EST --
*** Bug 240424 has been marked as a duplicate of this bug. ***

-- Additional comment from rjones@redhat.com on 2007-05-18 08:56 EST --
Markus suggests running xen-vncfb under valgrind to gain more information.

-- Additional comment from armbru@redhat.com on 2007-06-08 15:35 EST --
Created an attachment (id=156605)
Reference counting fix

Patch proposed by Mark McLoughlin.  It makes the reference counting match the
lifetime of cl.
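
The attached patch itself is not reproduced in this report. As a rough illustration of the general idea only, the sketch below shows reference counting in which an object is freed exactly when its last user lets go. All names here (`client`, `client_new`, `client_ref`, `client_unref`) are hypothetical, and C11 atomics are an assumption made for this example, not necessarily what the real code uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical per-client state shared between worker threads. */
typedef struct client {
  atomic_int refcount;
  int sock;
} client;

/* The creator starts out holding one reference. */
static client *client_new(int sock) {
  client *cl = malloc(sizeof *cl);
  if (cl == NULL)
    return NULL;
  atomic_init(&cl->refcount, 1);
  cl->sock = sock;
  return cl;
}

/* Take a reference before handing cl to another thread. */
static void client_ref(client *cl) {
  atomic_fetch_add(&cl->refcount, 1);
}

/* Drop a reference; frees cl and returns 1 if this was the last one. */
static int client_unref(client *cl) {
  if (atomic_fetch_sub(&cl->refcount, 1) == 1) {
    free(cl);
    return 1;
  }
  return 0;
}
```

Each thread takes its reference before it first touches cl and drops it only after its last use, so no thread can free the structure while another still uses it.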


-- Additional comment from armbru@redhat.com on 2007-06-08 15:46 EST --
Created an attachment (id=156608)
Don't continue blindly when the socket is closed

Patch proposed by Mark McLoughlin.  Avoids passing an invalid file descriptor
to FD_SET() etc.
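
For context, FD_SET() has undefined behaviour when given a descriptor that is negative or not less than FD_SETSIZE. The sketch below is not the attached patch; it only illustrates the kind of check involved. `fd_set_checked` is a hypothetical helper, and representing a closed socket as -1 is an assumption made for this example.

```c
#include <assert.h>
#include <sys/select.h>

/* Add fd to set only if it is in the range FD_SET() can handle.
 * Returns 0 on success, -1 if fd is unusable (e.g. a closed socket,
 * assumed here to be represented as -1). */
static int fd_set_checked(int fd, fd_set *set) {
  if (fd < 0 || fd >= FD_SETSIZE)
    return -1;
  FD_SET(fd, set);
  return 0;
}
```

A caller that gets -1 back should stop working on that client rather than go on to select() with a stale descriptor.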


-- Additional comment from armbru@redhat.com on 2007-06-08 15:49 EST --
This is actually a whole family of bugs, and the patches I just attached fix
just two of them.  There may be more.

The root cause they share is that when the client goes away, the threads doing
work for it terminate in a confused and badly coordinated manner, which only
works with lucky timing.
Comment 1 RHEL Product and Program Management 2007-06-21 11:23:53 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 4 Daniel Berrange 2007-08-02 16:56:54 EDT
$ brew latest-pkg dist-5E-qu-candidate xen
Build                                     Tag                   Built by
----------------------------------------  --------------------  ----------------
xen-3.0.3-35.el5                          dist-5E-qu-candidate  berrange


* Wed Aug  1 2007 Daniel P. Berrange <berrange@redhat.com> - 3.0.3-35.el5
- Fix more VNC threading problems (rhbz #245188)
Comment 7 errata-xmlrpc 2007-11-07 12:10:49 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2007-0635.html
