Bug 911609 - [abrt] libvirt-client-0.10.2-18.el6: remoteClientCloseFunc: Process /usr/bin/virsh was killed by signal 11 (SIGSEGV)
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.4
Hardware: x86_64 Unspecified
Priority: high Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Peter Krempa
QA Contact: Virtualization Bugs
Whiteboard: abrt_hash:25a80aa5b774a648781787aa1f2...
Keywords: ZStream
Depends On:
Blocks: 950599
Reported: 2013-02-15 08:04 EST by David Jaša
Modified: 2013-11-21 03:45 EST (History)

See Also:
Fixed In Version: libvirt-0.10.2-19.el6
Doc Type: Bug Fix
Doc Text:
Due to a race condition in the libvirt client library, any application using libvirt could terminate unexpectedly with a segmentation fault. This happened when one thread executed the connection close callback, while another one freed the connection object, and the connection callback thread then accessed memory that had been already freed. This update fixes the possibility of freeing the callback data when they are still being accessed.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-21 03:45:15 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments
File: maps (20.56 KB, text/plain), 2013-02-15 08:04 EST, David Jaša
File: var_log_messages (296 bytes, text/plain), 2013-02-15 08:04 EST, David Jaša
File: open_fds (328 bytes, text/plain), 2013-02-15 08:04 EST, David Jaša
File: environ (1.93 KB, text/plain), 2013-02-15 08:05 EST, David Jaša
File: dso_list (4.71 KB, text/plain), 2013-02-15 08:05 EST, David Jaša
File: sosreport.tar.xz (1.71 MB, text/plain), 2013-02-15 08:05 EST, David Jaša
File: backtrace (13.06 KB, text/plain), 2013-02-15 08:05 EST, David Jaša
File: build_ids (2.16 KB, text/plain), 2013-02-15 08:05 EST, David Jaša
File: limits (1.29 KB, text/plain), 2013-02-15 08:05 EST, David Jaša
File: cgroup (88 bytes, text/plain), 2013-02-15 08:05 EST, David Jaša

Description David Jaša 2013-02-15 08:04:49 EST
Description of problem:
I didn't notice any apparent trigger; virsh just crashed after printing the output of the 'virsh list --all' command. I couldn't reproduce the crash afterwards.

Version-Release number of selected component:
libvirt-client-0.10.2-18.el6

Additional info:
libreport version: 2.0.9
abrt_version:   2.0.8
backtrace_rating: 4
cmdline:        virsh list --all
crash_function: remoteClientCloseFunc
kernel:         2.6.32-356.el6.x86_64

truncated backtrace:
:Thread no. 1 (8 frames)
: #0 remoteClientCloseFunc at remote/remote_driver.c
: #1 virNetClientCloseLocked at rpc/virnetclient.c
: #2 virNetClientIncomingEvent at rpc/virnetclient.c
: #3 virEventPollDispatchHandles at util/event_poll.c
: #4 virEventPollRunOnce at util/event_poll.c
: #5 virEventRunDefaultImpl at util/event.c
: #6 vshEventLoop at virsh.c
: #7 virThreadHelper at util/threads-pthread.c
Comment 1 David Jaša 2013-02-15 08:04:53 EST
Created attachment 697773 [details]
File: maps
Comment 2 David Jaša 2013-02-15 08:04:56 EST
Created attachment 697774 [details]
File: var_log_messages
Comment 3 David Jaša 2013-02-15 08:04:58 EST
Created attachment 697775 [details]
File: open_fds
Comment 4 David Jaša 2013-02-15 08:05:00 EST
Created attachment 697776 [details]
File: environ
Comment 5 David Jaša 2013-02-15 08:05:03 EST
Created attachment 697777 [details]
File: dso_list
Comment 6 David Jaša 2013-02-15 08:05:12 EST
Created attachment 697778 [details]
File: sosreport.tar.xz
Comment 7 David Jaša 2013-02-15 08:05:15 EST
Created attachment 697779 [details]
File: backtrace
Comment 8 David Jaša 2013-02-15 08:05:18 EST
Created attachment 697780 [details]
File: build_ids
Comment 9 David Jaša 2013-02-15 08:05:20 EST
Created attachment 697781 [details]
File: limits
Comment 10 David Jaša 2013-02-15 08:05:22 EST
Created attachment 697782 [details]
File: cgroup
Comment 12 RHEL Product and Program Management 2013-02-21 01:47:15 EST
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 14 zhenfeng wang 2013-03-11 03:10:19 EDT
I tried this on RHEL 6.4, but I can't reproduce it with the following steps. Can you give me some advice? What other operations could I try in order to reproduce it? Thanks.

Version-Release number of selected component:
libvirt-client-0.10.2-18.el6
kernel-2.6.32-356.el6.x86_64

Steps:
1. Prepare a guest
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhelnew1                       shut off

2. Start and destroy the guest many times, checking the guest status after each operation
# for i in {1..20};do virsh start rhelnew1;sleep 3;virsh list --all; virsh destroy rhelnew1;virsh list --all;done

3. Connect to the host remotely, then perform some operations on the guest

# virsh -c qemu+ssh://$hostIP/system

Type:  'help' for help with commands
       'quit' to quit

virsh # list --all
 Id    Name                           State
----------------------------------------------------
 -     rhelnew1                       shut off
virsh # start rhelnew1

virsh # list --all
 Id    Name                           State
----------------------------------------------------
 25    rhelnew1                       running
Comment 15 Eric Blake 2013-03-27 11:03:28 EDT
https://www.redhat.com/archives/libvir-list/2013-March/msg01517.html might be the right upstream patch for this
Comment 16 Peter Krempa 2013-03-28 04:02:09 EDT
The crash is caused by a race condition between two threads: one executes the connection close callback while the other frees the connection object. The connection callback thread then accesses memory that has already been freed. The window in which this can happen is very small (the probability can be increased by adding wait states to the code).

The fix for this issue is posted upstream as http://www.redhat.com/archives/libvir-list/2013-March/msg01539.html
Comment 17 Peter Krempa 2013-04-05 05:20:06 EDT
Fixed upstream with:

commit 8ad126e695e5cef5da9d62ccfde7338317041e84
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Fri Mar 29 18:21:19 2013 +0100

    rpc: Fix connection close callback race condition and memory corruption/crash
    
    Viktor's last effort to fix the race and memory corruption unfortunately
    wasn't complete in the case where the close callback was not registered in a
    connection. At that time, the trail of events that I'll describe later could
    still happen and corrupt the memory or cause a crash of the client (including
    the daemon in case of a p2p migration).
    
    Consider the following prerequisites and trail of events:
    Let's have a remote connection to a hypervisor that doesn't have a close
    callback registered and the client is using the event loop. The crash happens in
    cooperation of 2 threads. Thread E is the event loop and thread W is the worker
    that does some stuff. R denotes the remote client.
    
    1.) W - The client finishes everything and sheds the last reference on the client
    2.) W - The virObject stuff invokes virConnectDispose that invokes doRemoteClose
    3.) W - the remote close method invokes the REMOTE_PROC_CLOSE RPC method.
    4.) W - The thread is preempted at this point.
    5.) R - The remote side receives the close and closes the socket.
    6.) E - poll() wakes up due to the closed socket and invokes the close callback
    7.) E - The event loop is preempted right before remoteClientCloseFunc is called
    8.) W - The worker now finishes, and frees the conn object.
    9.) E - The remoteClientCloseFunc accesses the now-freed conn object in the
            attempt to retrieve pointer for the real close callback.
    10.) Kaboom, corrupted memory/segfault.
    
    This patch tries to fix this by introducing a new object that survives the
    freeing of the connection object. We can't increase the reference count on the
    connection object itself or the connection would never be closed, as the
    connection is closed only when the reference count reaches zero.
    
    The new object - virConnectCloseCallbackData - is a lockable object that keeps
    the pointers to the real user registered callback and ensures that the
    connection callback is either not called if the connection was already freed or
    that the connection isn't freed while this is being called.

commit 69ab07560a134e82e36b6391be3c806d3dbdb16c
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Wed Mar 27 14:37:01 2013 +0100

    virsh: Register and unregister the close callback also in cmdConnect
    
    This patch improves the error message after disconnecting from the
    hypervisor and adds the close callback operations required not to leak
    the callback reference.

commit ca9e73ebb60e2efb1ea835e9a394a8b64ecb97c1
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Wed Mar 27 14:22:47 2013 +0100

    virsh: Move cmdConnect from virsh-host.c to virsh.c
    
    The function is used to establish connection so it should be in the main
    virsh file. This movement also enables further improvements done in next
    patches.
    
    Note that the "connect" command has moved from the host section of virsh to the
    main section. It is now listed by 'virsh help virsh' instead of 'virsh help
    host'.

commit e964ba2786f6736613de1f14db4d3407f6928f50
Author: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Date:   Tue Mar 26 10:54:55 2013 +0100

    virsh: Unregister the connection close notifier upon termination
    
    Before closing the connection we unregister the close callback
    to prevent a reference leak.
    
    Further, the messages on virConnectClose != 0 are a bit more specific
    now.
    
    Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>

commit 03a43efa86f5099d3f6df334f73961a535e488b5
Author: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Date:   Tue Mar 26 10:54:53 2013 +0100

    libvirt: Increase connection reference count for callbacks
    
    By adjusting the reference count of the connection object we
    prevent races between callback function and virConnectClose.
    
    Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>

commit d0cc811ed02d49e60193dfe6601e53adadebb114
Author: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Date:   Tue Mar 26 10:54:54 2013 +0100

    remote: Don't call NULL closeFreeCallback
    
    Check function pointer before calling.
    
    Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
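
The approach described for commit 8ad126e above can be illustrated with a small, self-contained C sketch. This is not the libvirt code: `CloseData`, `Conn`, `conn_dispose`, `client_close_func` and the two replay helpers are hypothetical names chosen for illustration. The interleaving of threads W and E is replayed sequentially in a single thread so that the outcome is deterministic; the mutex is kept to show where the real threads would serialize.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; NOT libvirt's real types. */
typedef void (*CloseCallback)(void *conn, void *opaque);

typedef struct {
    pthread_mutex_t lock;
    int refs;           /* guarded by lock */
    void *conn;         /* set to NULL once the connection is disposed */
    CloseCallback cb;   /* user-registered callback, may be NULL */
    void *opaque;
} CloseData;

typedef struct {
    CloseData *closeData;
} Conn;

static void close_data_ref(CloseData *d) {
    pthread_mutex_lock(&d->lock);
    d->refs++;
    pthread_mutex_unlock(&d->lock);
}

static void close_data_unref(CloseData *d) {
    pthread_mutex_lock(&d->lock);
    assert(d->refs > 0);
    int remaining = --d->refs;
    pthread_mutex_unlock(&d->lock);
    if (remaining == 0) {
        pthread_mutex_destroy(&d->lock);
        free(d);
    }
}

static Conn *conn_new(CloseCallback cb, void *opaque) {
    Conn *c = calloc(1, sizeof(*c));
    CloseData *d = calloc(1, sizeof(*d));
    if (!c || !d) { free(c); free(d); return NULL; }
    pthread_mutex_init(&d->lock, NULL);
    d->refs = 1;        /* reference owned by the connection */
    d->conn = c;
    d->cb = cb;
    d->opaque = opaque;
    c->closeData = d;
    return c;
}

/* Thread W: last reference dropped, connection is freed (step 8). */
static void conn_dispose(Conn *c) {
    CloseData *d = c->closeData;
    pthread_mutex_lock(&d->lock);   /* waits while a callback is running */
    d->conn = NULL;                 /* callback can no longer reach conn */
    pthread_mutex_unlock(&d->lock);
    close_data_unref(d);
    free(c);
}

/* Thread E: transport close handler. It is handed the CloseData,
 * never the Conn itself, so it cannot touch freed memory (step 9). */
static int client_close_func(CloseData *d) {
    int called = 0;
    pthread_mutex_lock(&d->lock);
    if (d->conn && d->cb) {
        d->cb(d->conn, d->opaque);
        called = 1;
    }
    pthread_mutex_unlock(&d->lock);
    return called;
}

static void count_close(void *conn, void *opaque) {
    (void)conn;
    ++*(int *)opaque;
}

/* Replays the crashing order: dispose first, close handler second.
 * Returns how many times the user callback ran. */
int replay_close_after_dispose(void) {
    int hits = 0;
    Conn *c = conn_new(count_close, &hits);
    if (!c) return -1;
    CloseData *d = c->closeData;
    close_data_ref(d);              /* reference held by the event loop side */
    conn_dispose(c);
    (void)client_close_func(d);
    close_data_unref(d);
    return hits;
}

/* Replays the harmless order: close handler first, dispose second. */
int replay_close_before_dispose(void) {
    int hits = 0;
    Conn *c = conn_new(count_close, &hits);
    if (!c) return -1;
    CloseData *d = c->closeData;
    close_data_ref(d);
    (void)client_close_func(d);
    conn_dispose(c);
    close_data_unref(d);
    return hits;
}
```

In this sketch the event loop side only ever holds a reference to the small callback-data object, so the connection's own reference count can still reach zero and trigger the close. Holding the lock for the duration of the callback is what keeps `conn_dispose` from freeing the connection while the callback is using it.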
Comment 18 Peter Krempa 2013-04-05 05:20:40 EDT
git describe: v1.0.4-57-g8ad126e
Comment 19 Peter Krempa 2013-04-05 05:27:29 EDT
This patch (http://www.redhat.com/archives/libvir-list/2013-March/msg01683.html), when applied to the unfixed source tree, makes the crash much more likely to happen.
Comment 20 zhenfeng wang 2013-04-09 23:39:04 EDT
Hi Peter,
I tried to reproduce this with virsh commands before, but I couldn't. Can you give me the steps to reproduce this bug, so that we can verify it correctly later? Thanks.
Comment 21 Peter Krempa 2013-04-10 02:17:00 EDT
Hi Zhenfeng,
please see the patch linked in Comment 19. When you apply that patch to the unfixed tree, the race becomes much more reproducible (around 90%). That thread also contains more information on how the bug manifests and what stack traces to expect.
Comment 23 zhenfeng wang 2013-04-11 06:38:22 EDT
Hi Peter,
I tried to rebuild the libvirt package from libvirt-rhel.git, but I ran into too many duplicate and dependency problems during the rebuild. I spent the whole afternoon preparing it and still haven't got it built. Could you spare some time to build a temporary package for me when you are free? Meanwhile, I'll keep working on it. Thanks.
Comment 24 yanbing du 2013-04-15 05:19:41 EDT
(In reply to comment #23)
> Hi Peter,
> I tried to rebuild the libvirt package from libvirt-rhel.git, but I ran
> into too many duplicate and dependency problems during the rebuild. I spent
> the whole afternoon preparing it and still haven't got it built. Could you
> spare some time to build a temporary package for me when you are free?
> Meanwhile, I'll keep working on it. Thanks.

Hi Zhenfeng,
I just rebuilt the libvirt RPM packages with the patch from comment 19 applied (contact me if you want them) and reproduced this bug.

In one terminal:
# for i in {1..100}; do virsh list --all & virsh list --all ;done 
In another one:
# virsh list
 Id    Name                           State
----------------------------------------------------



 DEBUG: Connection close called, sleeping



 DEBUG: calling the close callback



 DEBUG: Finishing close

Segmentation fault (core dumped)
Comment 26 zhenfeng wang 2013-07-11 08:48:05 EDT
Hi Peter,
I tried to reproduce this bug with the libvirt-0.10.2-18.el6 build compiled by ydu in comment 24. Unfortunately, I can't reproduce the issue in my environment by following ydu's steps; maybe my environment differs slightly from ydu's. However, I can get another segmentation fault during migration. Because something is wrong with abrt (909617), I can't get the core dump file from my machine, so I'm not sure whether this is the segmentation fault we expect. I collected some log messages; could you take a look at them? Thanks.

Package versions:
kernel-2.6.32-356.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6.x86_64
libvirt-0.10.2-18.el6.x86_64

Steps:

 virsh migrate --live win qemu+ssh://xx.xx.xx.xx/system --verbose
root@xx.xx.xx.xx's password:


 DEBUG: Connection close called, sleeping



 DEBUG: calling the close callback



 DEBUG: Finishing close

error: cannot open file '/var/lib/libvirt/images/b.iso': No such file or directory



 DEBUG: Connection close called, sleeping


 DEBUG: Finishing close

Segmentation fault (core dumped)

Jul 11 20:16:07 zhwang64 kernel: virsh[4796]: segfault at 21 ip 0000000000000021 sp 00007fed37b53b58 error 14 in virsh[400000+58000]
Jul 11 20:16:07 zhwang64 abrtd: Directory 'ccpp-2013-07-11-20:16:07-4795' creation detected
Jul 11 20:16:07 zhwang64 abrt[4801]: Saved core dump of pid 4795 (/usr/bin/virsh) to /var/spool/abrt/ccpp-2013-07-11-20:16:07-4795 (31240192 bytes)
Jul 11 20:16:08 zhwang64 abrtd: Package 'libvirt-client' isn't signed with proper key
Jul 11 20:16:08 zhwang64 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2013-07-11-20:16:07-4795' exited with 1
Jul 11 20:16:08 zhwang64 abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2013-07-11-20:16:07-4795'

# ll /var/spool/abrt
total 4
-rw-------. 1 root root 14 Jul 11 20:16 last-ccpp

cat /var/spool/abrt/last-ccpp 
/usr/bin/virsh
Comment 27 Peter Krempa 2013-07-11 10:08:00 EDT
The problem was in the close callback, thus every client that is using the callback was likely to crash. The patch I attached to the 6.4.z clone of this bug can be used to reproduce this with 100% success rate. 

Your steps above use migration for that purpose, which is fine for the original bug, but virtually every remote virsh command should crash.
Comment 28 zhenfeng wang 2013-07-17 03:12:23 EDT
Hi Peter,
Thanks for your reply. I'm sorry to say that I can still only reproduce this issue with migration; I can't reproduce it with other virsh commands even after many attempts. I also found that the issue in comment 26 went away after I updated libvirt to libvirt-0.10.2-19.el6. Can I mark this bug verified using the steps in comment 26? Thanks.
Comment 29 zhenfeng wang 2013-08-16 02:15:03 EDT
Since it works well with libvirt-0.10.2-19.el6 with the reproducer patch applied, and I have also confirmed this with Peter on IRC, I am marking this bug verified.
Comment 31 errata-xmlrpc 2013-11-21 03:45:15 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1581.html
