Bug 1171124 - libvirtd occasionally crashes at the end of migration
Summary: libvirtd occasionally crashes at the end of migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 1162208
Blocks:
Reported: 2014-12-05 12:33 UTC by Jan Kurik
Modified: 2015-01-05 20:30 UTC
CC List: 11 users

Fixed In Version: libvirt-1.1.1-29.el7_0.4
Doc Type: Bug Fix
Doc Text:
Cause: Libvirt did not properly check whether a DAC security label is non-NULL before trying to parse user/group ownership from it.
Consequence: When the virDomainGetBlockInfo API is called on a transient domain that has just finished migrating to another host, the domain's DAC security label may already be NULL, which crashes libvirtd. Since RHEV uses transient domains and calls virDomainGetBlockInfo periodically, it is only a matter of timing before a call lands in this window and crashes libvirtd.
Fix: Properly check the DAC label before trying to parse it.
Result: libvirtd no longer crashes in the described scenario.
Clone Of: 1162208
Environment:
Last Closed: 2015-01-05 20:30:15 UTC
Target Upstream Version:



Links
System                  ID              Private  Priority  Status        Summary                                   Last Updated
Red Hat Product Errata  RHSA-2015:0008  0        normal    SHIPPED_LIVE  Low: libvirt security and bug fix update  2015-01-06 01:29:48 UTC

Description Jan Kurik 2014-12-05 12:33:40 UTC
This bug has been copied from bug #1162208 and has been proposed
to be backported to 7.0 z-stream (EUS).

Comment 6 Pei Zhang 2014-12-12 06:40:06 UTC
Reproduce:
Versions:
libvirt-1.1.1-29.el7_0.3.x86_64
qemu-kvm-rhev-1.5.3-60.el7_0.11
kernel-3.10.0-123.17.1.el7.x86_64

According to comment 4:
Doc Text: Cause: Libvirt did not properly check whether a DAC security label is non-NULL before trying to parse user/group ownership from it.
Consequence: When virDomainGetBlockInfo API is called on a transient domain that has just finished migration to another host, its DAC security label may already be NULL, which crashes libvirtd.
and the patch title is:
Fix crash when saving a domain with type none dac label.
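The Doc Text describes a caller-side check: parse the DAC label only when it exists. The C sketch below illustrates that pattern under stated assumptions. `struct dac_seclabel`, `parse_id()`, `parse_ownership_ids()` and `pick_file_owner()` are hypothetical names; `parse_ownership_ids()` is a simplified stand-in for libvirt's `virParseOwnershipIds()` that accepts only numeric `+UID:+GID` labels; falling back to default owner IDs when no label is set is an assumption. This is not the actual patch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, trimmed-down DAC <seclabel> definition. */
struct dac_seclabel {
    const char *label;   /* e.g. "+107:+107"; may be NULL (see Doc Text) */
};

/* Parse one numeric field "[+]DIGITS" spanning [start, stop). */
static int parse_id(const char *start, const char *stop, unsigned long *out)
{
    char *end;

    if (start < stop && *start == '+')
        start++;
    if (start == stop || *start < '0' || *start > '9')
        return -1;                        /* empty or non-numeric field */
    errno = 0;
    *out = strtoul(start, &end, 10);
    if (errno != 0 || end != stop)
        return -1;                        /* overflow or trailing garbage */
    return 0;
}

/* Hypothetical, simplified stand-in for virParseOwnershipIds(): accepts
 * only "+UID:+GID" (the real function also resolves user/group names).
 * Like the original, it assumes label is non-NULL: strchr(NULL, ':') is
 * the crash seen in the backtrace in comment 7. */
static int parse_ownership_ids(const char *label, uid_t *uid, gid_t *gid)
{
    const char *sep = strchr(label, ':');
    unsigned long u, g;

    if (sep == NULL ||
        parse_id(label, sep, &u) < 0 ||
        parse_id(sep + 1, sep + 1 + strlen(sep + 1), &g) < 0)
        return -1;
    *uid = (uid_t)u;
    *gid = (gid_t)g;
    return 0;
}

/* Hypothetical caller, loosely modeled on the qemuOpenFile() frame:
 * choose the owner for opening a disk image. Parse the DAC label only
 * when one exists; otherwise keep the defaults. */
static int pick_file_owner(const struct dac_seclabel *sec,
                           uid_t default_uid, gid_t default_gid,
                           uid_t *uid, gid_t *gid)
{
    *uid = default_uid;
    *gid = default_gid;

    if (sec != NULL && sec->label != NULL &&
        parse_ownership_ids(sec->label, uid, gid) < 0)
        return -1;                        /* label present but malformed */
    return 0;
}
```

With a label of `NULL`, `pick_file_owner()` returns 0 and keeps the defaults instead of handing NULL to `strchr()`; with `"+107:+107"` it yields UID 107 and GID 107. Putting the check in the caller, as the Doc Text describes, keeps the parser's contract unchanged.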

Reproduce the issue with the following steps:

1> Try to migrate
1. Create a transient domain with a seclabel of type 'none' and model 'dac':
# virsh dumpxml r7 | grep seclabel
        <seclabel model='selinux' labelskip='yes'/>
  <seclabel type='none' model='dac'/>

2. Migrate the domain
Migrate to the destination host:
# virsh migrate  r7 qemu+ssh://$ip/system
root@ip's password:

# virsh list --all
 Id    Name                           State
----------------------------------------------------

Migrate back to the source host:

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 9     r7                             running

Check domblkinfo; libvirtd crashes:
# virsh domblklist r7
Target     Source
------------------------------------------------
vda        /tmp/zp/r7.img

# virsh domblkinfo r7 vda
error: End of file while reading data: Input/output error
error: One or more references were leaked after disconnect from the hypervisor
error: Failed to reconnect to the hypervisor

2> Try to save/restore
# virsh list
 Id    Name                           State
----------------------------------------------------
 17    r7                             running

# virsh save r7 r7.save
error: Failed to save domain r7 to r7.save
error: End of file while reading data: Input/output error
error: One or more references were leaked after disconnect from the hypervisor
error: Failed to reconnect to the hypervisor

# service libvirtd status
Redirecting to /bin/systemctl status  libvirtd.service
libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: failed (Result: signal) since Thu 2014-12-11 18:10:06 CST; 15s ago
  Process: 11911 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=killed, signal=SEGV)
 Main PID: 11911 (code=killed, signal=SEGV)


Verify:

Verify versions:
libvirt-1.1.1-29.el7_0.4.x86_64
qemu-kvm-rhev-1.5.3-60.el7_0.11
kernel-3.10.0-123.17.1.el7.x86_64

Steps:

1> Try to migrate
1. Create a transient domain with a seclabel of type 'none' and model 'dac':
# virsh dumpxml r7 | grep seclabel
        <seclabel model='selinux' labelskip='yes'/>
  <seclabel type='none' model='dac'/>

2. Migrate the domain
Migrate to the destination host:
# virsh migrate  r7 qemu+ssh://$ip/system
root@$ip's password:

# virsh list --all
 Id    Name                           State
----------------------------------------------------

Migrate back to the source host:

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 11     r7                             running

Check domblkinfo; libvirtd keeps running and the domain block info is returned:

# virsh domblklist r7
Target     Source
------------------------------------------------
vda        /tmp/zp/r7.img

# virsh domblkinfo r7 vda
Capacity:       8589934592
Allocation:     8589938688
Physical:       8589938688

2> Try to save/restore; the domain is saved and restored successfully:

# virsh list
 Id    Name                           State
----------------------------------------------------
 16    r7                             running

# virsh save r7 r7.save
Domain r7 saved to r7.save

# virsh restore r7.save
Domain restored from r7.save

# virsh list
 Id    Name                           State
----------------------------------------------------
 17    r7                             running

Comment 7 Pei Zhang 2014-12-17 09:28:26 UTC
Tested in another scenario:

Reproduce versions:
libvirt-1.1.1-29.el7_0.3.x86_64
qemu-kvm-rhev-1.5.3-60.el7_0.11
kernel-3.10.0-123.17.1.el7.x86_64

Steps to reproduce:
1. Get libvirt-1.1.1-29.el7_0.3.src.rpm, add the patch from bug #1162208, and rebuild libvirt.

2. Install the new libvirt RPM packages produced in step 1 and restart libvirtd:
# service libvirtd restart

3. Prepare a domain XML and create a transient domain:
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 4     r7                             running

# virsh dumpxml r7 | grep seclabel
        <seclabel model='selinux' relabel='yes'/>
  <seclabel type='none' model='dac'/>

4. In one terminal, start a live peer-to-peer (--p2p) migration of the domain from the source to the destination host:
 # virsh migrate r7 --live --p2p qemu+ssh://$ip/system

5. In another terminal, watch the libvirt debug log for the SLEEPING message:
# tailf libvirt.log | grep SLEEPING
2014-12-17 07:29:56.488+0000: 30884: debug : doPeer2PeerMigrate:4070 : SLEEPING

Then run domblkinfo:

# virsh domblkinfo r7 vda
error: End of file while reading data: Input/output error
error: One or more references were leaked after disconnect from the hypervisor
error: Failed to reconnect to the hypervisor

6. Use gdb to get the crash backtrace and inspect the two threads involved in the crash (compare with bug #1162208 comment 5):

......
(gdb) bt
#0  0x00007f189311f158 in __strchr_sse42 () from /lib64/libc.so.6
#1  0x00007f1895f194d0 in virParseOwnershipIds (label=0x0, uidPtr=uidPtr@entry=0x7f1886b8a818, gidPtr=gidPtr@entry=0x7f1886b8a81c)
    at util/virutil.c:2072
#2  0x00007f187f39129e in qemuOpenFile (driver=driver@entry=0x7f18781567e0, vm=vm@entry=0x7f186c000e90, 
    path=path@entry=0x7f186c00c7e0 "/tmp/zp/r7raw.img", oflags=oflags@entry=0, needUnlink=needUnlink@entry=0x0, 
    bypassSecurityDriver=bypassSecurityDriver@entry=0x0) at qemu/qemu_driver.c:2780
#3  0x00007f187f39a29e in qemuDomainGetBlockInfo (dom=0x7f18781fbba0, path=0x7f186c00c7e0 "/tmp/zp/r7raw.img", info=0x7f1886b8ab40, 
    flags=<optimized out>) at qemu/qemu_driver.c:10124
#4  0x00007f1895f99734 in virDomainGetBlockInfo (domain=domain@entry=0x7f18781fbba0, disk=0x7f18781fe5e0 "vda", info=info@entry=0x7f1886b8ab40, 
    flags=0) at libvirt.c:9110
#5  0x00007f1896992b04 in remoteDispatchDomainGetBlockInfo (server=<optimized out>, msg=<optimized out>, ret=0x7f18781fbf10, args=0x7f18781fbf30, 
    rerr=0x7f1886b8ac80, client=<optimized out>) at remote_dispatch.h:3487
#6  remoteDispatchDomainGetBlockInfoHelper (server=<optimized out>, client=<optimized out>, msg=<optimized out>, rerr=0x7f1886b8ac80, 
    args=0x7f18781fbf30, ret=0x7f18781fbf10) at remote_dispatch.h:3463
#7  0x00007f1895ff21ba in virNetServerProgramDispatchCall (msg=0x7f1898781700, client=0x7f1898787500, server=0x7f18987723d0, prog=0x7f189877e150)
    at rpc/virnetserverprogram.c:435
#8  virNetServerProgramDispatch (prog=0x7f189877e150, server=server@entry=0x7f18987723d0, client=0x7f1898787500, msg=0x7f1898781700)
    at rpc/virnetserverprogram.c:305
#9  0x00007f1895fecd28 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7f18987723d0)
    at rpc/virnetserver.c:166
#10 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7f18987723d0) at rpc/virnetserver.c:187
#11 0x00007f1895f115e5 in virThreadPoolWorker (opaque=opaque@entry=0x7f18987568e0) at util/virthreadpool.c:144
#12 0x00007f1895f10f7e in virThreadHelper (data=<optimized out>) at util/virthreadpthread.c:194
#13 0x00007f18937bcdf3 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f18930e33dd in clone () from /lib64/libc.so.6

(gdb) info thread
  Id   Target Id         Frame 
  11   Thread 0x7f188738c700 (LWP 30882) "libvirtd" 0x00007f18937c0705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 10   Thread 0x7f1886b8b700 (LWP 30883) "libvirtd" 0x00007f189311f158 in __strchr_sse42 () from /lib64/libc.so.6
  9    Thread 0x7f188638a700 (LWP 30884) "libvirtd" 0x00007f18930aa8ad in nanosleep () from /lib64/libc.so.6


(gdb) thread 9
[Switching to thread 9 (Thread 0x7f188638a700 (LWP 30884))]
#0  0x00007f18930aa8ad in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f18930aa8ad in nanosleep () from /lib64/libc.so.6
#1  0x00007f18930aa744 in sleep () from /lib64/libc.so.6
#2  0x00007f187f3710f1 in doPeer2PeerMigrate (v3proto=<synthetic pointer>, resource=0, dname=<optimized out>, flags=<optimized out>, 
    listenAddress=<optimized out>, graphicsuri=<optimized out>, uri=<optimized out>, dconnuri=<optimized out>, xmlin=<optimized out>, 
    vm=0x7f186c000e90, sconn=0x7f18781fed00, driver=0x7f18781567e0) at qemu/qemu_migration.c:4071
#3  qemuMigrationPerformJob (driver=driver@entry=0x7f18781567e0, conn=0x7f18781fed00, conn@entry=0x0, vm=vm@entry=0x7f186c000e90, 
    xmlin=xmlin@entry=0x0, dconnuri=<optimized out>, uri=<optimized out>, graphicsuri=0x0, listenAddress=0x0, cookiein=0x0, cookieinlen=0, 
    cookieout=0x7f1886389b58, cookieoutlen=0x7f1886389b54, flags=3, dname=0x0, resource=0, v3proto=true) at qemu/qemu_migration.c:4129
#4  0x00007f187f3723d9 in qemuMigrationPerform (driver=driver@entry=0x7f18781567e0, conn=0x0, vm=vm@entry=0x7f186c000e90, xmlin=0x0, 
    dconnuri=dconnuri@entry=0x7f1858000d80 "qemu+ssh://$ip/system", uri=0x7f18580008c0 "\220\037", graphicsuri=0x0, listenAddress=0x0, 
    cookiein=cookiein@entry=0x0, cookieinlen=cookieinlen@entry=0, cookieout=cookieout@entry=0x7f1886389b58, 
    cookieoutlen=cookieoutlen@entry=0x7f1886389b54, flags=flags@entry=3, dname=0x0, resource=0, v3proto=v3proto@entry=true)
    at qemu/qemu_migration.c:4313
#5  0x00007f187f393b0d in qemuDomainMigratePerform3Params (dom=0x7f1858000d40, dconnuri=0x7f1858000d80 "qemu+ssh://$ip/system", 
    params=<optimized out>, nparams=0, cookiein=0x0, cookieinlen=0, cookieout=0x7f1886389b58, cookieoutlen=0x7f1886389b54, flags=3)
    at qemu/qemu_driver.c:10910
#6  0x00007f1895f94bdf in virDomainMigratePerform3Params (domain=domain@entry=0x7f1858000d40, dconnuri=0x7f1858000d80 "qemu+ssh://$ip/system", 
    params=params@entry=0x7f1858000ee0, nparams=0, cookiein=0x0, cookieinlen=0, cookieout=cookieout@entry=0x7f1886389b58, 
    cookieoutlen=cookieoutlen@entry=0x7f1886389b54, flags=3) at libvirt.c:7401
#7  0x00007f18969866ef in remoteDispatchDomainMigratePerform3Params (server=<optimized out>, msg=<optimized out>, ret=0x7f1858000cc0, 
    args=0x7f1858000ce0, rerr=0x7f1886389c80, client=<optimized out>) at remote.c:4978
#8  remoteDispatchDomainMigratePerform3ParamsHelper (server=<optimized out>, client=<optimized out>, msg=<optimized out>, rerr=0x7f1886389c80, 
    args=0x7f1858000ce0, ret=0x7f1858000cc0) at remote_dispatch.h:5631
#9  0x00007f1895ff21ba in virNetServerProgramDispatchCall (msg=0x7f18987734b0, client=0x7f1898781e30, server=0x7f18987723d0, prog=0x7f189877e150)
    at rpc/virnetserverprogram.c:435
#10 virNetServerProgramDispatch (prog=0x7f189877e150, server=server@entry=0x7f18987723d0, client=0x7f1898781e30, msg=0x7f18987734b0)
    at rpc/virnetserverprogram.c:305
#11 0x00007f1895fecd28 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7f18987723d0)
    at rpc/virnetserver.c:166
#12 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7f18987723d0) at rpc/virnetserver.c:187
#13 0x00007f1895f115e5 in virThreadPoolWorker (opaque=opaque@entry=0x7f1898756780) at util/virthreadpool.c:144
#14 0x00007f1895f10f7e in virThreadHelper (data=<optimized out>) at util/virthreadpthread.c:194
#15 0x00007f18937bcdf3 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f18930e33dd in clone () from /lib64/libc.so.6
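Steps 4 to 6 rely on the debug patch from bug #1162208. Judging from the SLEEPING log line and the sleep() frame in thread 9, that patch pauses doPeer2PeerMigrate() so that a domblkinfo request (thread 10) can arrive while the domain's DAC label is already NULL. The C sketch below forces the same interleaving deterministically with a mutex and condition variable, swapped in for the patch's timed sleep. Everything in it is hypothetical: `struct domain`, `migration_thread()`, `query_block_info()` and `reproduce_race()` model the two threads and are not libvirt code, and the query follows the fixed behaviour of skipping the parse when the label is NULL.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of a transient domain whose DAC label is dropped
 * once the outgoing migration has finished. */
struct domain {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    const char *dac_label;   /* NULL after migration finishes */
    int paused;              /* migration thread reached the pause point */
    int query_done;          /* the concurrent query has run */
};

/* Stand-in for the migration job (thread 9 in the backtrace). */
static void *migration_thread(void *opaque)
{
    struct domain *dom = opaque;

    pthread_mutex_lock(&dom->lock);
    dom->dac_label = NULL;               /* migration done, label gone */
    dom->paused = 1;                     /* where the debug patch logs SLEEPING */
    pthread_cond_broadcast(&dom->cond);
    while (!dom->query_done)             /* hold the window open */
        pthread_cond_wait(&dom->cond, &dom->lock);
    pthread_mutex_unlock(&dom->lock);
    return NULL;
}

/* Stand-in for the GetBlockInfo job (thread 10). Returns 1 if a label was
 * present to parse, 0 if parsing was skipped because it was already NULL. */
static int query_block_info(struct domain *dom)
{
    int has_label;

    pthread_mutex_lock(&dom->lock);
    while (!dom->paused)                 /* wait until inside the window */
        pthread_cond_wait(&dom->cond, &dom->lock);
    has_label = dom->dac_label != NULL;  /* check instead of strchr(NULL) */
    dom->query_done = 1;
    pthread_cond_broadcast(&dom->cond);
    pthread_mutex_unlock(&dom->lock);
    return has_label;
}

/* Run one forced interleaving; returns query_block_info()'s result, or -1
 * if the migration thread could not be started. */
static int reproduce_race(void)
{
    struct domain dom;
    pthread_t tid;
    int ret;

    pthread_mutex_init(&dom.lock, NULL);
    pthread_cond_init(&dom.cond, NULL);
    dom.dac_label = "+107:+107";
    dom.paused = 0;
    dom.query_done = 0;

    if (pthread_create(&tid, NULL, migration_thread, &dom) != 0) {
        ret = -1;
    } else {
        ret = query_block_info(&dom);
        pthread_join(tid, NULL);
    }
    pthread_cond_destroy(&dom.cond);
    pthread_mutex_destroy(&dom.lock);
    return ret;
}
```

`reproduce_race()` returns 0 on every run: the query always observes the NULL label, whichever thread the scheduler starts first. A timed sleep, as in the debug patch, is easier to drop into existing code but only makes the bad interleaving likely rather than guaranteed.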

Verify versions:

libvirt-1.1.1-29.el7_0.4.x86_64
qemu-kvm-rhev-1.5.3-60.el7_0.11
kernel-3.10.0-123.17.1.el7.x86_64

Verify steps:

1. Get libvirt-1.1.1-29.el7_0.4.src.rpm, add the patch from bug #1162208, and rebuild libvirt.

2. Install the new libvirt RPM packages produced in step 1 and restart libvirtd:
# service libvirtd restart

3. Prepare a domain XML and create a transient domain:
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 4     r7                             running

# virsh dumpxml r7 | grep seclabel
        <seclabel model='selinux' relabel='yes'/>
  <seclabel type='none' model='dac'/>

4. In one terminal, start a live peer-to-peer (--p2p) migration of the domain from the source to the destination host:
 # virsh migrate r7 --live --p2p qemu+ssh://$ip/system

5. In another terminal, watch the libvirt debug log for the SLEEPING message:
# tailf libvirt.log | grep SLEEPING
2014-12-17 07:29:56.488+0000: 30884: debug : doPeer2PeerMigrate:4070 : SLEEPING

Then run domblkinfo:
# virsh domblkinfo r7 vda
Capacity:       8589934592
Allocation:     8589934592
Physical:       8589934592

6. Check the destination host: the domain is running normally.

Comment 9 errata-xmlrpc 2015-01-05 20:30:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0008.html
