Bug 1290647

Summary: libvirtd crashes when redefining a snapshot that is still being created
Product: Red Hat Enterprise Linux 6 Reporter: Jingjing Shao <jishao>
Component: libvirt Assignee: Peter Krempa <pkrempa>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 6.8 CC: dyuan, hhan, mzhan, rbalakri, shyu, yanyang
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-0.10.2-56.el6 Doc Type: Bug Fix
Last Closed: 2016-05-10 19:25:39 UTC Type: Bug

Description Jingjing Shao 2015-12-11 03:41:16 UTC
Description of problem:
libvirtd crashes when a snapshot is redefined while that snapshot is still being created.

PKG
OS-rhel6.7
libvirt-libvirt-0.10.2-55.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.481.el6.x86_64

How reproducible:
100%

Setup
A running healthy guest rh6


Steps to Reproduce:
1.# vim rh6-snapshot.xml
<domainsnapshot>
<name>rh6-snapshot</name>
<description>snapshot API test - create snapshot</description>
</domainsnapshot>

2.# virsh snapshot-create rh6 rh6-snapshot.xml --halt

While step 2 is still running, execute steps 3 and 4.

3.# virsh snapshot-dumpxml rh6 rh6-snapshot > snap1.xml

4.# virsh snapshot-create rh6 --redefine --current snap1.xml

libvirtd then crashes.


Expected results:
libvirtd should not crash.


Additional info:
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fed9bb84700 (LWP 14910)]
virDomainSnapshotDropParent (snapshot=0x7fed880080a0) at conf/snapshot_conf.c:1007
1007	    curr = snapshot->parent->first_child;
(gdb) t a a bt 

Thread 11 (Thread 0x7fed9c585700 (LWP 14909)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
#1  0x0000003f98865846 in virCondWait (c=<value optimized out>, m=<value optimized out>) at util/threads-pthread.c:117
#2  0x0000003f98865e13 in virThreadPoolWorker (opaque=<value optimized out>) at util/threadpool.c:103
#3  0x0000003f98865669 in virThreadHelper (data=<value optimized out>) at util/threads-pthread.c:161
#4  0x00000031c0e07a51 in start_thread (arg=0x7fed9c585700) at pthread_create.c:301
#5  0x00000031c0ae896d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 10 (Thread 0x7fed9bb84700 (LWP 14910)):
#0  virDomainSnapshotDropParent (snapshot=0x7fed880080a0) at conf/snapshot_conf.c:1007
#1  0x000000000047135f in qemuDomainSnapshotCreateXML (domain=0x7fed7c000930, xmlDesc=<value optimized out>, flags=<value optimized out>) at qemu/qemu_driver.c:12541
#2  0x0000003f98901997 in virDomainSnapshotCreateXML (domain=0x7fed7c000930, 
    xmlDesc=0x7fed7c0011c0 "<domainsnapshot>\n  <name>generic-snapshot</name>\n  <description>snapshot API test - create snapshot</description>\n  <state>running</state>\n  <creationTime>1449804555</creationTime>\n  <memory snapshot="..., flags=3) at libvirt.c:18016
#3  0x0000000000431e3f in remoteDispatchDomainSnapshotCreateXML (server=<value optimized out>, client=<value optimized out>, msg=<value optimized out>, 
    rerr=0x7fed9bb83b80, args=0x7fed7c02f9a0, ret=0x7fed7c02f9e0) at remote_dispatch.h:5894
#4  remoteDispatchDomainSnapshotCreateXMLHelper (server=<value optimized out>, client=<value optimized out>, msg=<value optimized out>, rerr=0x7fed9bb83b80, 
    args=0x7fed7c02f9a0, ret=0x7fed7c02f9e0) at remote_dispatch.h:5870
#5  0x0000003f98948f62 in virNetServerProgramDispatchCall (prog=0x1e26ea0, server=0x1e1e450, client=0x1e22290, msg=0x1e220d0) at rpc/virnetserverprogram.c:431
#6  virNetServerProgramDispatch (prog=0x1e26ea0, server=0x1e1e450, client=0x1e22290, msg=0x1e220d0) at rpc/virnetserverprogram.c:304
#7  0x0000003f98945eee in virNetServerProcessMsg (srv=<value optimized out>, client=0x1e22290, prog=<value optimized out>, msg=0x1e220d0) at rpc/virnetserver.c:170
#8  0x0000003f9894658c in virNetServerHandleJob (jobOpaque=<value optimized out>, opaque=0x1e1e450) at rpc/virnetserver.c:191
#9  0x0000003f98865d7c in virThreadPoolWorker (opaque=<value optimized out>) at util/threadpool.c:144
#10 0x0000003f98865669 in virThreadHelper (data=<value optimized out>) at util/threads-pthread.c:161
#11 0x00000031c0e07a51 in start_thread (arg=0x7fed9bb84700) at pthread_create.c:301
#12 0x00000031c0ae896d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 9 (Thread 0x7fed9b183700 (LWP 14911)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
#1  0x0000003f98865846 in virCondWait (c=<value optimized out>, m=<value optimized out>) at util/threads-pthread.c:117
#2  0x0000003f98865e13 in virThreadPoolWorker (opaque=<value optimized out>) at util/threadpool.c:103
#3  0x0000003f98865669 in virThreadHelper (data=<value optimized out>) at util/threads-pthread.c:161
#4  0x00000031c0e07a51 in start_thread (arg=0x7fed9b183700) at pthread_create.c:301
#5  0x00000031c0ae896d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Comment 1 Peter Krempa 2015-12-11 16:04:15 UTC
It would be helpful if you didn't truncate the threads that are doing actual work while leaving in the idle worker threads. There are two relevant threads in my reproduction attempt:

Thread 11 (Thread 0x7f6e9afbe700 (LWP 23726)):
#0  virDomainSnapshotDropParent (snapshot=0x7f6e7800a0b0) at conf/snapshot_conf.c:1007
#1  0x000000000047135f in qemuDomainSnapshotCreateXML (domain=0x7f6e6c0c5ee0, xmlDesc=<value optimized out>, 
    flags=<value optimized out>) at qemu/qemu_driver.c:12541
#2  0x00007f6ea4216997 in virDomainSnapshotCreateXML (domain=0x7f6e6c0c5ee0, 
    xmlDesc=0x7f6e6c0c92e0 "<domainsnapshot>\n  <name>test</name>\n  <state>running</state>\n  <creationTime>1449849610</creationTime>\n  <memory snapshot='internal'/>\n  <disks>\n    <disk name='vda' snapshot='internal'/>\n  </disks>\n"..., flags=3)
    at libvirt.c:18016
#3  0x0000000000431e3f in remoteDispatchDomainSnapshotCreateXML (server=<value optimized out>, client=<value optimized out>, 
    msg=<value optimized out>, rerr=0x7f6e9afbdb80, args=0x7f6e6c0c9270, ret=0x7f6e6c0c92b0) at remote_dispatch.h:5894
#4  remoteDispatchDomainSnapshotCreateXMLHelper (server=<value optimized out>, client=<value optimized out>, 
    msg=<value optimized out>, rerr=0x7f6e9afbdb80, args=0x7f6e6c0c9270, ret=0x7f6e6c0c92b0) at remote_dispatch.h:5870
#5  0x00007f6ea425df62 in virNetServerProgramDispatchCall (prog=0x1d191f0, server=0x1d0f4b0, client=0x1d15350, msg=0x1d170c0)
    at rpc/virnetserverprogram.c:431
#6  virNetServerProgramDispatch (prog=0x1d191f0, server=0x1d0f4b0, client=0x1d15350, msg=0x1d170c0)
    at rpc/virnetserverprogram.c:304
#7  0x00007f6ea425aeee in virNetServerProcessMsg (srv=<value optimized out>, client=0x1d15350, prog=<value optimized out>, 
    msg=0x1d170c0) at rpc/virnetserver.c:170
#8  0x00007f6ea425b58c in virNetServerHandleJob (jobOpaque=<value optimized out>, opaque=0x1d0f4b0) at rpc/virnetserver.c:191
#9  0x00007f6ea417ad7c in virThreadPoolWorker (opaque=<value optimized out>) at util/threadpool.c:144
#10 0x00007f6ea417a669 in virThreadHelper (data=<value optimized out>) at util/threads-pthread.c:161
#11 0x00007f6ea375ba51 in start_thread (arg=0x7f6e9afbe700) at pthread_create.c:301
#12 0x00007f6ea30a193d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115


Thread 8 (Thread 0x7f6e991bb700 (LWP 23729)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
#1  0x00007f6ea417a846 in virCondWait (c=<value optimized out>, m=<value optimized out>) at util/threads-pthread.c:117
#2  0x000000000049b4bc in qemuMonitorSend (mon=0x7f6e7400bc90, msg=<value optimized out>) at qemu/qemu_monitor.c:914
#3  0x000000000049ff3e in qemuMonitorJSONCommandWithFd (mon=0x7f6e7400bc90, cmd=0x7f6e78000910, scm_fd=-1, reply=0x7f6e991ba778)
    at qemu/qemu_monitor_json.c:271
#4  0x00000000004a86d3 in qemuMonitorJSONHumanCommandWithFd (mon=0x7f6e7400bc90, cmd_str=0x7f6e7800d2f0 "savevm \"test\"", 
    scm_fd=-1, reply_str=0x7f6e991ba820) at qemu/qemu_monitor_json.c:1024
#5  0x000000000049c98d in qemuMonitorHMPCommandWithFd (mon=0x7f6e7400bc90, cmd=0x7f6e78000cc0 "savevm \"test\"", 
    scm_fd=<value optimized out>, reply=<value optimized out>) at qemu/qemu_monitor.c:955
#6  0x00000000004ab454 in qemuMonitorTextCreateSnapshot (mon=<value optimized out>, name=<value optimized out>)
    at qemu/qemu_monitor_text.c:2838
#7  0x00000000004a5f49 in qemuMonitorJSONCreateSnapshot (mon=0x7f6e7400bc90, name=0x7f6e78000bd0 "test")
    at qemu/qemu_monitor_json.c:3399
#8  0x0000000000471447 in qemuDomainSnapshotCreateActiveInternal (domain=<value optimized out>, xmlDesc=<value optimized out>, 
    flags=136) at qemu/qemu_driver.c:11713
#9  qemuDomainSnapshotCreateXML (domain=<value optimized out>, xmlDesc=<value optimized out>, flags=136)
    at qemu/qemu_driver.c:12631
#10 0x00007f6ea4216997 in virDomainSnapshotCreateXML (domain=0x7f6e78000ce0, 
    xmlDesc=0x7f6e78000c30 "<domainsnapshot>\n  <name>test</name>\n</domainsnapshot>\n", flags=8) at libvirt.c:18016
#11 0x0000000000431e3f in remoteDispatchDomainSnapshotCreateXML (server=<value optimized out>, client=<value optimized out>, 
    msg=<value optimized out>, rerr=0x7f6e991bab80, args=0x7f6e78000bf0, ret=0x7f6e780008c0) at remote_dispatch.h:5894
#12 remoteDispatchDomainSnapshotCreateXMLHelper (server=<value optimized out>, client=<value optimized out>, 
    msg=<value optimized out>, rerr=0x7f6e991bab80, args=0x7f6e78000bf0, ret=0x7f6e780008c0) at remote_dispatch.h:5870
#13 0x00007f6ea425df62 in virNetServerProgramDispatchCall (prog=0x1d191f0, server=0x1d0f4b0, client=0x1d16d20, msg=0x1d16e00)
    at rpc/virnetserverprogram.c:431
#14 virNetServerProgramDispatch (prog=0x1d191f0, server=0x1d0f4b0, client=0x1d16d20, msg=0x1d16e00)
    at rpc/virnetserverprogram.c:304
#15 0x00007f6ea425aeee in virNetServerProcessMsg (srv=<value optimized out>, client=0x1d16d20, prog=<value optimized out>, 
    msg=0x1d16e00) at rpc/virnetserver.c:170
#16 0x00007f6ea425b58c in virNetServerHandleJob (jobOpaque=<value optimized out>, opaque=0x1d0f4b0) at rpc/virnetserver.c:191
#17 0x00007f6ea417ad7c in virThreadPoolWorker (opaque=<value optimized out>) at util/threadpool.c:144
#18 0x00007f6ea417a669 in virThreadHelper (data=<value optimized out>) at util/threads-pthread.c:161
#19 0x00007f6ea375ba51 in start_thread (arg=0x7f6e991bb700) at pthread_create.c:301
#20 0x00007f6ea30a193d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

All other worker threads are idle.
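
The crashing frame dereferences `snapshot->parent` at conf/snapshot_conf.c:1007. The following is a minimal sketch of that failure mode, assuming (the report does not confirm this) that the snapshot still being created by the other thread has not yet been linked to a parent when the redefine request tries to unlink it. The struct and function names are hypothetical and simplified; they are not libvirt's actual definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified snapshot tree node; not libvirt's real struct. */
struct snap_node {
    struct snap_node *parent;      /* NULL until linked into the tree */
    struct snap_node *first_child;
    struct snap_node *sibling;
    size_t nchildren;
};

/* Link child under parent as the new first child. */
static void snap_set_parent(struct snap_node *child, struct snap_node *parent)
{
    child->parent = parent;
    child->sibling = parent->first_child;
    parent->first_child = child;
    parent->nchildren++;
}

/*
 * Unlink snap from its parent's child list.
 * Returns 0 on success, -1 if snap was never linked. Without the NULL
 * check, the loop initializer would dereference a NULL parent, which is
 * the kind of access the SIGSEGV in the backtrace points at.
 */
static int snap_drop_parent(struct snap_node *snap)
{
    struct snap_node **link;

    if (snap->parent == NULL)
        return -1;

    for (link = &snap->parent->first_child; *link; link = &(*link)->sibling) {
        if (*link == snap) {
            *link = snap->sibling;
            snap->parent->nchildren--;
            snap->parent = NULL;
            snap->sibling = NULL;
            return 0;
        }
    }
    return -1;
}
```

A NULL check like this only avoids the crash in the sketch; it does not stop two API calls from modifying the same snapshot at once.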

Comment 3 Yang Yang 2015-12-12 01:37:43 UTC
The issue cannot be reproduced with the scratch build.

# rpm -q libvirt
libvirt-0.10.2-56.el6_rc.33ae2f56.x86_64

Reproducer:
# virsh list
 Id    Name                           State
----------------------------------------------------
 3     vm1                            running

# virsh snapshot-create vm1 rh6-snapshot.xml 
Domain snapshot rh6-snapshot created from 'rh6-snapshot.xml'

# virsh snapshot-dumpxml vm1 rh6-snapshot > snap1.xml; virsh snapshot-create vm1 --redefine snap1.xml 
error: Timed out during operation: cannot acquire state change lock
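
The "cannot acquire state change lock" error above suggests that, with the scratch build, the redefine request waits for the job held by another operation on the domain and then gives up, rather than touching a snapshot that is still being built. A rough sketch of that serialization idea follows. The type and function names are hypothetical, and the sketch fails immediately where the real daemon appears to wait with a timeout.

```c
#include <stdbool.h>

/* Hypothetical per-domain job state; illustrative only. */
struct domain_job {
    bool active;   /* true while some API call is modifying the domain */
};

/* Try to start a modifying job. Returns 0 on success, -1 if busy. */
static int job_begin(struct domain_job *job)
{
    if (job->active)
        return -1;  /* caller would report a lock acquisition failure */
    job->active = true;
    return 0;
}

/* Finish the job so that the next caller may proceed. */
static void job_end(struct domain_job *job)
{
    job->active = false;
}
```

In this sketch both the snapshot creation and the redefine would call `job_begin` before changing any snapshot metadata, so the second caller is rejected instead of racing with the first.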

Comment 5 Han Han 2016-02-01 09:19:21 UTC
I can reproduce it on libvirt-libvirt-0.10.2-55.el6.x86_64.
Verified on libvirt-0.10.2-56.el6.x86_64:
1. Prepare a running guest:
# virsh list 
 Id    Name                           State
----------------------------------------------------
 9     nn                             running

2. Redefine the snapshot while it is being created with --halt:
./reproduce.sh:
```
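#!/bin/bash -x
set -o nounset
# Start the snapshot in the background so that the redefine races with it.
virsh snapshot-create-as nn --halt --name "nn" &
# Dump the in-progress snapshot and feed it back as a redefine of the current snapshot.
virsh snapshot-dumpxml nn nn | virsh snapshot-create nn --redefine --current /dev/stdin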
```
# ./reproduce.sh
+ set -o nounset
+ virsh snapshot-create-as nn --halt --name nn
+ virsh snapshot-dumpxml nn nn
+ virsh snapshot-create nn --redefine --current /dev/stdin
error: Domain snapshot not found: no domain snapshot with matching name 'nn'
error: (domain_snapshot):2: Start tag expected, '<' not found
(null)
^
Domain snapshot nn created

3. Add the --disk-only option to the script and run it:
# ./reproduce.sh
+ virsh snapshot-create-as nn --halt --disk-only --name n1
+ virsh snapshot-dumpxml nn n1
+ virsh snapshot-create nn --redefine --current /dev/stdin
error: Domain snapshot not found: no domain snapshot with matching name 'n1'
error: (domain_snapshot):2: Start tag expected, '<' not found
(null)
^
Domain snapshot n1 created

4. Remove the --halt option and run the script:
+ virsh snapshot-create-as nn --name n2
+ virsh snapshot-dumpxml nn n2
+ virsh snapshot-create nn --redefine --current /dev/stdin
error: Domain snapshot not found: no domain snapshot with matching name 'n2'
error: (domain_snapshot):2: Start tag expected, '<' not found
(null)
^
Domain snapshot n2 created


libvirtd does not crash when the script runs, and the snapshot is eventually created in each case.

Comment 7 errata-xmlrpc 2016-05-10 19:25:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0738.html