Description of problem:
After an active commit and pivot, external snapshots can no longer be taken. The error looks like:

# virsh snapshot-create-as generic hda --disk-only --no-metadata --diskspec hda,file=/tmp/generic.s1
error: internal error: unable to execute QEMU command 'transaction': The feature 'snapshot' is not enabled

Version-Release number of selected component (if applicable):
libvirt-1.3.1-1.fc24_v1.3.1_rc2.x86_64
qemu-kvm-2.5.0-3.fc24.x86_64

How reproducible:
100%

Steps to Reproduce:
1. prepare a transient domain
# virsh list --transient
 Id    Name                           State
----------------------------------------------------
 12    generic                        running

# virsh domblklist generic
Target     Source
------------------------------------------------
hda        /var/lib/libvirt/images/generic.qcow2

2. take 3 external snapshots
# virsh snapshot-create-as generic s1 --disk-only --no-metadata --diskspec hda,file=/var/lib/libvirt/images/generic.s1
Domain snapshot s1 created

# virsh console generic
dd a file in the guest
# dd if=/dev/urandom of=ha bs=1024 count=102400

# virsh snapshot-create-as generic s2 --disk-only --no-metadata --diskspec hda,file=/var/lib/libvirt/images/generic.s2
Domain snapshot s2 created

dd a file in the guest
# dd if=/dev/urandom of=haha bs=1024 count=102400

# virsh snapshot-create-as generic s3 --disk-only --no-metadata --diskspec hda,file=/var/lib/libvirt/images/generic.s3
Domain snapshot s3 created

dd a file in the guest
# dd if=/dev/urandom of=hahaha bs=1024 count=102400

# virsh domblklist generic
Target     Source
------------------------------------------------
hda        /var/lib/libvirt/images/generic.s3

# qemu-img info /var/lib/libvirt/images/generic.s3 --backing-chain
image: /var/lib/libvirt/images/generic.s3
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 248M
cluster_size: 65536
backing file: /var/lib/libvirt/images/generic.s2
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /var/lib/libvirt/images/generic.s2
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 56M
cluster_size: 65536
backing file: /var/lib/libvirt/images/generic.s1
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /var/lib/libvirt/images/generic.s1
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 132M
cluster_size: 65536
backing file: /var/lib/libvirt/images/generic.qcow2
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /var/lib/libvirt/images/generic.qcow2
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 2.0G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: true
    refcount bits: 16
    corrupt: false

3. active commit & pivot
# virsh blockcommit generic hda --active --wait --verbose
Block commit: [100 %]
Now in synchronized phase

# virsh blockjob generic hda
Active Block Commit: [100 %]

# virsh blockjob generic hda --pivot

# virsh domblklist generic
Target     Source
------------------------------------------------
hda        /var/lib/libvirt/images/generic.qcow2

4. take 1 external snapshot
# virsh snapshot-create-as generic s4 --disk-only --no-metadata --diskspec hda,file=/tmp/generic.s1
error: internal error: unable to execute QEMU command 'transaction': The feature 'snapshot' is not enabled

Actual results:

Expected results:
External snapshots can be taken

Additional info:
Cannot reproduce it on RHEL7.2
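For anyone re-checking the chain from step 2 by script rather than by eye, here is a minimal sketch that reads the human-readable `qemu-img info --backing-chain` text and confirms each image's backing file is the next image listed. The helper names (`backing_chain`, `is_linear`) and the `SAMPLE` text (trimmed from the output in this report) are illustrative only, and the parser assumes the plain "image:" / "backing file:" line layout shown above.

```python
# Illustrative helpers, not part of libvirt or QEMU.
# SAMPLE is trimmed from the qemu-img output in step 2 of this report.
SAMPLE = """\
image: /var/lib/libvirt/images/generic.s3
file format: qcow2
backing file: /var/lib/libvirt/images/generic.s2
backing file format: qcow2

image: /var/lib/libvirt/images/generic.s2
file format: qcow2
backing file: /var/lib/libvirt/images/generic.s1
backing file format: qcow2

image: /var/lib/libvirt/images/generic.s1
file format: qcow2
backing file: /var/lib/libvirt/images/generic.qcow2
backing file format: qcow2

image: /var/lib/libvirt/images/generic.qcow2
file format: qcow2
"""


def backing_chain(info_text):
    """Return [(image, backing_file_or_None), ...] in the order printed."""
    chain = []
    for raw in info_text.splitlines():
        line = raw.strip()
        if line.startswith("image:"):
            chain.append([line.split(":", 1)[1].strip(), None])
        elif line.startswith("backing file:") and chain:
            # "backing file format:" does not match this prefix, so it is skipped.
            chain[-1][1] = line.split(":", 1)[1].strip()
    return [tuple(entry) for entry in chain]


def is_linear(chain):
    """True if every image is backed by the next one and the last has no backing file."""
    if not chain:
        return False
    links_ok = all(chain[i][1] == chain[i + 1][0] for i in range(len(chain) - 1))
    return links_ok and chain[-1][1] is None


if __name__ == "__main__":
    for image, backing in backing_chain(SAMPLE):
        print(image, "<-", backing)
    print("linear chain:", is_linear(backing_chain(SAMPLE)))
```

After the pivot in step 3, the same check on generic.qcow2 alone should report a single entry with no backing file.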
Created attachment 1116560
libvirt log
Does this work on Fedora 23? If so, can you check whether F23 qemu + rawhide libvirt works? That will tell us whether it's qemu or libvirt.
I tried to use qmp-shell to execute the snapshot cmd in step4:

# qemu/scripts/qmp/qmp-shell /var/lib/libvirt/qemu/domain-nn/monitor.sock
Welcome to the QMP low-level shell!
Connected to QEMU 2.5.0

(QEMU) blockdev-snapshot-sync device=drive-virtio-disk0 snapshot-file=/tmp/kakka format=qcow2
{"error": {"class": "GenericError", "desc": "The feature 'snapshot' is not enabled"}}

From the error message, I think it's a bug in qemu.
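If it helps with scripted re-testing, this is a rough sketch of how a QMP reply line could be checked for this specific failure. The function name `hits_regression` is made up for illustration; the match relies only on the error class and description text quoted in the reply above, and a differently worded error would not be detected.

```python
import json

# Error text copied from the QMP reply quoted in this comment.
MARKER = "The feature 'snapshot' is not enabled"

# Sample reply, copied from the qmp-shell session above.
SAMPLE_REPLY = json.dumps(
    {"error": {"class": "GenericError", "desc": MARKER}}
)


def hits_regression(reply_line):
    """Return True if a QMP reply line carries the 'snapshot' feature error."""
    reply = json.loads(reply_line)
    err = reply.get("error")
    if not err:
        return False
    return err.get("class") == "GenericError" and MARKER in err.get("desc", "")


if __name__ == "__main__":
    print(hits_regression(SAMPLE_REPLY))
    print(hits_regression('{"return": {}}'))
```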
jcody, any thoughts? This seems like a snapshot regression in qemu 2.5.
Indeed, it does look like a regression. I've also confirmed that it is present in the current QEMU master upstream. I'm working to bisect it now.

Here are some easy steps to reproduce with just QEMU (via QMP commands):

qemu-system-x86_64 -enable-kvm -drive file=$1,if=virtio -m 1024 -boot c -qmp stdio

{ "execute": "qmp_capabilities" }
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "virtio0","snapshot-file":"tmp.qcow2","format": "qcow2" } }
{ "execute": "block-commit", "arguments": { "device": "virtio0" } }
{ "execute": "block-job-complete", "arguments": { "device": "virtio0" }}
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "virtio0","snapshot-file":"tmp2.qcow2","format": "qcow2" } }
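The QMP sequence above can also be generated programmatically, which may be handy when bisecting. The sketch below only builds and prints the same five commands as JSON lines; the function names (`snapshot_cmd`, `reproducer`) are hypothetical, and the device name and snapshot file names are the ones used in the steps above. It does not talk to QEMU itself, and when feeding the output to `-qmp stdio`, `block-job-complete` should only be sent once the commit job has reported BLOCK_JOB_READY.

```python
import json


def snapshot_cmd(device, snapshot_file, fmt="qcow2"):
    """Build a blockdev-snapshot-sync command with the arguments used above."""
    return {
        "execute": "blockdev-snapshot-sync",
        "arguments": {
            "device": device,
            "snapshot-file": snapshot_file,
            "format": fmt,
        },
    }


def reproducer(device="virtio0"):
    """Return the five QMP commands from the reproduction steps, in order."""
    return [
        {"execute": "qmp_capabilities"},
        snapshot_cmd(device, "tmp.qcow2"),
        {"execute": "block-commit", "arguments": {"device": device}},
        # Only valid after the commit job has emitted BLOCK_JOB_READY.
        {"execute": "block-job-complete", "arguments": {"device": device}},
        # On an affected build this second snapshot is the one that fails.
        snapshot_cmd(device, "tmp2.qcow2"),
    ]


if __name__ == "__main__":
    for cmd in reproducer():
        print(json.dumps(cmd))
```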
Git bisect shows this to be a regression caused by:

commit 3f09bfbc7bee812a44838f4c8b254007a9b86cab
Refs: v2.5.0-640-g3db34bf
AuthorDate: Tue Sep 15 11:58:23 2015 +0200
CommitDate: Fri Oct 16 15:34:30 2015 +0200

    block: Add and use bdrv_replace_in_backing_chain()

    This cleans up the mess we left behind in the mirror code after the
    previous patch. Instead of using bdrv_swap(), just change pointers.

    The interface change of the mirror job that callers must consider is
    that after job completion, their local BDS pointers still point to the
    same node now. qemu-img must change its code accordingly (which makes it
    easier to understand); the other callers stays unchanged because after
    completion they don't do anything with the BDS, but just with the job,
    and the job is still owned by the source BDS.
*** Bug 1302064 has been marked as a duplicate of this bug. ***
Submitted a patch upstream, and to qemu-stable, with a fix: https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg06163.html
*** Bug 1300165 has been marked as a duplicate of this bug. ***
Applying Jeff's patch solves the problem for me. Thanks.
This bug appears to have been reported against 'rawhide' during the Fedora 24 development cycle. Changing version to '24'. More information and reason for this action is here: https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora24#Rawhide_Rebase
qemu-2.5.0-10.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-1b264ab4a4
qemu-2.5.0-10.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-1b264ab4a4
qemu-2.5.0-10.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.