Bug 1217185

Summary: Sometimes libvirt fails to update domain block file after blockcommit.
Product: [Community] Virtualization Tools Reporter: Matthew Schumacher <matt.s>
Component: libvirtAssignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED DEFERRED QA Contact:
Severity: medium Docs Contact:
Priority: unspecified    
Version: unspecifiedCC: crobinso, dyuan, matt.s, pkrempa, rbalakri, shyu, xuzhang
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Other   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Slackware 14.1 x86_64
Last Closed: 2016-04-10 21:12:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Matthew Schumacher 2015-04-29 17:45:24 UTC
I just stumbled on another bug while snapshotting and think it's related to 1210903 and 1197592 as it seems like some sort of race condition because it depends on what logging is in place and doesn't happen every time.

Here are the details:

I wrote this test script to snapshot and commit over and over:

#!/bin/sh

while [ 1 ]; do

  echo "Starting snapshot test `date`"
  virsh snapshot-create-as test 20150429 20150429-backup --disk-only --atomic
  virsh domblklist test      
  virsh blockcommit test vda --active --pivot --verbose
  virsh snapshot-delete test 20150429 --metadata
  virsh domblklist test
  rm /glustervol1/vm/test/test.20150429
  echo "Ending snapshot test `date`"
  echo
  echo

  sleep 2

done

If I run libvirtd in the foreground with debug set to 1 I can't get it to fail, it does what it's supposed to do, snapshot and commit over and over.

If I run libvirtd in the foreground with debug set to 3, then I will always eventually get this:

Starting snapshot test Wed Apr 29 09:34:34 AKDT 2015
Domain snapshot 20150429 created
Target     Source
------------------------------------------------
vda        /glustervol1/vm/test/test.20150429
hdc        /dev/sr0

Block Commit: [100 %]
Successfully pivoted
Domain snapshot 20150429 deleted

Target     Source
------------------------------------------------
vda        /glustervol1/vm/test/test.20150429
hdc        /dev/sr0

Ending snapshot test Wed Apr 29 09:34:35 AKDT 2015


Starting snapshot test Wed Apr 29 09:34:37 AKDT 2015
error: unsupported configuration: source for disk 'vda' is not a regular file; refusing to generate external snapshot name

Target     Source
------------------------------------------------
vda        /glustervol1/vm/test/test.20150429
hdc        /dev/sr0

error: internal error: qemu block name '/glustervol1/vm/test/test.qcow2' doesn't match expected '/glustervol1/vm/test/test.20150429'

error: Domain snapshot not found: no domain snapshot with matching name '20150429'

Target     Source
------------------------------------------------
vda        /glustervol1/vm/test/test.20150429
hdc        /dev/sr0

rm: can't remove '/glustervol1/vm/test/test.20150429': No such file or directory
Ending snapshot test Wed Apr 29 09:34:37 AKDT 2015

At this point libvirt is confused about which file is the backing store because the first run did pivot after blockcommit, but didn't update the block file.  From the logs:

2015-04-29 17:33:41.052+0000: 25192: warning : qemuDomainObjTaint:1972 : Domain id=2 name='test' uuid=4b9cc25b-68d1-4ce8-8a65-2a378e255e36 is tainted: high-privileges
2015-04-29 17:34:37.322+0000: 25191: error : virDomainSnapshotAlignDisks:609 : unsupported configuration: source for disk 'vda' is not a regular file; refusing to generate external snapshot name
2015-04-29 17:34:37.352+0000: 25194: error : qemuMonitorJSONDiskNameLookup:3977 : internal error: unable to find backing name for device drive-virtio-disk0
2015-04-29 17:34:37.354+0000: 25194: error : qemuMonitorJSONDiskNameLookupOne:3914 : internal error: qemu block name '/glustervol1/vm/test/test.qcow2' doesn't match expected '/glustervol1/vm/test/test.20150429'


So libvirt insists that the block file is: 

root@wasvirt2:/glustervol1/vm/waspbx# virsh domblklist test
Target     Source
------------------------------------------------
vda        /glustervol1/vm/test/test.20150429
hdc        /dev/sr0

But that file isn't in use and isn't what qemu is using:

root@wasvirt2:/glustervol1/vm/waspbx# lsof | grep test
25300   /usr/bin/qemu-system-x86_64     /var/log/libvirt/qemu/test.log
25300   /usr/bin/qemu-system-x86_64     /var/log/libvirt/qemu/test.log
25300   /usr/bin/qemu-system-x86_64     /glustervol1/vm/test/test.qcow2

The only way to straighten this out is to destroy and start the domain.

Comment 1 Matthew Schumacher 2015-04-29 19:45:47 UTC
I can now confirm that I can replicate this issue for both qemu 2.3.0 and qemu 2.2.1, but qemu 2.1.3 works just fine which further indicates that this may be related to 1210903 and 1197592.

Comment 2 Peter Krempa 2015-04-30 07:27:20 UTC
Which version of libvirt is your distro using?

Comment 3 Matthew Schumacher 2015-04-30 17:11:19 UTC
libvirt-1.2.14-x86_64-1_SBo.tgz

From:

http://slackbuilds.org/repository/14.1/libraries/libvirt/

Comment 4 Matthew Schumacher 2015-05-11 20:52:38 UTC
Just tried this with libvirt-1.2.15 built with the same slackbuild script from the link above and still have the same issue with qemu-2.2.1 or qemu-2.3.0.

Comment 5 Cole Robinson 2016-04-10 21:12:21 UTC
Those two bugs you referenced in the first comment have patches in libvirt.git, at least version 1.3.2, so I'm guessing this is fixed upstream too. Closing, but if you reproduce with more recent libvirt+qemu, please reopen