Bug 1199182 - 2nd active commit after snapshot triggers qemu failure
Summary: 2nd active commit after snapshot triggers qemu failure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Eric Blake
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 1199036
Blocks: 1177220 1203119 1232701
 
Reported: 2015-03-05 15:05 UTC by Eric Blake
Modified: 2017-03-13 19:02 UTC (History)

Fixed In Version: libvirt-1.2.14-1.el7
Doc Type: Bug Fix
Doc Text:
Cause: qemu uses strcmp() instead of inode comparison when searching for a file name in a backing chain. However, it tracks a different initial string for a backing chain member when opening an existing chain than it does when creating a snapshot.
Consequence: If libvirt passes a different string than what qemu is expecting, even though that string resolves to the same file, then qemu is unable to perform blockpull and blockcommit actions involving that part of the chain.
Fix: Libvirt was taught to query qemu's notion of the string associated with a backing chain element.
Result: It is now possible to create multiple snapshots and then use consecutive blockcommit to clean them up without hitting an error message from qemu complaining about an unknown file name.
Clone Of:
: 1203119 (view as bug list)
Environment:
Last Closed: 2015-11-19 06:18:58 UTC
Target Upstream Version:
Embargoed:


Attachments
bug demo (1.45 KB, application/x-shellscript), 2015-03-05 15:05 UTC, Eric Blake
reproducer using only qemu (5.91 KB, application/x-shellscript), 2015-03-05 21:48 UTC, Jeff Cody
bug verification with gluster backend (3.24 KB, application/x-shellscript), 2015-06-17 09:33 UTC, Yang Yang
bug verification with nfs backend (3.41 KB, application/x-shellscript), 2015-06-17 09:34 UTC, Yang Yang
bug verification with block device (2.55 KB, application/x-shellscript), 2015-06-17 09:35 UTC, Yang Yang


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2015:2202 | 0 | normal | SHIPPED_LIVE | libvirt bug fix and enhancement update | 2015-11-19 08:17:58 UTC

Description Eric Blake 2015-03-05 15:05:08 UTC
Created attachment 998420 [details]
bug demo

Description of problem:
Run the attached script.  It basically creates a backing chain of:

images/a/a <- images/b/b <- images/c/c

then does two active commits back down to a.  If qemu is started from scratch on the full chain, the process succeeds; but if the chain is created at runtime, qemu chokes on the second commit (libvirt is merely reporting qemu's error message).  I'm originally opening this bug against libvirt, in case there is some way for libvirt to work around the problem without having to patch qemu, but may reassign it as I experiment more with why qemu is choking on the second call.  At any rate, I've verified that libvirt is sending identical block-commit QMP commands in both sequences.

Version-Release number of selected component (if applicable):
At the time of this bug report, the issue is reproducible in upstream qemu.git and libvirt.git; therefore, all RHEL builds with active commit support (RHEL 7.1) are affected.

How reproducible:
100%

Steps to Reproduce:
1. See the demo script

Actual results:
# ./testvm3-start 
Domain testvm3 destroyed

Formatting 'images/a/a', fmt=qcow2 size=10485760 encryption=off cluster_size=65536 lazy_refcounts=off 
Formatting 'b', fmt=qcow2 size=10485760 backing_file='../a/a' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off 
Formatting 'c', fmt=qcow2 size=10485760 backing_file='../b/b' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off 
Domain testvm3 created from /dev/stdin

Block Commit: [100 %]
Successfully pivoted
Block Commit: [100 %]
Successfully pivoted
Domain snapshot 1425567580 created
Domain snapshot 1425567580 created
Block Commit: [100 %]
Successfully pivoted
error: internal error: unable to execute QEMU command 'block-commit': Top image file /tmp/images/c/../b/b not found


Expected results:
Final pivot should succeed, not fail.

Additional info:

Comment 1 Eric Blake 2015-03-05 15:27:39 UTC
This may be relevant; grepping the output of 'query-block' before the two commit attempts shows:

*** Before first commits
                            "filename": "/tmp/images/c/../b/../a/a",
                        "backing-filename-format": "qcow2",
                        "filename": "/tmp/images/c/../b/b",
                        "backing-filename": "/tmp/images/c/../b/../a/a",
                    "backing-filename-format": "qcow2",
                    "filename": "/tmp/images/c/c",
                    "backing-filename": "/tmp/images/c/../b/b",
*** Before second commits
                            "filename": "/tmp/images/c/../b/../a/a",
                        "backing-filename-format": "qcow2",
                        "filename": "/tmp/images/b/b",
                        "backing-filename": "/tmp/images/c/../b/../a/a",
                    "backing-filename-format": "qcow2",
                    "filename": "/tmp/images/c/c",
                    "backing-filename": "/tmp/images/b/b",

That is, when qemu locates 'b' by chasing 'c's metadata, it knows the file as /tmp/images/c/../b/b.  But when qemu is told to locate 'b' during snapshot creation, it knows the file as /tmp/images/b/b.

Libvirt is executing this (modulo different 'id' fields) for BOTH commit sequences:
 "{\"execute\":\"block-commit\",\"arguments\":{\"device\":\"drive-virtio-disk0\",\"top\":\"/tmp/images/c/c\",\"base\":\"/tmp/images/c/../b/b\"},\"id\":\"libvirt-11\"}"
"{\"execute\":\"block-commit\",\"arguments\":{\"device\":\"drive-virtio-disk0\",\"top\":\"/tmp/images/c/../b/b\",\"base\":\"/tmp/images/c/../b/../a/a\"},\"id\":\"libvirt-17\"}"

But because qemu is tracking the name of b differently between the two attempts, it may explain why the second attempt fails to locate the starting point (especially if qemu is using strcmp on original strings rather than checking canonical file names).
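
For illustration only, here is a minimal Python sketch (not qemu code) of the difference suspected above. It recreates the b/ and c/ directories in a scratch directory, then compares the two spellings of b once as plain strings, the way strcmp() would, and once by device and inode number via os.path.samefile(). The helper names same_spelling, same_file, and demo are made up for this example.

```python
import os
import tempfile
from typing import Tuple


def same_spelling(path_a: str, path_b: str) -> bool:
    """Character-for-character comparison, analogous to strcmp() == 0 in C."""
    return path_a == path_b


def same_file(path_a: str, path_b: str) -> bool:
    """Compare by device and inode number; both paths must exist."""
    return os.path.samefile(path_a, path_b)


def demo() -> Tuple[bool, bool]:
    """Build images/b and images/c in a scratch directory and compare spellings of b."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "images", "b"))
        os.makedirs(os.path.join(root, "images", "c"))
        # Spelling used when 'b' is named directly (as during snapshot creation).
        direct = os.path.join(root, "images", "b", "b")
        # Spelling obtained by following c's relative backing file '../b/b'.
        via_c = os.path.join(root, "images", "c", "..", "b", "b")
        with open(direct, "w"):
            pass  # an empty file is enough for the comparison
        return same_spelling(direct, via_c), same_file(direct, via_c)


if __name__ == "__main__":
    print(demo())
```

demo() returns (False, True): the two spellings differ as strings yet name the same file.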

Comment 3 Jeff Cody 2015-03-05 21:48:23 UTC
Created attachment 998591 [details]
reproducer using only qemu

This script reproduces the bug; specifically, it does the test 3 ways:

1) With the image chain pre-created, and then block-commits of the active layer.  This is successful.

2) With the image chain formed dynamically via live snapshots, and then block-commits of the active layer, using the relative pathname for each active layer (e.g. /tmp/images/b/../c/c).  This is successful.

3) With the image chain formed dynamically via live snapshots, and then block-commits of the active layer, using the absolute pathname for the initial block-commit active layer (e.g. /tmp/images/c/c).  This fails, as QEMU cannot find the image.

Comment 4 Jeff Cody 2015-03-05 21:57:09 UTC
(In reply to Eric Blake from comment #1)

[...]

> 
> That is, when qemu locates 'b' by chasing 'c's metadata, it knows the file
> as /tmp/images/c/../b/b.  But when qemu is told to locate 'b' during
> snapshot creation, it knows the file as /tmp/images/b/b.
> 

[...]


This is indeed what is happening.  Please see my attachment #998591 [details] and the info in comment #3 - each run through dumps the block info before the relevant block-commit commands.

If you are looking for a solution that does not involve a new QEMU version, could libvirt run query-block prior to issuing block-commit?  The filename string QEMU will be searching for will be present in the block info chain returned, and libvirt could parse that to determine which filename to use for each image in the chain.

(This is more reason why using node-names at some point for all operations would be preferable to filenames)
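
A rough Python sketch of the parsing step suggested above, for readers who want to see its shape. It assumes the query-block reply nests each layer under inserted.image with backing-image children, which matches the indentation of the grep output in comment 1. SAMPLE_REPLY is hand-written and trimmed to the fields used here (it is not captured output), and chain_filenames is a hypothetical helper, not a libvirt function.

```python
import json
from typing import Any, Dict, List

# Hand-written, heavily trimmed stand-in for a query-block reply (an assumption,
# modeled on the "Before second commits" grep output in comment 1).
SAMPLE_REPLY = """
{"return": [{"device": "drive-virtio-disk0",
  "inserted": {"image": {
    "filename": "/tmp/images/c/c",
    "backing-filename": "/tmp/images/b/b",
    "backing-image": {
      "filename": "/tmp/images/b/b",
      "backing-filename": "/tmp/images/c/../b/../a/a",
      "backing-image": {"filename": "/tmp/images/c/../b/../a/a"}}}}}]}
"""


def chain_filenames(reply: Dict[str, Any], device: str) -> List[str]:
    """Return qemu's own filename strings for one device, active layer first."""
    for entry in reply.get("return", []):
        if entry.get("device") != device:
            continue
        names: List[str] = []
        image = entry.get("inserted", {}).get("image")
        # Follow the nested backing-image objects down to the base image.
        while image is not None:
            names.append(image["filename"])
            image = image.get("backing-image")
        return names
    raise KeyError(device)


if __name__ == "__main__":
    for name in chain_filenames(json.loads(SAMPLE_REPLY), "drive-virtio-disk0"):
        print(name)
```

The strings returned are the exact spellings qemu would compare against, so they are the ones to pass as "top" and "base".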

Comment 5 Eric Blake 2015-03-06 00:17:14 UTC
The ultimate upstream solution would be converting libvirt over to use node name rather than filename references, but that won't be backportable. Also, it would be nice to clone this bug to qemu to handle all filenames that resolve to the same location, rather than just a string compare on a particular (and not necessarily canonical) spelling of a filename.  But such a clone can be deferred to 7.2 if we can teach libvirt to work with existing qemu.  The idea of having libvirt check 'query-block' to track actual names in use by qemu seems worthwhile, and from the perspective of a minimal fix, it is appealing (if we backport to 7.1.z, doing JUST a libvirt patch is nicer than requiring both a libvirt and a qemu patch).  So I'm currently in the middle of coding up an attempted solution to teach libvirt to use 'query-block' to reset its notion of the filenames qemu is using after any operation that alters the chain of files in use by a disk device.
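
One way to picture the bookkeeping described above, as a hedged Python sketch and not the actual libvirt patch: after any operation that changes the chain, rebuild an index from each file's resolved path to the spelling qemu reported, then look up the file you want and hand qemu's spelling back to it. The names index_by_real_path, qemu_spelling, and demo are invented for this example, and the list of reported names is made up to resemble comment 1.

```python
import os
import tempfile
from typing import Dict, Iterable, Optional


def index_by_real_path(qemu_names: Iterable[str]) -> Dict[str, str]:
    """Map each resolved path to the exact spelling qemu reported for it."""
    return {os.path.realpath(name): name for name in qemu_names}


def qemu_spelling(index: Dict[str, str], path: str) -> Optional[str]:
    """Return qemu's spelling for the file that 'path' resolves to, if known."""
    return index.get(os.path.realpath(path))


def demo() -> Optional[str]:
    with tempfile.TemporaryDirectory() as root:
        for name in ("a", "b", "c"):
            os.makedirs(os.path.join(root, name))
            open(os.path.join(root, name, name), "w").close()
        # Spellings qemu might report after runtime snapshots (illustrative only).
        reported = [
            os.path.join(root, "c", "c"),
            os.path.join(root, "b", "b"),
            os.path.join(root, "c", "..", "b", "..", "a", "a"),
        ]
        index = index_by_real_path(reported)
        # Spelling derived from c's metadata, which qemu would not recognize as-is.
        wanted = os.path.join(root, "c", "..", "b", "b")
        found = qemu_spelling(index, wanted)
        # Report relative to the scratch directory so the result is stable.
        return os.path.relpath(found, root) if found else None


if __name__ == "__main__":
    print(demo())
```

The index has to be rebuilt after every snapshot, commit, or pull, because those are the operations that change which spelling qemu holds.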

Comment 6 Shanzhi Yu 2015-03-06 03:44:44 UTC
I can reproduce it with the steps in comment 0.

1. Prepare a backing chain with relative paths.

2. Create external snapshots that reuse the existing backing chain while the guest is running.

3. Do an active blockcommit; the error occurs on the second attempt.

Comment 7 Shaolong Hu 2015-03-06 05:12:53 UTC
(In reply to Jeff Cody from comment #3)
> Created attachment 998591 [details]
> reproducer using only qemu
> 
> This script reproduces the bug; specifically, it does the test 3 ways:
> 
> 1) With the image chain pre-created, and then block-commits of the active
> layer.  This is successful.
> 
> 2) With the image chain formed dynamically via live snapshots, and then
> block-commits of the active layer, using the relative pathname for each
> active layer (e.g. /tmp/images/b/../c/c).  This is successful.
> 
> 3) With the image chain formed dynamically via live snapshots, and then
> block-commits of the active layer, using the absolute pathname for the
> initial block-commit active layer (e.g. /tmp/images/c/c).  This fails, as
> QEMU cannot find the image.

Hi Jeff,

Do we need a bug for qemu?

Bests,
Shaolong

Comment 9 Adam Litke 2015-03-11 13:07:16 UTC
Hi Eric,

Do you have any updates on your progress?  Are you still planning to deliver the libvirt-based workaround?  If so, what upstream libvirt release would you expect to have the fix and when would you expect a zStream for RHEL7?

Thanks!

Comment 10 Eric Blake 2015-03-11 21:03:41 UTC
Upstream proposed patch:
https://www.redhat.com/archives/libvir-list/2015-March/msg00605.html

Comment 18 Eric Blake 2015-03-16 15:13:54 UTC
bug 1199036 regarding blockjob lock problems explains some of the crashes observed while testing this bug (and Peter's patches mentioned in comment 16, for those that can read it); marking it as a dependency (any z-stream build needs both sets of patches)

Comment 20 Eric Blake 2015-03-16 21:47:22 UTC
Potential 7.1.z patches: http://post-office.corp.redhat.com/archives/rhvirt-patches/2015-March/msg00447.html

Comment 22 Eric Blake 2015-03-17 21:06:41 UTC
POST for 7.2 via rebase

Comment 23 Eric Blake 2015-03-17 21:19:07 UTC
(In reply to Eric Blake from comment #20)
> Potential 7.1.z patches:
> http://post-office.corp.redhat.com/archives/rhvirt-patches/2015-March/msg00447.html

Updated version for 7.1.z: http://post-office.corp.redhat.com/archives/rhvirt-patches/2015-March/msg00479.html

Comment 27 Yang Yang 2015-04-10 08:34:15 UTC
Hi Eric,
Regarding the issue mentioned in comment #26, I can always reproduce it on a disk that has no OS installed. The active commit with pivot finishes and returns with no error; however, the domain XML is not changed.

Package versions
libvirt-1.2.14-1.el7.x86_64
qemu-kvm-rhev-2.2.0-8.el7.x86_64

Steps to reproduce
1. Prepare a running guest with the following XML
<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/vm1.qcow2'/>  ==> bootable disk
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/test.img'/>  ==> no OS is installed
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>

2. Create 3 external disk-only snapshots for vdb
#  virsh snapshot-create-as testvm3 --no-metadata --disk-only --diskspec vda,snapshot=no --diskspec vdb,file=/var/lib/libvirt/images/test.s1
Domain snapshot 1428653468 created
#  virsh snapshot-create-as testvm3 --no-metadata --disk-only --diskspec vda,snapshot=no --diskspec vdb,file=/var/lib/libvirt/images/test.s2
Domain snapshot 1428653472 created
#  virsh snapshot-create-as testvm3 --no-metadata --disk-only --diskspec vda,snapshot=no --diskspec vdb,file=/var/lib/libvirt/images/test.s3
Domain snapshot 1428653473 created

3. Dump the domain XML
<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/vm1.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/test.s3'/>
      <backingStore type='file' index='1'>
        <format type='qcow2'/>
        <source file='/var/lib/libvirt/images/test.s2'/>
        <backingStore type='file' index='2'>
          <format type='qcow2'/>
          <source file='/var/lib/libvirt/images/test.s1'/>
          <backingStore type='file' index='3'>
            <format type='raw'/>
            <source file='/var/lib/libvirt/images/test.img'/>
            <backingStore/>
          </backingStore>
        </backingStore>
      </backingStore>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>

4. Do active blockcommit
# virsh -k0 blockcommit --active --verbose testvm3 vdb --shallow --pivot
Block Commit: [100 %]
Successfully pivoted

5. Check the domain XML
# virsh dumpxml testvm3 | grep disk -a10
 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/vm1.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/test.s3'/>
      <backingStore type='file' index='1'>
        <format type='qcow2'/>
        <source file='/var/lib/libvirt/images/test.s2'/>
        <backingStore type='file' index='2'>
          <format type='qcow2'/>
          <source file='/var/lib/libvirt/images/test.s1'/>
          <backingStore type='file' index='3'>
            <format type='raw'/>
            <source file='/var/lib/libvirt/images/test.img'/>
            <backingStore/>
          </backingStore>
        </backingStore>
      </backingStore>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>

6. Do active blockcommit again
# virsh -k0 blockcommit --active --verbose testvm3 vdb --shallow --pivot
error: internal error: qemu block name '/var/lib/libvirt/images/test.s2' doesn't match expected '/var/lib/libvirt/images/test.s3'
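
To make the mismatch in steps 5 and 6 easier to spot, the backingStore nesting can be flattened into a list. The Python sketch below is only an illustration: DISK_XML is the vdb element from step 5 with the alias and address lines dropped, and backing_chain is a hypothetical helper, not part of libvirt or virsh.

```python
import xml.etree.ElementTree as ET
from typing import List

# The vdb <disk> element from step 5, trimmed to the parts used here.
DISK_XML = """
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/test.s3'/>
  <backingStore type='file' index='1'>
    <format type='qcow2'/>
    <source file='/var/lib/libvirt/images/test.s2'/>
    <backingStore type='file' index='2'>
      <format type='qcow2'/>
      <source file='/var/lib/libvirt/images/test.s1'/>
      <backingStore type='file' index='3'>
        <format type='raw'/>
        <source file='/var/lib/libvirt/images/test.img'/>
        <backingStore/>
      </backingStore>
    </backingStore>
  </backingStore>
  <target dev='vdb' bus='virtio'/>
</disk>
"""


def backing_chain(disk: ET.Element) -> List[str]:
    """List the source files of a <disk> element, active layer first."""
    names: List[str] = []
    node = disk
    while node is not None:
        source = node.find("source")
        if source is None:
            break  # the empty <backingStore/> marks the end of the chain
        names.append(source.get("file"))
        node = node.find("backingStore")
    return names


if __name__ == "__main__":
    for name in backing_chain(ET.fromstring(DISK_XML)):
        print(name)
```

For the XML in step 5 the first entry is still test.s3, while the error in step 6 shows qemu naming test.s2, which is the disagreement reported here.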

Comment 29 Yang Yang 2015-06-17 09:32:14 UTC
Because blockcommit with --pivot always fails due to bz1197592, and to avoid blocking verification of this bz, I do the pivot as a separate step, several seconds after the guest enters the mirror phase.

Verified on libvirt-1.2.16-1.el7.x86_64

Tested with NFS and gluster backends and with a block device as the virtual disk
1. NFS backend
cd /tmp || exit 1
umount /tmp/images
virsh -k0 destroy testvm3 2>/dev/null
rm -rf images

mkdir -p images || exit 1
mount 10.66.5.63:/var/lib/libvirt/images/nfs /tmp/images

mkdir -p images/a images/b images/c images/d || exit 1

(cd images/a &&
cp /var/lib/libvirt/images/vm1.qcow2 a) || exit 1
sleep 5
(cd images/b &&
qemu-img create -f qcow2 -o backing_file=../a/a,backing_fmt=qcow2 b) || exit 1
(cd images/c &&
qemu-img create -f qcow2 -o backing_file=../b/b,backing_fmt=qcow2 c) || exit 1
(cd images/d &&
qemu-img create -f qcow2 -o backing_file=../c/c,backing_fmt=qcow2 d) || exit 1

emulator='<emulator>/usr/libexec/qemu-kvm</emulator>'

virsh -k0 create /dev/stdin <<EOF
<domain type='kvm'>
<name>testvm3</name>
<memory unit='MiB'>256</memory>
 <vcpu>1</vcpu>
 <os>
   <type arch='x86_64'>hvm</type>
 </os>
 <devices>
   $emulator
   <disk type='file' device='disk'>
     <driver name='qemu' type='qcow2'/>
     <source file='/tmp/images/d/d'/>
     <target dev='vda' bus='virtio'/>
   </disk>
   <graphics type='vnc'/>
 </devices>
</domain>
EOF

sleep 60

virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --wait
sleep 5
virsh blockjob testvm3 vda --pivot
virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --wait
sleep 5
virsh blockjob testvm3 vda --pivot
virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --wait
sleep 5
virsh blockjob testvm3 vda --pivot


virsh -k0 snapshot-create-as --no-metadata testvm3 --reuse-external \
      --disk-only --diskspec vda,file=/tmp/images/b/b
virsh -k0 snapshot-create-as --no-metadata testvm3 --reuse-external \
      --disk-only --diskspec vda,file=/tmp/images/c/c
virsh -k0 snapshot-create-as --no-metadata testvm3 --reuse-external \
      --disk-only --diskspec vda,file=/tmp/images/d/d


virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --wait
sleep 5
virsh blockjob testvm3 vda --pivot
virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --wait
sleep 5
virsh blockjob testvm3 vda --pivot
virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --wait
sleep 5
virsh blockjob testvm3 vda --pivot

All passed, see attached scripts
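
The commit, sleep, pivot triple repeats three times in each half of the script above. As a sketch only, the Python below builds the same argument lists in a loop without running anything; the virsh options are copied from the script, while the function name commit_and_pivot_commands and its parameters are invented for this example.

```python
from typing import List


def commit_and_pivot_commands(domain: str, disk: str, rounds: int,
                              settle_seconds: int = 5) -> List[List[str]]:
    """Build (but do not run) the argv lists for repeated shallow active commits."""
    commands: List[List[str]] = []
    for _ in range(rounds):
        # Start the shallow active commit with --wait, as the script does.
        commands.append(["virsh", "-k0", "blockcommit", "--active", "--verbose",
                         domain, disk, "--shallow", "--wait"])
        # Pause for a few seconds before pivoting, mirroring 'sleep 5'.
        commands.append(["sleep", str(settle_seconds)])
        # Pivot as a separate step instead of passing --pivot to blockcommit.
        commands.append(["virsh", "blockjob", domain, disk, "--pivot"])
    return commands


if __name__ == "__main__":
    for argv in commit_and_pivot_commands("testvm3", "vda", rounds=3):
        print(" ".join(argv))
```

Printing the commands first makes it easy to check the sequence before passing each list to something like subprocess.run on a host that has virsh installed.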

Comment 30 Yang Yang 2015-06-17 09:33:33 UTC
Created attachment 1039840 [details]
bug verification with gluster backend

Comment 31 Yang Yang 2015-06-17 09:34:26 UTC
Created attachment 1039841 [details]
bug verification with nfs backend

Comment 32 Yang Yang 2015-06-17 09:35:04 UTC
Created attachment 1039843 [details]
bug verification with block device

Comment 34 errata-xmlrpc 2015-11-19 06:18:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html

