Bug 1102881

Summary: virDomainBlockCommit fails with live snapshots on oVirt block storage
Product: [Fedora] Fedora Reporter: Adam Litke <alitke>
Component: libvirtAssignee: Peter Krempa <pkrempa>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 21CC: agedosier, alitke, amit.shah, amureini, berrange, cfergeau, clalancette, crobinso, dwmw2, eblake, fsimonce, itamar, jcody, jforbes, kwolf, laine, libvirt-maint, mst, ogofen, pbonzini, pkrempa, redhat, rjones, scottt.tw, shyu, stefanha, veillard, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-04 21:27:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1083310, 1115127, 1168620    
Attachments:
Description Flags
Test script to create two live snapshots
none
libvirtd.log when the problem occurs
none
vdsm log when the problem happens
none
libvirt qemu log for the affected vm
none
qemu debug patch referenced in comment #6
none
New qemu debug patch
none
audit.log with selinux enforcing
none
libvirtd log
none
libvirtd log
none
audit log
none
libvirt log for affected vm
none
audit.log with all denials logged none

Description Adam Litke 2014-05-29 18:28:48 UTC
Description of problem:
In an oVirt environment using iSCSI storage, after live snapshots are taken for a VM, a subsequent virDomainBlockCommit operation will always fail with the following message:

error : qemuMonitorJSONCheckError:352 : internal error: unable to execute QEMU command 'block-commit': Top image file /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/dad706d6-e1e9-4f31-87d4-610b3511a932/59354326-443a-42dc-b21d-8649056ced83 not found

It seems that some recent changes to how relative paths are handled has caused a regression.  libvirt-1.1.3.5-1.fc20 fails but libvirt-1.1.1-29.el7.x86_64 works correctly.

See subsequent attachments for the script I used to produce this and the resulting debugging output.  Sorry, but I was unable to produce this outside of an oVirt environment.

Version-Release number of selected component (if applicable):
    libvirt-1.1.3.5-1.fc20
    qemu-kvm-1.6.2-5.fc20

How reproducible:
Always, given an oVirt environment.


Steps to Reproduce:
1. Launch a single VM having a single disk that is hosted on an iSCSI oVirt storage domain
2. Execute the attached test script to create the two live snapshots.  The parameters are <storagepool ID> <host 1> <host 2>.
3. Execute the blockcommit virsh command as instructed by the test script on the host where the VM is running

Actual results:
blockcommit fails (see this transcript)

sudo PYTHONPATH=lib python vdsm-test.py be744655-2a19-4919-92f3-e7c20ac03193 ale lager
Running test: merge internal snapshot
Basic test parameters:
vm=af8df38e-2e99-4a89-9483-a540027ea6b1
sp=be744655-2a19-4919-92f3-e7c20ac03193
sd=a246386b-f31d-451d-b8a6-2f4c0dfa27b4
img=dad706d6-e1e9-4f31-87d4-610b3511a932
vol=7e0ec106-f2dc-4567-b050-e3289bf2d47f
Creating volume: 59354326-443a-42dc-b21d-8649056ced83
Waiting for task c871c288-1201-4142-b95b-1eab3a48a51c
c871c288-1201-4142-b95b-1eab3a48a51c: finished
Creating snapshot 59354326-443a-42dc-b21d-8649056ced83 (base 7e0ec106-f2dc-4567-b050-e3289bf2d47f)
Current volume chain: 7e0ec106-f2dc-4567-b050-e3289bf2d47f, 59354326-443a-42dc-b21d-8649056ced83, 378fdb5e-1853-4946-9a79-fd7bd00bb466
Creating volume: ea12203c-36a5-43d7-b351-492ec8c7944d
Waiting for task a3aeeef7-3abf-4408-a1aa-74f7d06108c1
a3aeeef7-3abf-4408-a1aa-74f7d06108c1: finished
Creating snapshot ea12203c-36a5-43d7-b351-492ec8c7944d (base 59354326-443a-42dc-b21d-8649056ced83)
Current volume chain: 7e0ec106-f2dc-4567-b050-e3289bf2d47f, ea12203c-36a5-43d7-b351-492ec8c7944d, 59354326-443a-42dc-b21d-8649056ced83, 378fdb5e-1853-4946-9a79-fd7bd00bb466

    Issue the following command on the host where the vm is running:
    virsh blockcommit af8df38e-2e99-4a89-9483-a540027ea6b1 /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/dad706d6-e1e9-4f31-87d4-610b3511a932/ea12203c-36a5-43d7-b351-492ec8c7944d --base /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/dad706d6-e1e9-4f31-87d4-610b3511a932/7e0ec106-f2dc-4567-b050-e3289bf2d47f --top /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/dad706d6-e1e9-4f31-87d4-610b3511a932/59354326-443a-42dc-b21d-8649056ced83

... on the host running vm ...

sudo virsh blockcommit af8df38e-2e99-4a89-9483-a540027ea6b1 /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/dad706d6-e1e9-4f31-87d4-610b3511a932/ea12203c-36a5-43d7-b351-492ec8c7944d --base /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/dad706d6-e1e9-4f31-87d4-610b3511a932/7e0ec106-f2dc-4567-b050-e3289bf2d47f --top /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/dad706d6-e1e9-4f31-87d4-610b3511a932/59354326-443a-42dc-b21d-8649056ced83
Please enter your authentication name: vdsm@ovirt
Please enter your password: 
error: internal error: unable to execute QEMU command 'block-commit': Top image file /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/dad706d6-e1e9-4f31-87d4-610b3511a932/59354326-443a-42dc-b21d-8649056ced83 not found

Expected results:
blockcommit should start successfully


Additional info:
version of vdsm: vdsm-4.15.0-11.gitcd200ae.fc20.x86_64 (which contains some local changes that should be unrelated to this issue since the same version works with libvirt-1.1.1-29.el7.x86_64 on an el7 host)

Comment 1 Adam Litke 2014-05-29 18:31:01 UTC
Created attachment 900481 [details]
Test script to create two live snapshots

You may use this test script to produce the snapshots and help you construct the correct virsh blockcommit command to use.

Comment 2 Adam Litke 2014-05-29 18:32:18 UTC
Created attachment 900482 [details]
libvirtd.log when the problem occurs

Comment 3 Adam Litke 2014-05-29 18:32:45 UTC
Created attachment 900483 [details]
vdsm log when the problem happens

Comment 4 Adam Litke 2014-05-29 18:33:24 UTC
Created attachment 900484 [details]
libvirt qemu log for the affected vm

Comment 5 Richard W.M. Jones 2014-05-29 20:37:19 UTC
Could it be related to
https://bugzilla.redhat.com/show_bug.cgi?id=1061827#c13 ?

Comment 6 Adam Litke 2014-05-29 21:03:08 UTC
(In reply to Richard W.M. Jones from comment #5)
> Could it be related to
> https://bugzilla.redhat.com/show_bug.cgi?id=1061827#c13 ?

Related but not quite the same thing.  Here we are deliberately passing in valid absolute path names but qemu (bdrv_find_backing_file) path cannot be resolved into one of the actual backing images.

I'm attaching a debug patch that I have applied to qemu and here is the output for a particular run of the code:

bdrv_find_backing_image: looking for /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/907e7803-233f-47ad-b48b-232e1aea8697/196cd6c6-252d-4ada-b898-bc8652d3e3f6
 -> /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/907e7803-233f-47ad-b48b-232e1aea8697/196cd6c6-252d-4ada-b898-bc8652d3e3f6
2
        tmp: /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/907e7803-233f-47ad-b48b-232e1aea8697/196cd6c6-252d-4ada-b898-bc8652d3e3f6
        full: /dev/dm-12
 -> /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/907e7803-233f-47ad-b48b-232e1aea8697/acc6f7c4-9b6d-4052-abf5-cab1af5f2e8f
2
        tmp: /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/907e7803-233f-47ad-b48b-232e1aea8697/196cd6c6-252d-4ada-b898-bc8652d3e3f6
        full: /dev/dm-12
 -> ../907e7803-233f-47ad-b48b-232e1aea8697/378fdb5e-1853-4946-9a79-fd7bd00bb466
2
        tmp: /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/907e7803-233f-47ad-b48b-232e1aea8697/196cd6c6-252d-4ada-b898-bc8652d3e3f6
        full: /dev/dm-12

Clearly it should have matched on the first iteration of the loop but realpath() is failing to resolve it.  Prior to launching the blockcommit I ensured that all images in the chain were reachable (no broken symlinks).  I wonder if somehow the qemu backing chain structures have been corrupted by the live snapshot process.

Comment 7 Adam Litke 2014-05-29 21:04:31 UTC
Created attachment 900523 [details]
qemu debug patch referenced in comment #6

Comment 8 Richard W.M. Jones 2014-05-29 21:05:37 UTC
OK, my second guess: Could it be to do with the way that RHEV
backing images are not "realized" always?  You have to do some
faffing around with VDSM, see:
https://www.redhat.com/archives/libguestfs/2014-January/msg00153.html

Comment 9 Peter Krempa 2014-05-30 11:54:32 UTC
(In reply to Adam Litke from comment #6)
> (In reply to Richard W.M. Jones from comment #5)
> > Could it be related to
> > https://bugzilla.redhat.com/show_bug.cgi?id=1061827#c13 ?
> 

...

> Clearly it should have matched on the first iteration of the loop but
> realpath() is failing to resolve it.  Prior to launching the blockcommit I
> ensured that all images in the chain were reachable (no broken symlinks).  I
> wonder if somehow the qemu backing chain structures have been corrupted by
> the live snapshot process.

The libvirtd log you've posted contains an I/O error notification:

2014-05-29 16:26:50.996+0000: 925: debug : qemuMonitorIOProcess:393 : QEMU_MONITOR_IO_PROCESS: mon=0x7f09dc00fc60 buf={"timestamp": {"seconds": 1401380810, "microseconds": 996357}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "operation": "write", "action": "stop"}}

Isn't your storage broken actually?

Also libvirt recognized the the image name and passed it through to qemu. You would get a different error if the image wouldn't be in libvirt's internal struct.

Could you also post output of `virsh qemu-monitor-command --pretty VMNAME '{"execute": "query-block"}'` after the failed block-commit?

Also the mentioned libvirt version libvirt-1.1.3.5-1.fc20 certainly doesn't have any of the backing-store changes you've mentioned.

Comment 10 Peter Krempa 2014-05-30 11:58:14 UTC
Moving to qemu to investigate further. Libvirt seems to do the right thing here.

Comment 11 Adam Litke 2014-05-30 13:12:45 UTC
(In reply to Richard W.M. Jones from comment #8)
> OK, my second guess: Could it be to do with the way that RHEV
> backing images are not "realized" always?  You have to do some
> faffing around with VDSM, see:
> https://www.redhat.com/archives/libguestfs/2014-January/msg00153.html

This cannot be the problem.  I've checked that each volume in the backing chain leads to an activated block device (eg. top == /dev/dm-12).

Comment 12 Adam Litke 2014-05-30 13:17:48 UTC
Here is the query-block output.  Regarding the I/O errors, those only start to happen after the failed block-commit attempt.  

$ sudo virsh qemu-monitor-command --pretty brewer-2 '{"execute": "query-block"}'
Please enter your authentication name: vdsm@ovirt
Please enter your password: 
{
    "return": [
        {
            "io-status": "ok",
            "device": "drive-ide0-1-0",
            "locked": false,
            "removable": true,
            "inserted": {
                "iops_rd": 0,
                "image": {
                    "virtual-size": 75497472,
                    "filename": "/rhev/data-center/mnt/192.168.2.1:_home_storage_iso/e453999e-ec9f-420b-941c-a9f9cc4432bb/images/11111111-1111-1111-1111-111111111111/CorePlus-current.iso",
                    "format": "raw",
                    "actual-size": 75501568
                },
                "iops_wr": 0,
                "ro": true,
                "backing_file_depth": 0,
                "drv": "raw",
                "iops": 0,
                "bps_wr": 0,
                "encrypted": false,
                "bps": 0,
                "bps_rd": 0,
                "file": "/rhev/data-center/mnt/192.168.2.1:_home_storage_iso/e453999e-ec9f-420b-941c-a9f9cc4432bb/images/11111111-1111-1111-1111-111111111111/CorePlus-current.iso",
                "encryption_key_missing": false
            },
            "tray_open": false,
            "type": "unknown"
        },
        {
            "io-status": "failed",
            "device": "drive-virtio-disk0",
            "locked": false,
            "removable": false,
            "inserted": {
                "iops_rd": 0,
                "image": {
                    "backing-image": {
                        "backing-image": {
                            "backing-image": {
                                "virtual-size": 2147483648,
                                "filename": "/rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/58cc8d73-0804-41b7-a457-93e7f2280478/../58cc8d73-0804-41b7-a457-93e7f2280478/378fdb5e-1853-4946-9a79-fd7bd00bb466",
                                "cluster-size": 65536,
                                "format": "qcow2",
                                "actual-size": 0,
                                "dirty-flag": false
                            },
                            "backing-filename-format": "qcow2",
                            "virtual-size": 2147483648,
                            "filename": "/rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/58cc8d73-0804-41b7-a457-93e7f2280478/0ee5ce39-c24f-4f7d-8067-6fce06a0fdfa",
                            "cluster-size": 65536,
                            "format": "qcow2",
                            "actual-size": 0,
                            "full-backing-filename": "/rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/58cc8d73-0804-41b7-a457-93e7f2280478/../58cc8d73-0804-41b7-a457-93e7f2280478/378fdb5e-1853-4946-9a79-fd7bd00bb466",
                            "backing-filename": "../58cc8d73-0804-41b7-a457-93e7f2280478/378fdb5e-1853-4946-9a79-fd7bd00bb466",
                            "dirty-flag": false
                        },
                        "backing-filename-format": "qcow2",
                        "virtual-size": 2147483648,
                        "filename": "/rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/58cc8d73-0804-41b7-a457-93e7f2280478/6f455f42-bac4-4439-8f2f-a2044bc5e708",
                        "cluster-size": 65536,
                        "format": "qcow2",
                        "backing-filename": "/rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/58cc8d73-0804-41b7-a457-93e7f2280478/0ee5ce39-c24f-4f7d-8067-6fce06a0fdfa",
                        "dirty-flag": false
                    },
                    "backing-filename-format": "qcow2",
                    "virtual-size": 2147483648,
                    "filename": "/rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/58cc8d73-0804-41b7-a457-93e7f2280478/2d47ee0c-ccbc-4a54-ad1e-53270861a480",
                    "cluster-size": 65536,
                    "format": "qcow2",
                    "actual-size": 0,
                    "backing-filename": "/rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/58cc8d73-0804-41b7-a457-93e7f2280478/6f455f42-bac4-4439-8f2f-a2044bc5e708",
                    "dirty-flag": false
                },
                "iops_wr": 0,
                "ro": false,
                "backing_file_depth": 3,
                "drv": "qcow2",
                "iops": 0,
                "bps_wr": 0,
                "backing_file": "/rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/58cc8d73-0804-41b7-a457-93e7f2280478/6f455f42-bac4-4439-8f2f-a2044bc5e708",
                "encrypted": false,
                "bps": 0,
                "bps_rd": 0,
                "file": "/rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/58cc8d73-0804-41b7-a457-93e7f2280478/2d47ee0c-ccbc-4a54-ad1e-53270861a480",
                "encryption_key_missing": false
            },
            "type": "unknown"
        }
    ],
    "id": "libvirt-70"
}

Comment 13 Adam Litke 2014-05-30 13:46:54 UTC
Here is the error message from libvirt (to match the query-block output in the previous comment.

$ sudo virsh blockcommit f356f6a7-6c22-48d6-8e33-84eba7521bb1 /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/58cc8d73-0804-41b7-a457-93e7f2280478/2d47ee0c-ccbc-4a54-ad1e-53270861a480 --base /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/58cc8d73-0804-41b7-a457-93e7f2280478/0ee5ce39-c24f-4f7d-8067-6fce06a0fdfa --top /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/58cc8d73-0804-41b7-a457-93e7f2280478/6f455f42-bac4-4439-8f2f-a2044bc5e708
Please enter your authentication name: vdsm@ovirt
Please enter your password: 
error: internal error: unable to execute QEMU command 'block-commit': Top image file /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/58cc8d73-0804-41b7-a457-93e7f2280478/6f455f42-bac4-4439-8f2f-a2044bc5e708 not found

Comment 14 Cole Robinson 2014-05-31 19:58:34 UTC
Adam, can you try latest qemu from fedora virt preview? Might help to determine if qemu is indeed the culprit: http://fedoraproject.org/wiki/Virtualization_Preview_Repository

CCing some block layer folks

Comment 15 Jeff Cody 2014-06-02 13:09:53 UTC
(In reply to Adam Litke from comment #6)
> (In reply to Richard W.M. Jones from comment #5)
> > Could it be related to
> > https://bugzilla.redhat.com/show_bug.cgi?id=1061827#c13 ?
> 
> Related but not quite the same thing.  Here we are deliberately passing in
> valid absolute path names but qemu (bdrv_find_backing_file) path cannot be
> resolved into one of the actual backing images.
> 
> I'm attaching a debug patch that I have applied to qemu and here is the
> output for a particular run of the code:
> 
> bdrv_find_backing_image: looking for
> /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/
> 907e7803-233f-47ad-b48b-232e1aea8697/196cd6c6-252d-4ada-b898-bc8652d3e3f6
>  ->
> /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/
> 907e7803-233f-47ad-b48b-232e1aea8697/196cd6c6-252d-4ada-b898-bc8652d3e3f6
> 2
>         tmp:
> /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/
> 907e7803-233f-47ad-b48b-232e1aea8697/196cd6c6-252d-4ada-b898-bc8652d3e3f6
>         full: /dev/dm-12
>  ->
> /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/
> 907e7803-233f-47ad-b48b-232e1aea8697/acc6f7c4-9b6d-4052-abf5-cab1af5f2e8f
> 2
>         tmp:
> /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/
> 907e7803-233f-47ad-b48b-232e1aea8697/196cd6c6-252d-4ada-b898-bc8652d3e3f6
>         full: /dev/dm-12
>  ->
> ../907e7803-233f-47ad-b48b-232e1aea8697/378fdb5e-1853-4946-9a79-fd7bd00bb466
> 2
>         tmp:
> /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/
> 907e7803-233f-47ad-b48b-232e1aea8697/196cd6c6-252d-4ada-b898-bc8652d3e3f6
>         full: /dev/dm-12
> 
> Clearly it should have matched on the first iteration of the loop but
> realpath() is failing to resolve it.  Prior to launching the blockcommit I
> ensured that all images in the chain were reachable (no broken symlinks).  I
> wonder if somehow the qemu backing chain structures have been corrupted by
> the live snapshot process.

This is interesting, because although realpath() is returning NULL, filename_full has contents in the buffer (the contents of filename_full is undefined on NULL).  So it got close :)

Obviously, on that first iteration, some error happened in realpath.  Could you modify your debug patch to print errno (e.g. strerror(errno)), and post the results?  That could help determine                                                                                                                                                                                                                      why realpath failed.

Comment 16 Adam Litke 2014-06-02 14:34:27 UTC
Hi Jeff:

Here is the new output.  Note that new volumes were used in this test so don't try comparing volume IDs from the previous comments:  (The error is -EPERM)

sudo virsh blockcommit b6d88230-7c33-4484-9d50-6a392030e33d /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/119241f1-d78d-49e0-96c5-375ada61e4e2/b0f8ff80-7633-4eaf-ba7f-07eb00081820 --base /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/119241f1-d78d-49e0-96c5-375ada61e4e2/67845f45-a4e3-4b12-8d79-62d4b33aca61 --top /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/119241f1-d78d-49e0-96c5-375ada61e4e2/3d826cd5-97a1-4c1b-9701-4a3de8ff2262

...

bdrv_find_backing_image: looking for /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/119241f1-d78d-49e0-96c5-375ada61e4e2/3d826cd5-97a1-4c1b-9701-4a3de8ff2262
 -> /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/119241f1-d78d-49e0-96c5-375ada61e4e2/3d826cd5-97a1-4c1b-9701-4a3de8ff2262
2
	tmp: /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/119241f1-d78d-49e0-96c5-375ada61e4e2/3d826cd5-97a1-4c1b-9701-4a3de8ff2262
	full: /dev/dm-19
realpath failed with error: Permission denied
 -> /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/119241f1-d78d-49e0-96c5-375ada61e4e2/67845f45-a4e3-4b12-8d79-62d4b33aca61
2
	tmp: /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/119241f1-d78d-49e0-96c5-375ada61e4e2/3d826cd5-97a1-4c1b-9701-4a3de8ff2262
	full: /dev/dm-19
realpath failed with error: Permission denied
 -> ../119241f1-d78d-49e0-96c5-375ada61e4e2/56f4d756-4ac3-4477-b786-d730aed9e591
2
	tmp: /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/119241f1-d78d-49e0-96c5-375ada61e4e2/3d826cd5-97a1-4c1b-9701-4a3de8ff2262
	full: /dev/dm-19
realpath failed with error: Permission denied

Comment 17 Adam Litke 2014-06-02 14:36:27 UTC
Created attachment 901471 [details]
New qemu debug patch

Comment 18 Adam Litke 2014-06-02 14:45:35 UTC
qemu-kvm-2.0.0-5.fc20.x86_64 from the virt-preview repo appears to fail in the same way:

sudo virsh blockcommit 83d5c6db-0af9-4d1c-93d0-1dde75550a70 /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/649ec064-7818-458e-994f-52df39a066e8/77df4dd9-a079-4dd5-9d7a-623d2c4c1bc4 --base /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/649ec064-7818-458e-994f-52df39a066e8/b85f711d-830b-417e-9025-85b1904d4c96 --top /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/649ec064-7818-458e-994f-52df39a066e8/6af6013f-29f6-4b88-8db4-d662fdfd8bca
Please enter your authentication name: vdsm@ovirt
Please enter your password: 
error: internal error: unable to execute QEMU command 'block-commit': Top image file /rhev/data-center/mnt/blockSD/a246386b-f31d-451d-b8a6-2f4c0dfa27b4/images/649ec064-7818-458e-994f-52df39a066e8/6af6013f-29f6-4b88-8db4-d662fdfd8bca not found

Comment 19 Jeff Cody 2014-06-02 14:58:57 UTC
Thanks Adam,

Since the error from realpath() is EPERM, could you check the system logs to see if there are any selinux errors?

Comment 20 Adam Litke 2014-06-02 15:51:40 UTC
Hey Jeff,

Looks like issuing a 'setenforce 0' on the host makes the problem go away.  I will attach the audit.log in the next comment.

Comment 21 Adam Litke 2014-06-02 15:56:53 UTC
Created attachment 901490 [details]
audit.log with selinux enforcing

Here is the audit.log when the operation is failing.
The qemu pid is 5703
libvirtd is pid 4862

Comment 22 Cole Robinson 2014-06-02 16:41:23 UTC
Example AVCs:

type=AVC msg=audit(1401724213.736:688): avc:  denied  { read } for  pid=5364 comm="qemu-system-x86" path="/dev/dm-12" dev="devtmpfs" ino=91464 scontext=system_u:system_r:svirt_t:s0:c166,c273 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file

type=AVC msg=audit(1401724243.864:701): avc:  denied  { read } for  pid=5364 comm="qemu-system-x86" path="/dev/dm-18" dev="devtmpfs" ino=91807 scontext=system_u:system_r:svirt_t:s0:c166,c273 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file

Comment 23 Peter Krempa 2014-06-05 15:30:13 UTC
hmm, selinux labeling problem will most certainly be a libvirt issue.

Comment 24 Adam Litke 2014-06-13 13:11:26 UTC
Hi Peter,

Anything else I can collect for you to aid in resolving this?

Comment 25 Peter Krempa 2014-06-13 13:38:17 UTC
Do you have any un-commented options in /etc/libvirt/qemu.conf ? Looks like the selinux driver in libvirt isn't working for some reason.

Comment 26 Adam Litke 2014-06-13 15:16:07 UTC
Just these (which seem to be unrelated)

sudo grep -v '^#' /etc/libvirt/qemu.conf
dynamic_ownership=0
spice_tls=1
save_image_format="lzop"
spice_tls_x509_cert_dir="/etc/pki/vdsm/libvirt-spice"
remote_display_port_min=5900
remote_display_port_max=6923
lock_manager="sanlock"
auto_dump_path="/var/log/core"

sudo grep -v '^#' /etc/libvirt/libvirtd.conf
listen_addr="0.0.0.0"
unix_sock_group="qemu"
unix_sock_rw_perms="0770"
auth_unix_rw="sasl"
host_uuid="efd0b34a-1fd6-4e25-87a4-cfbd23ca64a7"
keepalive_interval=-1
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
log_filters="3:virobject 3:virfile 2:virnetlink     3:cgroup 3:event 3:json 1:libvirt 1:util 1:qemu"
ca_file="/etc/pki/vdsm/certs/cacert.pem"
cert_file="/etc/pki/vdsm/certs/vdsmcert.pem"
key_file="/etc/pki/vdsm/keys/vdsmkey.pem"

Comment 27 Adam Litke 2014-07-15 12:54:19 UTC
Hi Peter,

What can we do to move this forward?  It's blocking virDomainBlockCommit for oVirt on block storage domains.  Would be nice to have this fixed in the July release.

Comment 28 Adam Litke 2014-07-30 14:22:44 UTC
Created attachment 922603 [details]
libvirtd log

error is at: 

2014-07-30 14:15:04.470+0000: 1363: error : qemuMonitorJSONCheckError:377 : internal error: unable to execute QEMU command 'block-commit': Top image file /rhev/data-center/aab45993-c5cf-436e-b55a-873bd83681d2/ae92afa6-66e8-479e-bc4d-bb0ac06423de/images/eefd510e-2491-4d68-bfbf-31c47dd3bf65/../eefd510e-2491-4d68-bfbf-31c47dd3bf65/7804a559-f27d-4072-8c3e-525f75052837 not found

Comment 29 Adam Litke 2014-07-30 14:24:00 UTC
Created attachment 922604 [details]
libvirtd log

error message:

2014-07-30 14:15:04.470+0000: 1363: error : qemuMonitorJSONCheckError:377 : internal error: unable to execute QEMU command 'block-commit': Top image file /rhev/data-center/aab45993-c5cf-436e-b55a-873bd83681d2/ae92afa6-66e8-479e-bc4d-bb0ac06423de/images/eefd510e-2491-4d68-bfbf-31c47dd3bf65/../eefd510e-2491-4d68-bfbf-31c47dd3bf65/7804a559-f27d-4072-8c3e-525f75052837 not found

Comment 30 Adam Litke 2014-07-30 14:24:54 UTC
Created attachment 922606 [details]
audit log

I don't see any denials here, but the same issue happens as before.

Comment 31 Adam Litke 2014-07-30 14:25:40 UTC
Created attachment 922607 [details]
libvirt log for affected vm

Comment 32 Peter Krempa 2014-07-31 14:31:06 UTC
I came across one more possible problem that could cause this bug: 
I'm suspecting that somewhere in the path there's something that can't be stat-ed/readlink'd - due to selinux - or whatever realpath() does to canonicalize the path. As nothing showed up in the AVC log it's possible that some of the selinux rules may have "dontaudit" option enabled and thus don't show up in the auditlog.
Would you mind trying https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Security-Enhanced_Linux/sect-Security-Enhanced_Linux-Fixing_Problems-Possible_Causes_of_Silent_Denials.html

and reproducing the issue and then posting the selinux log? 

Thanks.

Comment 33 Adam Litke 2014-08-01 14:50:38 UTC
Created attachment 923330 [details]
audit.log with all denials logged

Aha, now I am seeing denials for devicemapper devices.

Comment 34 Adam Litke 2014-08-04 15:09:10 UTC
I can confirm that the problem only occurs with live-snapshots that have been created during the current generation of VM execution.  

In other words, if snapshots are created, the VM is then stopped and restarted, the commit operation will succeed.  If snapshots are created and a merge is attempted immediately, the commit operation will fail.

Comment 35 Adam Litke 2014-08-06 14:39:17 UTC
*** Bug 1124445 has been marked as a duplicate of this bug. ***

Comment 36 Peter Krempa 2015-03-23 08:32:09 UTC
The problem should be fixed by the following patch:

commit f9ea3d60119e82c02c00fbf3678c3ed20634dea1
Author: Eric Blake <eblake>
Date:   Wed Mar 11 14:37:04 2015 -0600

    qemu: read backing chain names from qemu
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1199182 documents that
    after a series of disk snapshots into existing destination images,
    followed by active commits of the top image, it is possible for
    qemu 2.2 and earlier to end up tracking a different name for the
    image than what it would have had when opening the chain afresh.
    That is, when starting with the chain 'a <- b <- c', the name
    associated with 'b' is how it was spelled in the metadata of 'c',
    but when starting with 'a', taking two snapshots into 'a <- b <- c',
    then committing 'c' back into 'b', the name associated with 'b' is
    now the name used when taking the first snapshot.
    
    Sadly, older qemu doesn't know how to treat different spellings of
    the same filename as identical files (it uses strcmp() instead of
    checking for the same inode), which means libvirt's attempt to
    commit an image using solely the names learned from qcow2 metadata
    fails with a cryptic:
    
    error: internal error: unable to execute QEMU command 'block-commit': Top image file /tmp/images/c/../b/b not found
    
    even though the file exists.  Trying to teach libvirt the rules on
    which name qemu will expect is not worth the effort (besides, we'd
    have to remember it across libvirtd restarts, and track whether a
    file was opened via metadata or via snapshot creation for a given
    qemu process); it is easier to just always directly ask qemu what
    string it expects to see in the first place.
    
    As a safety valve, we validate that any name returned by qemu
    still maps to the same local file as we have tracked it, so that
    a compromised qemu cannot accidentally cause us to act on an
    incorrect file.
    
    * src/qemu/qemu_monitor.h (qemuMonitorDiskNameLookup): New
    prototype.
    * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskNameLookup):
    Likewise.
    * src/qemu/qemu_monitor.c (qemuMonitorDiskNameLookup): New function.
    * src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskNameLookup)
    (qemuMonitorJSONDiskNameLookupOne): Likewise.
    * src/qemu/qemu_driver.c (qemuDomainBlockCommit)
    (qemuDomainBlockJobImpl): Use it.

Please note that it might have some dependencies so it's questionable whether it's worth backporting this or just wait for new release.

Comment 37 Cole Robinson 2015-04-02 18:22:51 UTC
*** Bug 1206456 has been marked as a duplicate of this bug. ***

Comment 38 Cole Robinson 2015-04-02 18:28:10 UTC
Peter, I see a few patches of yours referencing backing chains that hit libvirt.git after Eric's fix, are there any other patches that should be backported on top of that?

I'm just looking at backporting to v1.2.13 for now since that's what we are sticking with in F22

Comment 39 Peter Krempa 2015-04-03 09:46:09 UTC
I'll prepare a list of commits that you'll need on top of v1.2.13. I move this back to assigned to me until I do that.

Comment 40 Cole Robinson 2015-04-13 18:07:59 UTC
ping pkrempa, help with the patch list? or just backport the changes to v1.2.13-maint. I'd like to get a build out ASAP for virt test day on thursday. thanks

Comment 41 Peter Krempa 2015-04-15 08:24:38 UTC
Unfortunately the patches that fix the relative naming problem currently break non-shared storage migration so I'd rather not backport anything until we sort it out upstream.

Comment 42 Fedora End Of Life 2015-11-04 15:27:26 UTC
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 43 Cole Robinson 2015-11-04 21:27:47 UTC
Since f21 is end-of-life soon, and it doesn't sound like this was going to be an easy backport anyways, just closing. If anyone is still hitting this on later fedora versions, please reopen