Bug 903248
| Summary: | vdsm: cannot run a vm with several snapshots | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Dafna Ron <dron> |
| Component: | libvirt | Assignee: | Eric Blake <eblake> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.4 | CC: | abaron, acathrow, bazulay, cpelland, cwei, dallan, djasa, dossow, dyuan, eblake, gwatson, hateya, iheim, jdenemar, jentrena, jkt, juzhang, lpeer, mzhan, pkrempa, shyu, yafu, ydu |
| Target Milestone: | rc | Keywords: | Regression, ZStream |
| Target Release: | 6.4 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | storage | | |
| Fixed In Version: | libvirt-0.10.2-19.el6 | Doc Type: | Bug Fix |
| Doc Text: | Cause: libvirt followed relative backing chains differently than qemu. Consequence: missing sVirt permissions when libvirt couldn't follow the chain. Fix: relative backing files are now treated identically in libvirt and qemu. Result: vdsm use of relative backing files works again. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-11-21 08:41:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 878460, 896690, 915354 | | |
| Attachments: | | | |
Fede, dup of bug 890956 ?

While testing https://bugzilla.redhat.com/show_bug.cgi?id=878041 I hit this issue again. This bug is reproduced 100% with the scenario below:

1. run a HA vm
2. create 3 live snapshots
3. kill -9 the vm's pid

Currently the vm fails to start with the following error:

Thread-52157::ERROR::2013-02-04 11:15:57,005::vm::700::vm.Vm::(_startUnderlyingVm) vmId=`09256c11-7021-485d-92d3-37f3faba5689`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 662, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1518, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 104, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2645, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error process exited while connecting to monitor: qemu-kvm: -drive file=/rhev/data-center/b08d859c-f654-4976-8c7c-e102d4c252fb/43ce579b-57d0-4201-91b9-f2436f925a50/images/4af22343-6df8-4c36-adc9-bca01e423e39/b63b475f-c08c-4be9-a577-2c6e262c9afe,if=none,id=drive-virtio-disk0,format=qcow2,serial=4af22343-6df8-4c36-adc9-bca01e423e39,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /rhev/data-center/b08d859c-f654-4976-8c7c-e102d4c252fb/43ce579b-57d0-4201-91b9-f2436f925a50/images/4af22343-6df8-4c36-adc9-bca01e423e39/b63b475f-c08c-4be9-a577-2c6e262c9afe: Operation not permitted

For QE - when verifying this fix, please also make sure to test HA vm's in the above scenario.

Dafna - does it happen on 3.1.3 (with a RHEL 6.4 based VDSM)?

Yes - it also reproduces in 3.1.3:
Thread-258923::ERROR::2013-02-04 12:43:28,184::vm::680::vm.Vm::(_startUnderlyingVm) vmId=`19eff98f-c99a-41d7-81de-d806690aaedc`::The vm start process failed
Traceback (most recent call last):
File "/usr/share/vdsm/vm.py", line 642, in _startUnderlyingVm
self._run()
File "/usr/share/vdsm/libvirtvm.py", line 1472, in _run
self._connection.createXML(domxml, flags),
File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2645, in createXML
if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error process exited while connecting to monitor: qemu-kvm: -drive file=/rhev/data-center/3653dcb1-d4ce-477f-acb7-4fe7116b7950/127a0092-fde3-454c-bae5-edc00aa136d4/images/dee76e92-3790-464e-940f-f17f8c15f629/f7530761-a03d-46fd-871d-4ff3575fbbbf,if=none,id=drive-virtio-disk0,format=qcow2,serial=dee76e92-3790-464e-940f-f17f8c15f629,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /rhev/data-center/3653dcb1-d4ce-477f-acb7-4fe7116b7950/127a0092-fde3-454c-bae5-edc00aa136d4/images/dee76e92-3790-464e-940f-f17f8c15f629/f7530761-a03d-46fd-871d-4ff3575fbbbf: Permission denied
Thread-258923::DEBUG::2013-02-04 12:43:28,192::vm::1027::vm.Vm::(setDownStatus) vmId=`19eff98f-c99a-41d7-81de-d806690aaedc`::Changed state to Down: internal error process exited while connecting to monitor: qemu-kvm: -drive file=/rhev/data-center/3653dcb1-d4ce-477f-acb7-4fe7116b7950/127a0092-fde3-454c-bae5-edc00aa136d4/images/dee76e92-3790-464e-940f-f17f8c15f629/f7530761-a03d-46fd-871d-4ff3575fbbbf,if=none,id=drive-virtio-disk0,format=qcow2,serial=dee76e92-3790-464e-940f-f17f8c15f629,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /rhev/data-center/3653dcb1-d4ce-477f-acb7-4fe7116b7950/127a0092-fde3-454c-bae5-edc00aa136d4/images/dee76e92-3790-464e-940f-f17f8c15f629/f7530761-a03d-46fd-871d-4ff3575fbbbf: Permission denied
Dafna, what version did it work in, given that it's marked as a regression?

I did not see a problem running the same test with rhel6.3, but I am willing to recheck.

I ran the same test with rhel6.3 and the vm restarted with no issues. So yes - this is a regression from vdsm-4.9.6-44.3.el6_3.x86_64 with 2.6.32-279.19.1.el6.x86_64.

My feeling is that there's something wrong in how libvirt is computing the relative path:

2013-02-04 15:42:57.618+0000: 21720: error : absolutePathFromBaseFile:560 : Can't canonicalize path '../4af22343-6df8-4c36-adc9-bca01e423e39/d2cae84a-8f59-414c-9040-d4df7b9141ad': No such file or directory
2013-02-04 15:42:57.618+0000: 21720: warning : virStorageFileGetMetadataFromBuf:736 : Backing file '../4af22343-6df8-4c36-adc9-bca01e423e39/d2cae84a-8f59-414c-9040-d4df7b9141ad' of image '/dev/dm-46' is missing.
2013-02-04 15:42:57.618+0000: 21720: debug : virFileClose:72 : Closed fd 26
2013-02-04 15:42:57.618+0000: 21720: debug : qemuSetupDiskPathAllow:71 : Process path /rhev/data-center/b08d859c-f654-4976-8c7c-e102d4c252fb/43ce579b-57d0-4201-91b9-f2436f925a50/images/4af22343-6df8-4c36-adc9-bca01e423e39/b63b475f-c08c-4be9-a577-2c6e262c9afe for disk

The image is present (also prior to starting the vm, so no race):

# ls -l /rhev/data-center/b08d859c-f654-4976-8c7c-e102d4c252fb/43ce579b-57d0-4201-91b9-f2436f925a50/images/4af22343-6df8-4c36-adc9-bca01e423e39
total 24
lrwxrwxrwx. 1 vdsm kvm 78 Feb 3 17:30 3d4da434-ccc7-486b-b29f-98d2651c2a1b -> /dev/43ce579b-57d0-4201-91b9-f2436f925a50/3d4da434-ccc7-486b-b29f-98d2651c2a1b
lrwxrwxrwx. 1 vdsm kvm 78 Feb 3 17:33 85e55d89-d790-4ef1-8ef3-45113a787a8e -> /dev/43ce579b-57d0-4201-91b9-f2436f925a50/85e55d89-d790-4ef1-8ef3-45113a787a8e
lrwxrwxrwx. 1 vdsm kvm 78 Feb 3 17:34 a5bea3ee-9f07-4253-bde2-106d15309feb -> /dev/43ce579b-57d0-4201-91b9-f2436f925a50/a5bea3ee-9f07-4253-bde2-106d15309feb
lrwxrwxrwx. 1 vdsm kvm 78 Feb 4 11:14 b63b475f-c08c-4be9-a577-2c6e262c9afe -> /dev/43ce579b-57d0-4201-91b9-f2436f925a50/b63b475f-c08c-4be9-a577-2c6e262c9afe
lrwxrwxrwx. 1 vdsm kvm 78 Feb 4 10:59 be749868-5666-4751-b449-fecc44eb228c -> /dev/43ce579b-57d0-4201-91b9-f2436f925a50/be749868-5666-4751-b449-fecc44eb228c
lrwxrwxrwx. 1 vdsm kvm 78 Feb 4 10:58 d2cae84a-8f59-414c-9040-d4df7b9141ad -> /dev/43ce579b-57d0-4201-91b9-f2436f925a50/d2cae84a-8f59-414c-9040-d4df7b9141ad

After that, cgroups are probably not set correctly and the qemu process fails with:

/rhev/data-center/b08d859c-f654-4976-8c7c-e102d4c252fb/43ce579b-57d0-4201-91b9-f2436f925a50/images/4af22343-6df8-4c36-adc9-bca01e423e39/b63b475f-c08c-4be9-a577-2c6e262c9afe: Operation not permitted

(I have already encountered EPERM associated with cgroups.) There might also be something strange going on with lvm, as the lv a5bea3ee-9f07-4253-bde2-106d15309feb is reported as not active:

# lvs -o +tags | grep 4af22343-6df8-4c36-adc9-bca01e423e39
a5bea3ee-9f07-4253-bde2-106d15309feb 43ce579b-57d0-4201-91b9-f2436f925a50 -wi------ 1.00g PU_00000000-0000-0000-0000-000000000000,IU_4af22343-6df8-4c36-adc9-bca01e423e39,MD_22
b63b475f-c08c-4be9-a577-2c6e262c9afe 43ce579b-57d0-4201-91b9-f2436f925a50 -wi-a---- 1.00g MD_25,PU_be749868-5666-4751-b449-fecc44eb228c,IU_4af22343-6df8-4c36-adc9-bca01e423e39
be749868-5666-4751-b449-fecc44eb228c 43ce579b-57d0-4201-91b9-f2436f925a50 -wi-a---- 1.00g MD_24,IU_4af22343-6df8-4c36-adc9-bca01e423e39,PU_d2cae84a-8f59-414c-9040-d4df7b9141ad
d2cae84a-8f59-414c-9040-d4df7b9141ad 43ce579b-57d0-4201-91b9-f2436f925a50 -wi-a---- 1.00g PU_a5bea3ee-9f07-4253-bde2-106d15309feb,IU_4af22343-6df8-4c36-adc9-bca01e423e39,MD_23

even though:

# ls -l $(readlink -f /dev/43ce579b-57d0-4201-91b9-f2436f925a50/a5bea3ee-9f07-4253-bde2-106d15309feb)
brw-rw----. 1 vdsm qemu 253, 41 Feb 3 17:34 /dev/dm-41

Trying to activate it again fails with:

# lvchange -ay 43ce579b-57d0-4201-91b9-f2436f925a50/a5bea3ee-9f07-4253-bde2-106d15309feb
device-mapper: create ioctl on 43ce579b--57d0--4201--91b9--f2436f925a50-a5bea3ee--9f07--4253--bde2--106d15309feb failed: Device or resource busy

I'm not sure if this weird lvm status is the cause or one of the effects; in any case, this doesn't seem related to VDSM. Eric, any idea?

Definitely an unintentional regression in 6.4, part of my refactoring into virStorageFileGetMetadataRecurse. The code sees that the block name is a symlink, does a complete canonicalize_file_name() down to /dev/dm-46, and then tries to treat the backing file as relative to the dirname of the canonical name; but /dev/../uuid doesn't exist. Rather, libvirt should be following the ../uuid backing file name relative to the dirname that existed prior to resolving the symlink of the basename. I'm working on a patch, but ran out of time today.

Should we reassign or clone to libvirt?

Upstream patch series proposed: https://www.redhat.com/archives/libvir-list/2013-February/msg00340.html

*** Bug 896507 has been marked as a duplicate of this bug. ***

Fixed upstream as of v1.0.2-177-ga18452d with:
commit a18452d0d2aedf65b83b24983ce5746cfe68e71a
Author: Eric Blake <eblake>
Date: Wed Feb 13 11:04:05 2013 -0700
storage: test backing chain traversal
Testing our backing chain handling will make it much easier to
ensure that we avoid issues in the future. If only I had written
this test before I first caused several regressions...
* tests/virstoragetest.c: New test.
* tests/Makefile.am (test_programs): Build it.
* .gitignore: Ignore new files.
commit d1333dd0fbb944a7d5fe56180bcf6b0fd4f85e18
Author: Eric Blake <eblake>
Date: Wed Feb 6 15:48:15 2013 -0700
storage: don't follow backing chain symlinks too eagerly
If you have a qcow2 file /path1/to/file pointed to by symlink
/path2/symlink, and pass qemu /path2/symlink, then qemu treats
a relative backing file in the qcow2 metadata as being relative
to /path2, not /path1/to. Yes, this means that it is possible
to create a qcow2 file where the choice of WHICH directory and
symlink you access its contents from will then determine WHICH
backing file (if any) you actually find; the results can be
rather screwy, but we have to match what qemu does.
Libvirt and qemu default to creating absolute backing file
names, so most users don't hit this. But at least VDSM uses
symlinks and relative backing names alongside the
--reuse-external flags to libvirt snapshot operations, with the
result that libvirt was failing to follow the intended chain of
backing files, and then backing files were not granted the
necessary sVirt permissions to be opened by qemu.
See https://bugzilla.redhat.com/show_bug.cgi?id=903248 for
more gory details. This fixes a regression introduced in
commit 8250783.
I tested this patch by creating the following chain:
ls /home/eblake/Downloads/Fedora.iso # raw file for base
cd /var/lib/libvirt/images
qemu-img create -f qcow2 \
-obacking_file=/home/eblake/Downloads/Fedora.iso,backing_fmt=raw one
mkdir sub
cd sub
ln -s ../one onelink
qemu-img create -f qcow2 \
-obacking_file=../sub/onelink,backing_fmt=qcow2 two
mv two ..
ln -s ../two twolink
qemu-img create -f qcow2 \
-obacking_file=../sub/twolink,backing_fmt=qcow2 three
mv three ..
ln -s ../three threelink
then pointing my domain at /var/lib/libvirt/images/sub/threelink.
Prior to this patch, I got complaints about missing backing
files; afterwards, I was able to verify that the backing chain
(and hence DAC and SELinux relabels) of the entire chain worked.
* src/util/virstoragefile.h (_virStorageFileMetadata): Add
directory member.
* src/util/virstoragefile.c (absolutePathFromBaseFile): Drop,
replaced by...
(virFindBackingFile): ...better function.
(virStorageFileGetMetadataInternal): Add an argument.
(virStorageFileGetMetadataFromFD, virStorageFileChainLookup)
(virStorageFileGetMetadata): Update callers.
commit 2485f921536659181c9afa30d4fc1ee99dc064ae
Author: Eric Blake <eblake>
Date: Wed Feb 6 16:33:45 2013 -0700
storage: refactor metadata lookup
Prior to this patch, we had the callchains:
external users
\-> virStorageFileGetMetadataFromFD
\-> virStorageFileGetMetadataFromBuf
virStorageFileGetMetadataRecurse
\-> virStorageFileGetMetadataFromFD
\-> virStorageFileGetMetadataFromBuf
However, a future patch wants to add an additional parameter to
the bottom of the chain, for use by virStorageFileGetMetadataRecurse,
without affecting existing external callers. Since there is only a
single caller of the internal function, we can repurpose it to fit
our needs, with this patch giving us:
external users
\-> virStorageFileGetMetadataFromFD
\-> virStorageFileGetMetadataInternal
virStorageFileGetMetadataRecurse /
\-> virStorageFileGetMetadataInternal
* src/util/virstoragefile.c (virStorageFileGetMetadataFromFD):
Move most of the guts...
(virStorageFileGetMetadataFromBuf): ...here, and rename...
(virStorageFileGetMetadataInternal): ...to this.
(virStorageFileGetMetadataRecurse): Use internal helper.
commit b7df4f92d6a62f4ec7513af11590f77152e0a19b
Author: Eric Blake <eblake>
Date: Sat Feb 9 06:41:01 2013 -0700
storage: prepare for refactoring
virStorageFileGetMetadataFromFD is the only caller of
virStorageFileGetMetadataFromBuf; and it doesn't care about the
difference between a return of 0 (total success) or 1
(metadata was inconsistent, but pointer was populated as best
as possible); only about a return of -1 (could not read metadata
or out of memory). Changing the return type, and normalizing
the variable names used, will make merging the functions easier
in the next commit.
* src/util/virstoragefile.c (virStorageFileGetMetadataFromBuf):
Change return value, and rename some variables.
(virStorageFileGetMetadataFromFD): Rename some variables.
commit 5e4946d4d98badb9982ff78d795e1bbd9829ccde
Author: Eric Blake <eblake>
Date: Wed Feb 6 16:54:08 2013 -0700
storage: rearrange functions
No semantic change; done so the next patch doesn't need a forward
declaration of a static function.
* src/util/virstoragefile.c (virStorageFileProbeFormatFromBuf):
Hoist earlier.
Moving to POST.
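The symlink-versus-canonical-path behavior described above can be illustrated with a short Python sketch (hypothetical paths and function names; this illustrates the resolution rules, not libvirt's actual code). qemu resolves a relative backing name against the directory component of the path used to open the image, leaving a symlink in the final component unresolved; the regressed libvirt code canonicalized the image path first, landing in /dev where '../uuid/...' does not exist:

```python
import os

def resolve_backing_qemu(image_path, backing):
    # qemu's rule (and libvirt's after the fix): resolve a relative
    # backing name against the directory of the path used to open the
    # image; a symlink in the final path component is NOT resolved first.
    if os.path.isabs(backing):
        return backing
    return os.path.normpath(os.path.join(os.path.dirname(image_path), backing))

def resolve_backing_canonicalized(image_path, backing):
    # The regressed behavior: canonicalize the image path first
    # (following the symlink down to e.g. /dev/dm-46), then resolve
    # the backing name against the canonical directory, which yields
    # /dev/../<uuid>/... and fails to exist.
    if os.path.isabs(backing):
        return backing
    canonical = os.path.realpath(image_path)
    return os.path.normpath(os.path.join(os.path.dirname(canonical), backing))
```

With an image opened as /rhev/.../images/UUID/vol (a symlink into /dev/VG, as vdsm creates), the qemu-style resolution of '../UUID/base' stays inside the images directory, while the canonicalized variant ends up outside it.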
I can reproduce this bug with libvirt-0.10.2-18.el6.x86_64:

1) need RHEVM env and ISCSI storage
2) start a vm
3) create 3 live snapshots via rhevm
4) kill -9 the vm's pid
5) try to start the vm

libvirtd log error:

error : qemuProcessWaitForMonitor:1752 : internal error process exited while connecting to monitor: qemu-kvm: -drive file=/rhev/data-center/16550381-93bb-4c0a-b6ba-c16b73868978/9f38a056-aa10-4f2e-a58e-7ad7525ce6a2/images/f4747e80-33a4-45ab-b994-18d453c8b976/eb52e017-4d0c-4cb4-ae8c-4e948f4c5fde,if=none,id=drive-virtio-disk0,format=qcow2,serial=f4747e80-33a4-45ab-b994-18d453c8b976,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /rhev/data-center/16550381-93bb-4c0a-b6ba-c16b73868978/9f38a056-aa10-4f2e-a58e-7ad7525ce6a2/images/f4747e80-33a4-45ab-b994-18d453c8b976/eb52e017-4d0c-4cb4-ae8c-4e948f4c5fde: Operation not permitted

Wenlong

*** Bug 922419 has been marked as a duplicate of this bug. ***

Verified this bug with libvirt-0.10.2-19.el6.x86_64:

1) need RHEVM env and ISCSI storage
2) start a vm
3) create 3 live snapshots via rhevm
4) kill -9 the vm's pid
5) try to start the vm
6) the vm starts successfully, without an error like the following in the libvirtd log:

error : qemuProcessWaitForMonitor:1752 : internal error process exited while connecting to monitor: qemu-kvm: -drive file=/rhev/data-center/16550381-93bb-4c0a-b6ba-c16b73868978/9f38a056-aa10-4f2e-a58e-7ad7525ce6a2/images/f4747e80-33a4-45ab-b994-18d453c8b976/eb52e017-4d0c-4cb4-ae8c-4e948f4c5fde,if=none,id=drive-virtio-disk0,format=qcow2,serial=f4747e80-33a4-45ab-b994-18d453c8b976,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /rhev/data-center/16550381-93bb-4c0a-b6ba-c16b73868978/9f38a056-aa10-4f2e-a58e-7ad7525ce6a2/images/f4747e80-33a4-45ab-b994-18d453c8b976/eb52e017-4d0c-4cb4-ae8c-4e948f4c5fde: Operation not permitted

So this bug is verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1581.html
Created attachment 685972 [details]
logs

Description of problem:
I created a vm with several snapshots, and when I try to run it, it fails with the following error:

qemu-kvm: -drive file=/rhev/data-center/cee3603b-2308-4973-97a8-480f7d6d2132/c6443d8b-14e3-425d-9d3f-2e59820ea3e5/images/0cccaded-3aec-45b9-a24e-1a91d9fe968c/9b7d1880-9be1-4661-b194-4402ae1ac745,if=none,id=drive-virtio-disk0,format=qcow2,serial=0cccaded-3aec-45b9-a24e-1a91d9fe968c,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /rhev/data-center/cee3603b-2308-4973-97a8-480f7d6d2132/c6443d8b-14e3-425d-9d3f-2e59820ea3e5/images/0cccaded-3aec-45b9-a24e-1a91d9fe968c/9b7d1880-9be1-4661-b194-4402ae1ac745: Operation not permitted

Version-Release number of selected component (if applicable):
sf3
vdsm-4.10.2-3.0.el6ev.x86_64
libvirt-0.10.2-16.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.348.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. create a vm with several snapshots
2. try to run the vm

Actual results:
we fail to run the vm

Expected results:
we should be able to run the vm

Additional info:
attaching all logs

Thread-130836::ERROR::2013-01-23 12:31:30,294::vm::680::vm.Vm::(_startUnderlyingVm) vmId=`9a11037b-0546-4057-b98f-6014ef8b81d5`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 642, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1486, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2645, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error process exited while connecting to monitor: char device redirected to /dev/pts/1
qemu-kvm: -drive file=/rhev/data-center/cee3603b-2308-4973-97a8-480f7d6d2132/c6443d8b-14e3-425d-9d3f-2e59820ea3e5/images/bb40470e-e8b2-4b13-9cec-cded6f27ac6c/b257f7a9-3782-4b4d-8bdb-615a9f2dadb9,if=none,id=drive-ide0-0-0,format=qcow2,serial=bb40470e-e8b2-4b13-9cec-cded6f27ac6c,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /rhev/data-center/cee3603b-2308-4973-97a8-480f7d6d2132/c6443d8b-14e3-425d-9d3f-2e59820ea3e5/images/bb40470e-e8b2-4b13-9cec-cded6f27ac6c/b257f7a9-3782-4b4d-8bdb-615a9f2dadb9: Operation not permitted

Thread-130836::DEBUG::2013-01-23 12:31:30,310::vm::1027::vm.Vm::(setDownStatus) vmId=`9a11037b-0546-4057-b98f-6014ef8b81d5`::Changed state to Down: internal error process exited while connecting to monitor: char device redirected to /dev/pts/1