I upgraded a host to the latest version of vdsm (vdsm-4.18.21-1.el7.centos.x86_64) on CentOS Linux release 7.3.1611 (Core). I then created a disk that I wanted to attach to a running VM, but it fails with this message in /var/log/libvirt/qemu/<VMNAME>.log:

Could not open '/rhev/data-center/17434f4e-8d1a-4a88-ae39-d2ddd46b3b9b/7c5291d3-11e2-420f-99ad-47a376013671/images/dd6ebebe-b43f-4b9b-981a-e390d89cb36c/21dc1593-6629-47d8-a7eb-0aca060b7e26': Permission denied

I had a look at the disk images and got a strange result:

-rw-rw---- 1 vdsm qemu 1.0M May 18  2016 7c5291d3-11e2-420f-99ad-47a376013671/dom_md/ids
-rw-rw---- 1 vdsm qemu  16M May 18  2016 7c5291d3-11e2-420f-99ad-47a376013671/dom_md/inbox
-rw-rw---- 1 vdsm qemu 2.0M May 18  2016 7c5291d3-11e2-420f-99ad-47a376013671/dom_md/leases
-rw-r--r-- 1 vdsm qemu  451 May 18  2016 7c5291d3-11e2-420f-99ad-47a376013671/dom_md/metadata
-rw-rw---- 1 vdsm qemu  16M May 18  2016 7c5291d3-11e2-420f-99ad-47a376013671/dom_md/outbox
-rw-rw---- 1 vdsm qemu  30K Jan 27 12:33 7c5291d3-11e2-420f-99ad-47a376013671/images/3a00232b-c1f9-4b9b-910e-caf8b0321609/4f6d5c63-6a36-4356-832e-f52427d9512e
-rw-rw---- 1 vdsm qemu  32G Jan 27 12:34 7c5291d3-11e2-420f-99ad-47a376013671/images/465df4e9-3c62-4501-889f-cbab65ed0e0d/7a9b9033-f5f8-4eaa-ac94-6cc0c4ff6120
-rw-rw---- 1 vdsm qemu  32G Jan 27 12:34 7c5291d3-11e2-420f-99ad-47a376013671/images/b0f4c517-e492-409f-934f-1561281a242b/a3d60d8a-f89b-41dd-b519-fb652301b1f5
-rw-rw---- 1 vdsm qemu  16G Jan  9 17:25 7c5291d3-11e2-420f-99ad-47a376013671/images/baf01c4e-ede9-4e4e-a265-172695d81a83/4cdd72a7-b347-4479-accd-ab08d61552f9
-rw-rw---- 1 vdsm kvm  200G Jan 27 12:26 7c5291d3-11e2-420f-99ad-47a376013671/images/dd6ebebe-b43f-4b9b-981a-e390d89cb36c/21dc1593-6629-47d8-a7eb-0aca060b7e26
-rw-rw---- 1 vdsm qemu  30K Jan 27 12:33 7c5291d3-11e2-420f-99ad-47a376013671/images/ed18c515-09c9-4a71-af0a-7f0934193a65/b5e53c81-2279-4f2b-b282-69db430d36d4

The new disk is the 200G one, group-owned by kvm. The failing disk indeed belongs to kvm instead of qemu. Any explanation for that?
Please provide {super,}vdsm.log and /var/log/messages. It seems that the udev rule did not kick in for this disk in time.
How can I send that privately? /var/log/messages contains a lot of non-public information.
Even if you sent it to me in private, I would have to share it with others anyway in order to solve the issue. I have no trick other than sanitizing your logs with a series of `sed` lines. BTW, which versions of udev, systemd, and libvirt do you have? And what about the vdsm logs?
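For what it's worth, a minimal sanitization sketch; the hostname and IP patterns below are placeholders, not taken from this bug, and would need to be adapted to whatever is actually sensitive in the logs:

# Hypothetical example: mask internal hostnames and IP addresses before attaching logs.
sed -e 's/myhost[0-9]*\.example\.internal/HOST/g' \
    -e 's/10\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/IP/g' \
    /var/log/messages > messages.sanitized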
That information is not secret; I just don't want it published on a public-facing site where any random crawler can get it. The host is running an up-to-date, fully patched CentOS Linux release 7.3.1611. I will give you more details tomorrow.
Created attachment 1245867 [details] rpms versions
Created attachment 1245868 [details] output from dmesg
Created attachment 1245869 [details] libvirt logs for the VM failing to attach the disk
Created attachment 1245870 [details] journalctl -b output
Created attachment 1245871 [details] /var/log/messages
Created attachment 1245872 [details] vdsm.log
Information about the users on the host:

id vdsm
uid=36(vdsm) gid=36(kvm) groups=36(kvm),179(sanlock),107(qemu)

id qemu
uid=107(qemu) gid=107(qemu) groups=107(qemu),11(cdrom)
moving to storage for further investigation
Fabrice, is the problematic disk on a block domain or a file domain?
An export of the domain returns:

<StorageDomain href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671" id="7c5291d3-11e2-420f-99ad-47a376013671">
  <actions>
    <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/isattached" rel="isattached"/>
    <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/updateovfstore" rel="updateovfstore"/>
    <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/refreshluns" rel="refreshluns"/>
  </actions>
  <name>XXX</name>
  <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/permissions" rel="permissions"/>
  <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/templates" rel="templates"/>
  <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/vms" rel="vms"/>
  <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/disks" rel="disks"/>
  <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/storageconnections" rel="storageconnections"/>
  <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/disksnapshots" rel="disksnapshots"/>
  <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/diskprofiles" rel="diskprofiles"/>
  <data_centers>
    <data_center id="17434f4e-8d1a-4a88-ae39-d2ddd46b3b9b"/>
  </data_centers>
  <type>data</type>
  <external_status>
    <state>ok</state>
  </external_status>
  <master>true</master>
  <storage>
    <type>localfs</type>
    <path>/data/ovirt/data</path>
  </storage>
  <available>1989643599872</available>
  <used>56908316672</used>
  <committed>300647710720</committed>
  <storage_format>v3</storage_format>
  <wipe_after_delete>false</wipe_after_delete>
  <warning_low_space_indicator>10</warning_low_space_indicator>
  <critical_space_action_blocker>5</critical_space_action_blocker>
</StorageDomain>

Is that what you need?
Moving out all non-blockers/exceptions.
Hi Fabrice,

A few questions and requests:

1. Can you please provide the output of "ls -lZ /data/ovirt/data"?
2. The correct ownership of images and snapshots is vdsm:kvm. It seems that the problematic disk has the right ownership and all the rest do not. Any idea whether the ownership of the image or of the storage domain's directory has changed?
3. From the output of "id qemu" I can see that it does not belong to the "36(kvm)" group. Any idea whether that was changed? In any case, a workaround for this bug is to add qemu to the kvm group (see the sketch after this comment).
4. Can you please provide the vdsm and engine logs from the time that disk was created?
5. Is this bug still reproducible on your system?

Thanks!
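A minimal sketch of the workaround from item 3, assuming the stock user and group names (user "qemu", group "kvm"):

usermod -a -G kvm qemu     # add qemu to the kvm supplementary group
id qemu                    # verify: groups=... should now include 36(kvm)

Note that running processes only pick up new supplementary groups after they are restarted, so restarting the libvirtd/vdsmd services (or rebooting the host) may also be needed.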
1. ~$ ls -lZ /data/ovirt/data
drwxr-xr-x vdsm qemu ? 7c5291d3-11e2-420f-99ad-47a376013671
-rwxr-xr-x vdsm qemu ? __DIRECT_IO_TEST__

2. I don't think so, but it was a long time ago.

3. Idem.

4 and 5: I just added a new disk, and it failed:

-rw-rw---- 1 vdsm kvm 10G Jul  3 16:16 /data/ovirt/data/7c5291d3-11e2-420f-99ad-47a376013671/images/0243d40d-d1de-478f-93db-591d1955314c/b1f8aee3-c99f-4960-a32a-b942f8d9226b
-rw-r--r-- 1 vdsm kvm 323 Jul  3 16:16 /data/ovirt/data/7c5291d3-11e2-420f-99ad-47a376013671/images/0243d40d-d1de-478f-93db-591d1955314c/b1f8aee3-c99f-4960-a32a-b942f8d9226b.meta
Created attachment 1293904 [details] requested logs
I tried the workaround; indeed, it works.
Thanks, Fabrice! To me it sounds like this could be the root cause of this bug. We need to find out why qemu was not added to the kvm group.
I think it might be my fault. I am creating users with Puppet, and I thought I had checked that they matched exactly what oVirt created, so I probably made a mistake. All that is missing from oVirt is a check and a slightly better log message.
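A minimal sketch of the kind of check meant here, assuming the stock user and group names; it only warns when the qemu user is missing from the kvm group:

# Hypothetical pre-flight check: warn if qemu is not in the kvm group.
if ! id -nG qemu | grep -qw kvm; then
    echo "WARNING: user 'qemu' is not in group 'kvm'; attaching disks may fail with 'Permission denied'" >&2
fi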
OK. Anyway, I have just checked this with a fresh new VM running CentOS 7.3.1611, installed vdsm-4.18.21-1.el7.centos.x86_64, and got this result for 'id qemu':

uid=107(qemu) gid=107(qemu) groups=107(qemu),11(cdrom),36(kvm)

Therefore, on a clean system no bug should occur.