Bug 1417165 - Unable to attach a disk to a VM (disk owned by kvm and not qemu?)
Summary: Unable to attach a disk to a VM (disk owned by kvm and not qemu?)
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.18.21
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.1.4
Assignee: Idan Shaby
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-27 11:38 UTC by Fabrice Bacchella
Modified: 2017-07-04 12:56 UTC
8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-04 12:50:55 UTC
oVirt Team: Storage
rule-engine: ovirt-4.1+


Attachments
rpms versions (28.42 KB, text/plain)
2017-01-30 12:07 UTC, Fabrice Bacchella
no flags Details
output from dmesg (59.27 KB, text/plain)
2017-01-30 12:07 UTC, Fabrice Bacchella
no flags Details
libvirt logs for the VM failing to attach the disk (102.72 KB, text/plain)
2017-01-30 12:08 UTC, Fabrice Bacchella
no flags Details
journalctl -b output (216.89 KB, text/x-vhdl)
2017-01-30 12:09 UTC, Fabrice Bacchella
no flags Details
/var/log/messages (83.15 KB, text/plain)
2017-01-30 12:09 UTC, Fabrice Bacchella
no flags Details
vdsm.log (4.89 MB, text/plain)
2017-01-30 12:10 UTC, Fabrice Bacchella
no flags Details
requested logs (1.75 MB, application/x-gzip)
2017-07-03 14:26 UTC, Fabrice Bacchella
no flags Details

Description Fabrice Bacchella 2017-01-27 11:38:21 UTC
I upgraded a host to the latest version of vdsm, vdsm-4.18.21-1.el7.centos.x86_64, on CentOS Linux release 7.3.1611 (Core).

I then created a disk that I wanted to attach to a running VM, but it fails with this message in /var/log/libvirt/qemu/<VMNAME>.log:

Could not open '/rhev/data-center/17434f4e-8d1a-4a88-ae39-d2ddd46b3b9b/7c5291d3-11e2-420f-99ad-47a376013671/images/dd6ebebe-b43f-4b9b-981a-e390d89cb36c/21dc1593-6629-47d8-a7eb-0aca060b7e26': Permission denied

I had a look at the disk images and got a strange result:

-rw-rw---- 1 vdsm qemu 1.0M May 18  2016 7c5291d3-11e2-420f-99ad-47a376013671/dom_md/ids
-rw-rw---- 1 vdsm qemu  16M May 18  2016 7c5291d3-11e2-420f-99ad-47a376013671/dom_md/inbox
-rw-rw---- 1 vdsm qemu 2.0M May 18  2016 7c5291d3-11e2-420f-99ad-47a376013671/dom_md/leases
-rw-r--r-- 1 vdsm qemu  451 May 18  2016 7c5291d3-11e2-420f-99ad-47a376013671/dom_md/metadata
-rw-rw---- 1 vdsm qemu  16M May 18  2016 7c5291d3-11e2-420f-99ad-47a376013671/dom_md/outbox
-rw-rw---- 1 vdsm qemu  30K Jan 27 12:33 7c5291d3-11e2-420f-99ad-47a376013671/images/3a00232b-c1f9-4b9b-910e-caf8b0321609/4f6d5c63-6a36-4356-832e-f52427d9512e
-rw-rw---- 1 vdsm qemu  32G Jan 27 12:34 7c5291d3-11e2-420f-99ad-47a376013671/images/465df4e9-3c62-4501-889f-cbab65ed0e0d/7a9b9033-f5f8-4eaa-ac94-6cc0c4ff6120
-rw-rw---- 1 vdsm qemu  32G Jan 27 12:34 7c5291d3-11e2-420f-99ad-47a376013671/images/b0f4c517-e492-409f-934f-1561281a242b/a3d60d8a-f89b-41dd-b519-fb652301b1f5
-rw-rw---- 1 vdsm qemu  16G Jan  9 17:25 7c5291d3-11e2-420f-99ad-47a376013671/images/baf01c4e-ede9-4e4e-a265-172695d81a83/4cdd72a7-b347-4479-accd-ab08d61552f9
-rw-rw---- 1 vdsm kvm  200G Jan 27 12:26 7c5291d3-11e2-420f-99ad-47a376013671/images/dd6ebebe-b43f-4b9b-981a-e390d89cb36c/21dc1593-6629-47d8-a7eb-0aca060b7e26
-rw-rw---- 1 vdsm qemu  30K Jan 27 12:33 7c5291d3-11e2-420f-99ad-47a376013671/images/ed18c515-09c9-4a71-af0a-7f0934193a65/b5e53c81-2279-4f2b-b282-69db430d36d4

The new disk is the 200G one, owned by group kvm.

The failing disk indeed belongs to group kvm instead of qemu. Any explanation for that?
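The Permission denied error is consistent with plain Unix group semantics: with mode 0660, a process can open the image only if it runs as the owning user or belongs to the owning group. A minimal model of that check (the group list is the one `id qemu` reports later in this bug; `can_read` is an illustrative helper, not vdsm code):

```shell
# Model the 0660 group check: a file is readable only if the process's
# supplementary groups (the format printed by `id -nG`) contain the
# file's owning group.
can_read() {
    echo "$2" | grep -qw "$1"
}

qemu_groups="qemu cdrom"    # what `id qemu` reports on the affected host

can_read qemu "$qemu_groups" && echo "qemu-owned image: readable"
can_read kvm  "$qemu_groups" || echo "kvm-owned image: Permission denied"
```

So the qemu process, lacking kvm membership, gets EACCES on the one image whose group is kvm, while the qemu-owned images open fine.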

Comment 1 Dan Kenigsberg 2017-01-28 14:58:51 UTC
Please provide {super,}vdsm.log and /var/log/messages.

It seems the udev rule did not kick in for this disk in time.

Comment 2 Fabrice Bacchella 2017-01-29 10:31:48 UTC
How can I send that privately? /var/log/messages contains a lot of non-public information.

Comment 3 Dan Kenigsberg 2017-01-29 11:41:37 UTC
Even if you sent it to me in private, I would have to share it with others anyway in order to solve the issue. I have no trick other than sanitizing your log with many `sed` lines.

BTW, which versions of udev, systemd, and libvirt do you have?

And what about the vdsm logs?

Comment 4 Fabrice Bacchella 2017-01-29 11:59:34 UTC
That information is not secret. I just don't want it published on a public-facing site where any random crawler can get it.

It's running on an up-to-date fully patched CentOS Linux release 7.3.1611.

I will give you more details tomorrow.

Comment 5 Fabrice Bacchella 2017-01-30 12:07:13 UTC
Created attachment 1245867 [details]
rpms versions

Comment 6 Fabrice Bacchella 2017-01-30 12:07:53 UTC
Created attachment 1245868 [details]
output from dmesg

Comment 7 Fabrice Bacchella 2017-01-30 12:08:28 UTC
Created attachment 1245869 [details]
libvirt logs for the VM failing to attach the disk

Comment 8 Fabrice Bacchella 2017-01-30 12:09:00 UTC
Created attachment 1245870 [details]
journalctl -b output

Comment 9 Fabrice Bacchella 2017-01-30 12:09:35 UTC
Created attachment 1245871 [details]
/var/log/messages

Comment 10 Fabrice Bacchella 2017-01-30 12:10:43 UTC
Created attachment 1245872 [details]
vdsm.log

Comment 11 Fabrice Bacchella 2017-01-30 12:14:55 UTC
Information about the users on the host:

id vdsm
uid=36(vdsm) gid=36(kvm) groups=36(kvm),179(sanlock),107(qemu)

id qemu
uid=107(qemu) gid=107(qemu) groups=107(qemu),11(cdrom)

Comment 12 Tomas Jelinek 2017-02-01 11:35:42 UTC
Moving to storage for further investigation.

Comment 13 Tal Nisan 2017-02-01 13:44:08 UTC
Fabrice, is the problematic disk on a block domain or a file domain?

Comment 14 Fabrice Bacchella 2017-02-01 18:00:39 UTC
An export of the domain returns:

<StorageDomain href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671" id="7c5291d3-11e2-420f-99ad-47a376013671">
    <actions>
        <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/isattached" rel="isattached"/>
        <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/updateovfstore" rel="updateovfstore"/>
        <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/refreshluns" rel="refreshluns"/>
    </actions>
    <name>XXX</name>
    <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/permissions" rel="permissions"/>
    <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/templates" rel="templates"/>
    <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/vms" rel="vms"/>
    <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/disks" rel="disks"/>
    <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/storageconnections" rel="storageconnections"/>
    <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/disksnapshots" rel="disksnapshots"/>
    <link href="/ovirt-engine/api/storagedomains/7c5291d3-11e2-420f-99ad-47a376013671/diskprofiles" rel="diskprofiles"/>
    <data_centers>
        <data_center id="17434f4e-8d1a-4a88-ae39-d2ddd46b3b9b"/>
    </data_centers>
    <type>data</type>
    <external_status>
        <state>ok</state>
    </external_status>
    <master>true</master>
    <storage>
        <type>localfs</type>
        <path>/data/ovirt/data</path>
    </storage>
    <available>1989643599872</available>
    <used>56908316672</used>
    <committed>300647710720</committed>
    <storage_format>v3</storage_format>
    <wipe_after_delete>false</wipe_after_delete>
    <warning_low_space_indicator>10</warning_low_space_indicator>
    <critical_space_action_blocker>5</critical_space_action_blocker>
</StorageDomain>

Is that what you need?

Comment 15 Yaniv Lavi 2017-02-23 11:24:45 UTC
Moving out all non-blockers/exceptions.

Comment 18 Idan Shaby 2017-07-02 12:39:57 UTC
Hi Fabrice,

A few questions and requests:

1. Can you please provide the output of "ls -lZ /data/ovirt/data"?
2. The correct ownership of images and snapshots is vdsm:kvm.
It seems like the problematic disk has the correct ownership and all the rest don't. Any idea whether the ownership of the image or the storage domain's directory has changed?
3. From the output of "id qemu" I can see that it doesn't belong to the "36(kvm)" group. Any idea whether that was changed?
Anyway, a workaround for this bug is to add the qemu user to the kvm group.
4. Can you please provide the vdsm and engine logs from the time that disk was created?
5. Is this bug still reproducible in your system?

Thanks!
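The workaround from item 3 is a one-line admin command (run as root; a sketch only — a running qemu process picks the new group up only after it is restarted):

```shell
# Append qemu to the kvm supplementary group without touching its
# existing groups (-a is only valid together with -G).
usermod -a -G kvm qemu

# Verify; the printed list should now include kvm.
id -nG qemu
```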

Comment 19 Fabrice Bacchella 2017-07-03 14:25:38 UTC
1.
~$ ls -lZ /data/ovirt/data
drwxr-xr-x vdsm qemu ?                                7c5291d3-11e2-420f-99ad-47a376013671
-rwxr-xr-x vdsm qemu ?                                __DIRECT_IO_TEST__
2. I don't think so, but it was a long time ago.
3. Same as above.
4 and 5: I just added a new disk, and it failed:
-rw-rw---- 1 vdsm kvm   10G Jul  3 16:16 /data/ovirt/data/7c5291d3-11e2-420f-99ad-47a376013671/images/0243d40d-d1de-478f-93db-591d1955314c/b1f8aee3-c99f-4960-a32a-b942f8d9226b
-rw-r--r-- 1 vdsm kvm   323 Jul  3 16:16 /data/ovirt/data/7c5291d3-11e2-420f-99ad-47a376013671/images/0243d40d-d1de-478f-93db-591d1955314c/b1f8aee3-c99f-4960-a32a-b942f8d9226b.meta

Comment 20 Fabrice Bacchella 2017-07-03 14:26:20 UTC
Created attachment 1293904 [details]
requested logs

Comment 21 Fabrice Bacchella 2017-07-03 14:44:21 UTC
I tried the workaround; indeed, it works.

Comment 22 Idan Shaby 2017-07-04 07:22:44 UTC
Thanks Fabrice!
To me it sounds like this could be the root cause of this bug.
We need to find out why qemu was not added to the kvm group.

Comment 23 Fabrice Bacchella 2017-07-04 12:50:55 UTC
I think it might be my fault. I create users using Puppet. I thought I had checked that they matched exactly what oVirt created, so I probably made a mistake.
All that is missing from oVirt is a check and a slightly better log message.
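A drift check like the following would catch this class of mistake after a configuration-management run (a sketch; `check_groups`, `normalize`, and the expected list are illustrative, not oVirt or Puppet tooling):

```shell
# Compare two space-separated group lists order-insensitively, as a
# sanity check after e.g. Puppet manages system users.
normalize() { echo "$1" | tr ' ' '\n' | sort | paste -sd' ' -; }

check_groups() {
    if [ "$(normalize "$1")" = "$(normalize "$2")" ]; then
        echo "groups match"
    else
        echo "groups drifted: have '$1', want '$2'"
    fi
}

# The lists from this bug: the affected host vs. a clean install.
check_groups "qemu cdrom" "qemu cdrom kvm"
```

On a live host the first argument would come from `id -nG qemu`.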

Comment 24 Idan Shaby 2017-07-04 12:56:47 UTC
OK, anyway, I've just checked with a fresh VM running CentOS 7.3.1611, installed vdsm-4.18.21-1.el7.centos.x86_64, and got this result for 'id qemu':
uid=107(qemu) gid=107(qemu) groups=107(qemu),11(cdrom),36(kvm)

Therefore, on a clean system this bug should not occur.

