Bug 1477600 - [REST] diskattachment doesn't always show the logical_name value
Status: CLOSED WORKSFORME
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.1.4.2
Hardware/OS: Unspecified / Unspecified
Priority: high, Severity: high
Target Milestone: ovirt-4.1.8
Assigned To: Arik
Reporter: Lilach Zitnitski
Keywords: Automation, Regression
Duplicates: 1465488
Depends On:
Blocks: 1518209
 
Reported: 2017-08-02 08:44 EDT by Lilach Zitnitski
Modified: 2017-11-28 07:21 EST

Cloned to: 1518209
Last Closed: 2017-11-21 09:48:56 EST
Type: Bug
oVirt Team: Virt
rule-engine: ovirt-4.1+
rule-engine: blocker+


Attachments:
logs (406.68 KB, application/zip), 2017-08-02 08:44 EDT, Lilach Zitnitski
vdsm logs (135.45 KB, application/x-gzip), 2017-08-30 03:49 EDT, Raz Tamir
ovirt-guest-agent log (22.92 KB, text/plain), 2017-08-30 08:15 EDT, Raz Tamir

Description Lilach Zitnitski 2017-08-02 08:44:17 EDT
Description of problem:
/api/vms/.../diskattachments should show, for each disk, its logical name as it appears in the VM, but sometimes this value is missing from the disk attachments.

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.4.2-0.1.el7.noarch
vdsm-4.19.23-1.el7ev.x86_64

How reproducible:
~50%

Steps to Reproduce:
1. create vm with disks
2. start the vm
3. go to /api/vms/vm_id/diskattachments
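
For reference, a minimal way to query this endpoint from a shell (ENGINE_FQDN, the admin@internal password, and VM_ID are placeholders; -k skips certificate checks on a test setup):

# curl -k -u admin@internal:PASSWORD -H 'Accept: application/xml' \
#      https://ENGINE_FQDN/ovirt-engine/api/vms/VM_ID/diskattachments

Each <disk_attachment> in the response should contain a <logical_name> element once the guest agent has reported the mapping.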

Actual results:
sometimes the logical name of the disk is missing 

Expected results:
the logical name should be under diskattachments 

Additional info:
I saw it mainly in the automation tests; this is the response to a GET on diskattachments:

<disk_attachments>
    <disk_attachment href="/ovirt-engine/api/vms/df490cd2-e19c-4c20-a5a2-5a99968285e9/diskattachments/529f6ed2-adda-48e4-aac0-0c92259e1bf1" id="529f6ed2-adda-48e4-aac0-0c92259e1bf1">
        <active>true</active>
        <bootable>true</bootable>
        <interface>virtio</interface>
        <pass_discard>false</pass_discard>
        <read_only>false</read_only>
        <uses_scsi_reservation>false</uses_scsi_reservation>
        <disk href="/ovirt-engine/api/disks/529f6ed2-adda-48e4-aac0-0c92259e1bf1" id="529f6ed2-adda-48e4-aac0-0c92259e1bf1"/>
        <vm href="/ovirt-engine/api/vms/df490cd2-e19c-4c20-a5a2-5a99968285e9" id="df490cd2-e19c-4c20-a5a2-5a99968285e9"/>
    </disk_attachment>

Also, the VM was started at 11:52 and at 11:59 the logical_name still hadn't shown up.
Comment 1 Lilach Zitnitski 2017-08-02 08:44 EDT
Created attachment 1308205 [details]
logs
Comment 2 Allon Mureinik 2017-08-02 11:49:07 EDT
Lilach, can you please explain the difference between this bug and bug 1465488? Isn't bug 1465488 just a specific case (direct lun) of this general case (any disk)?
Comment 3 Lilach Zitnitski 2017-08-03 03:05:46 EDT
(In reply to Allon Mureinik from comment #2)
> Lilach, can you please explain the difference between this bug and bug
> 1465488? Isn't bug 1465488 just a specific case (direct lun) of this general
> case (any disk)?

Yes, I guess. I thought it should be a separate bug because it doesn't always reproduce.
I'll add this bug's description as a comment on the first bug.
Comment 4 Raz Tamir 2017-08-06 15:19:37 EDT
Raising severity as this fails random cases in our automation
Comment 5 Tal Nisan 2017-08-29 11:43:47 EDT
Tried to reproduce; it does indeed take time until the logical name data is populated in the engine, but this info comes from the guest agent. From the storage side there is not much we can do about it: the data comes in from the guest agent, is stored in the DB, and is displayed through REST.
Moving to the guest agent component to investigate their aspect of the bug.
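
For reference, this guest agent -> VDSM -> engine DB -> REST chain can be checked hop by hop with the commands used later in this bug (VM_ID / VM_GUID are placeholders, and psql access to the engine database is assumed):

# inside the guest: with debug logging enabled, the agent should log a "disks-usage" message containing a "mapping" entry
# on the host running the VM:
# vdsm-client Host getVMFullList        (look for "guestDiskMapping")
# on the engine database:
# select device_id, logical_name from vm_device where vm_id='VM_GUID';
# over REST:
# GET /ovirt-engine/api/vms/VM_ID/diskattachments   (look for <logical_name>)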
Comment 6 Raz Tamir 2017-08-30 03:49 EDT
Created attachment 1319890 [details]
vdsm logs
Comment 7 Yaniv Kaul 2017-08-30 04:01:18 EDT
Can you attach guest agent logs, to ensure it sees the disks?
If we add a minute to the timeout, does it eliminate the issue? 5 minutes? (Only if it still fails, of course.)
Comment 8 Tomáš Golembiovský 2017-08-30 07:04:10 EDT
(In reply to Yaniv Kaul from comment #7)
> Can you attach guest agent logs, to ensure it sees the disks?

You'll have to enable debug output to actually see anything useful.
I don't see any communication between VDSM and GA in vdsm.log, so the GA log would definitely help.

> If we add a minute to the timeout, does it eliminate the issue? 5 minutes?
> (only if you fail of course)?

You can also try changing the refresh timeout in GA configuration to see if it
helps. In /etc/ovirt-guest-agent.conf change 'report_disk_usage' in section
'[general]'. The default is 300 seconds (5 minutes).
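
For example (a sketch inside the guest; the section and key are the ones above, and the stock EL7 service name is assumed):

# /etc/ovirt-guest-agent.conf
# [general]
# report_disk_usage = 60      (default is 300 seconds)
# then restart the agent:
# systemctl restart ovirt-guest-agent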
Comment 9 Raz Tamir 2017-08-30 08:15:07 EDT
ovirt guest agent logs attached.

In the logs I can see that the VM sees the device:
Dummy-1::DEBUG::2017-08-30 15:09:14,741::VirtIoChannel::209::root::Written {"__name__": "disks-usage", "disks": [{"path": "/", "total": 8914993152, "used": 1280544768, "fs": "xfs"}, {"path": "/boot", "total": 705667072, "used": 150638592, "fs": "ext3"}], "mapping": {"b54eb41c-496b-4270-a": {"name": "/dev/vda"}, "QEMU_DVD-ROM_QM00003": {"name": "/dev/sr0"}}}

Also, from inside the guest:
#lsblk
NAME                MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0                  11:0    1 1024M  0 rom  
vda                 252:0    0   10G  0 disk 
├─vda1              252:1    0  700M  0 part /boot
├─vda2              252:2    0    1G  0 part [SWAP]
└─vda3              252:3    0  8.3G  0 part 
  └─VolGroup01-root 253:0    0  8.3G  0 lvm  /

The output of GET command to /api/vms/VM_ID/diskattachments:
<disk_attachments>
<disk_attachment href="/ovirt-engine/api/vms/df70f7ae-3071-4bc3-956d-ecfee93385ad/diskattachments/b54eb41c-496b-4270-a834-26476c884411" id="b54eb41c-496b-4270-a834-26476c884411">
<active>true</active>
<bootable>true</bootable>
<interface>virtio</interface>
<pass_discard>false</pass_discard>
<read_only>false</read_only>
<uses_scsi_reservation>false</uses_scsi_reservation>
<disk href="/ovirt-engine/api/disks/b54eb41c-496b-4270-a834-26476c884411" id="b54eb41c-496b-4270-a834-26476c884411"/>
<vm href="/ovirt-engine/api/vms/df70f7ae-3071-4bc3-956d-ecfee93385ad" id="df70f7ae-3071-4bc3-956d-ecfee93385ad"/>
</disk_attachment>
</disk_attachments>

The VM has been running for more than 30 minutes now.
Comment 10 Raz Tamir 2017-08-30 08:15 EDT
Created attachment 1320054 [details]
ovirt-guest-agent log
Comment 11 Raz Tamir 2017-08-30 09:49:45 EDT
As requested by Tomáš, I executed 'vdsm-client Host getVMFullList' on the host that is running the VM:
...
        "vmId": "df70f7ae-3071-4bc3-956d-ecfee93385ad", 
        "guestDiskMapping": {
            "b54eb41c-496b-4270-a": {
                "name": "/dev/vda"
            }, 
            "QEMU_DVD-ROM_QM00003": {
                "name": "/dev/sr0"
            }
...
Comment 12 Tomáš Golembiovský 2017-08-30 10:04:19 EDT
Apparently GA works properly and the disk mapping is returned to VDSM. This looks to me more like an engine bug.
Comment 13 Tal Nisan 2017-09-04 04:34:35 EDT
(In reply to Raz Tamir from comment #11)
> As requested by Tomáš I executed 'vdsm-client Host getVMFullList' on the
> host that running the VM:
> ...
>         "vmId": "df70f7ae-3071-4bc3-956d-ecfee93385ad", 
>         "guestDiskMapping": {
>             "b54eb41c-496b-4270-a": {
>                 "name": "/dev/vda"
>             }, 
>             "QEMU_DVD-ROM_QM00003": {
>                 "name": "/dev/sr0"
>             }
> ...

Raz, after getting the correct result, can you check whether 'select device_id, logical_name from vm_device where vm_id=VM_GUID;' shows the correct logical names?
Comment 14 Raz Tamir 2017-09-04 07:32:44 EDT
No,

The logical_name column is empty:
engine=# select device_id, logical_name from vm_device where vm_id='df70f7ae-3071-4bc3-956d-ecfee93385ad';
              device_id               | logical_name 
--------------------------------------+--------------
 0e471884-09a8-4d1c-9997-12b1d220df16 | 
 94757b8e-8050-423a-814b-ada369012a30 | 
 c06b62b0-dbd3-4e3d-a2b0-bdcaee9cfce4 | 
 224ca422-181d-4b47-a601-8d6587eb9ce6 | 
 24c1f41e-00ad-4806-8900-566f75f9b9eb | 
 329ad607-f890-4886-9e88-c254444d58a6 | 
 52b17e05-f1bd-46e3-b716-15a6a07196b9 | 
 73fc3277-d973-4657-ae45-72f4824667db | 
 8a315737-74a1-43c9-9ae0-a25436d20f9f | 
 a62a7bf9-7c03-45a2-9790-058836dd4ce3 | 
 a8e5ec7c-a1f5-4ad4-a64b-85b64fef1575 | 
 b54eb41c-496b-4270-a834-26476c884411 | 
 d14045a4-2dfa-475f-99d9-1a42683c03b3 | 
 edfbde30-4f7a-431b-8248-d7344f0d04ee | 
 0059b1cf-3ef8-49b0-9387-c8d7bf93164d | 
(15 rows)
Comment 15 Tal Nisan 2017-09-04 08:35:22 EDT
Seems like VDSM doesn't report it to the engine then. Tomáš, any idea why?
Comment 16 Michal Skrivanek 2017-09-04 08:38:22 EDT
No, it is reported by VDSM, as checked in comment #11; it seems it's not properly processed on the engine side.
Comment 17 Tal Nisan 2017-09-04 08:52:05 EDT
VDSM knows about it, yes. The question is whether the stats are passed correctly to the engine or the engine doesn't process them right; either way, my question is why.
Comment 18 Tomáš Golembiovský 2017-09-12 15:41:05 EDT
*** Bug 1465488 has been marked as a duplicate of this bug. ***
Comment 19 Michal Skrivanek 2017-09-12 17:01:57 EDT
I do not understand: is the problem a delay in reporting, or that some disk logical names for some VMs are never populated, despite being reported by VDSM at that time?
The matching may be wrong, but the vdsm logs are missing the creation XML.

In either case, please add timelines, especially when you try to compare data from different sources.
Comment 20 Raz Tamir 2017-09-13 08:30:36 EDT
(In reply to Michal Skrivanek from comment #19)
> I do not understand, is the problem about a delay in reporting or that some
> disk logical names for some VMs are never populated, despite being seen
> reported by vdsm at that time?
The disk logical name is never presented in the REST API
> The matching may be wrong, but the vdsm logs are missing creation xml.
> 
> In either case, please add timelines, especially when you try to compare
> data from different sources

The logical device name didn't show up even after 48 hours, so I'm not sure what timeline you are expecting.
Comment 21 Michal Skrivanek 2017-09-14 06:02:39 EDT
(In reply to Raz Tamir from comment #20)
> (In reply to Michal Skrivanek from comment #19)
> > I do not understand, is the problem about a delay in reporting or that some
> > disk logical names for some VMs are never populated, despite being seen
> > reported by vdsm at that time?
> The disk logical name is never presented in REST API
> > The matching may be wrong, but the vdsm logs are missing creation xml.
> > 
> > In either case, please add timelines, especially when you try to compare
> > data from different sources
> 
> The logical device name didn't show even after 48 hours

Good. Yeah, minutes would be the worst case; anything more than that qualifies as "never" :) So the problem is that it never appears. Thanks.

> so I'm not sure what timeline are you expecting?
Here I expected a more concrete timeline of the actions you describe, but since the above answer is "never" it doesn't matter much anymore.
Then we are only missing the vdsm.log containing the creation of that VM, to verify the device id.
Comment 22 Arik 2017-09-14 08:05:50 EDT
I suspect that the engine didn't poll the logical name from VDSM because the computed hash of the devices didn't change. Could you please reproduce this state and then add a NIC to the running VM and see if the logical name is updated?
Comment 23 Raz Tamir 2017-09-17 04:17:54 EDT
Arik,

I hotplugged a disk and a NIC and nothing changed - I still cannot see the disk logical name in the REST API, but the guest sees the new devices.
Comment 24 Arik 2017-09-17 08:54:18 EDT
(In reply to Raz Tamir from comment #23)
And is it still missing in the database (using the query used in comment 14)?
Comment 25 Raz Tamir 2017-09-17 12:00:20 EDT
Yes, it is still missing from the DB:

engine=# select device_id, logical_name from vm_device where vm_id='ddd3a804-e3dc-4710-9769-8db4b5e18157';
              device_id               | logical_name 
--------------------------------------+--------------
 7387ead8-dba3-4e65-a7f6-3fc79ff860f2 | 
 7858e20a-999c-4b92-ba37-6a0e0719b3f7 | 
 683c0e36-c6cb-48bc-9262-bb979f6ba0b9 | 
 751aeb96-2810-490e-9a69-e3b8332aebb8 | 
 7a132485-c67f-4990-bafa-56e7f205fd01 | 
 7c7ca816-ca2a-4a61-83fe-336c4508c526 | 
 9a7cf75c-6acf-455b-b978-1ee23ece5c05 | 
 abff4c60-8464-4250-8545-029dc9c9035d | 
 c01324c8-72f4-4097-b2ae-029ca0d38345 | 
 c1ba522b-4923-49d1-b0c3-0436e0f48214 | 
 e7b790f3-a7c6-4945-abaa-8931be49a05e | 
 ed714e68-8d6f-47c8-bf05-2c13bc5338ba | 
 f8dfdf65-289a-4cd4-9ee9-d3f868844001 | 
 06b04bd5-b3fa-4b95-8f71-9c5168e08c4d | 
 3ed7d3e5-579a-4299-a834-a9409064b72b | 
 47151e0c-8652-4873-a3eb-37de3eb15756 | 
 50f41895-e719-454d-84c1-beec0581f1f8 | 
(17 rows)
Comment 26 Arik 2017-11-16 09:17:16 EST
Works for me with the latest 4.1 branch (now 4.1.8) and on master (4.2).
Please try to reproduce with 4.1.8.
Comment 27 Raz Tamir 2017-11-16 09:37:03 EST
Just tested on latest 4.2 and it doesn't work
Comment 28 Arik 2017-11-16 09:58:21 EST
(In reply to Raz Tamir from comment #27)
> Just tested on latest 4.2 and it doesn't work

Did you try with cluster 4.1? We have a known issue with the guest agent on cluster compatibility version 4.2
Comment 29 Raz Tamir 2017-11-19 06:34:04 EST
Tested also on ovirt-engine-4.1.8-0.1.el7.noarch and the issue still exists.
Comment 30 Arik 2017-11-19 07:39:30 EST
It still works properly on my setup with the latest 4.1 branch:

engine41=> select device, type, device_id, logical_name from vm_device where vm_id in (select vm_guid from vm_static where vm_name='fedora');
    device     |    type    |              device_id               | logical_name 
---------------+------------+--------------------------------------+--------------
 memballoon    | balloon    | 82fb62e9-7cf2-4fc8-8b2f-233ac02dd66d | 
 cdrom         | disk       | b64336a3-c918-4f07-b960-57b78040f862 | 
 usb           | controller | ddbd15f4-3eba-417f-b0fa-ffc783e6ffbd | 
 virtio-scsi   | controller | e7b9bc07-bb14-4ea5-8a19-c114792a037e | 
 virtio        | rng        | ea86f5c0-4395-4578-be10-e4d77cd39843 | 
 qxl           | video      | 12347e47-945d-4fbf-a9d1-f32d9f7734f3 | 
 virtio-serial | controller | 21b60702-0b67-4723-b4ca-9e37798154ed | 
 disk          | disk       | 28d848d6-e488-4474-b033-0263167f3344 | /dev/vda
 bridge        | interface  | 60c7c642-3754-41bb-9775-e1e50dd51779 | 
 ide           | controller | 0aff370a-8223-4555-97d7-3c5db6f5825f | 
 spice         | graphics   | 65258169-cd36-43e1-9365-2c4784c39110 | 
 unix          | channel    | 404dfeda-8753-473d-be72-71d3f90514f4 | 
 unix          | channel    | 51e17a92-6adf-4408-942f-a2223e5d3eb7 | 
 spicevmc      | channel    | d26c1913-2430-4100-beb5-997c3f34541e |

I suspect the issue QE is facing is somehow related to timeouts or to changes in the devices that are not properly identified. However, it is difficult to investigate with the provided logs.

We need access to an environment where this issue reproduces, or hints on how to reproduce it on our environments; otherwise there is not much we can do.
Comment 31 Arik 2017-11-19 07:51:18 EST
And the output of disk-attachments in REST-API on my 4.1 environment:

<disk_attachments>
<disk_attachment href= "/ovirt-engine/api/vms/bde20767-d65c-4ab2-a7aa-250e83fc0100/diskattachments/28d848d6-e488-4474-b033-0263167f3344" id="28d848d6-e488-4474-b033-0263167f3344">
<active>true</active>
<bootable>true</bootable>
<interface>virtio</interface>
<logical_name>/dev/vda</logical_name>
<pass_discard>false</pass_discard>
<read_only>false</read_only>
<uses_scsi_reservation>false</uses_scsi_reservation>
<disk href= "/ovirt-engine/api/disks/28d848d6-e488-4474-b033-0263167f3344" id="28d848d6-e488-4474-b033-0263167f3344"/>
<vm href= "/ovirt-engine/api/vms/bde20767-d65c-4ab2-a7aa-250e83fc0100" id="bde20767-d65c-4ab2-a7aa-250e83fc0100"/>
</disk_attachment>
</disk_attachments>
Comment 32 Raz Tamir 2017-11-19 07:53:24 EST
What version of ovirt-guest-agent are you using?
Comment 33 Arik 2017-11-19 08:01:16 EST
(In reply to Raz Tamir from comment #32)
> What version of ovirt-guest-agent are you using?

Version     : 1.0.13
Release     : 2.fc24

And VDSM reports the following mappings:
"guestDiskMapping": {
            "QEMU_DVD-ROM_QM00003": {
                "name": "/dev/sr0"
            }, 
            "28d848d6-e488-4474-b": {
                "name": "/dev/vda"
            }
        }, 

This is similar to the mappings you mentioned in comment 11, so I don't think the problem is with the guest agent or with VDSM in this case.
Comment 34 Lilach Zitnitski 2017-11-19 08:53:48 EST
Arik, I ran one of our automation tests now, and the results are:

engine=# select device, logical_name from vm_device where vm_id='fc36af68-bef4-441b-9b8c-1cf2640f55fb';
    device     | logical_name
---------------+--------------
 spice         |
 bridge        |
 virtio-scsi   |
 disk          |
 ich6          |
 qxl           |
 virtio-serial |
 usb           |
 memballoon    |
 virtio        |
 cdrom         |
 spicevmc      |
 ide           |
 unix          |
 unix          |

<disk_attachments>
<disk_attachment href= "/ovirt-engine/api/vms/fc36af68-bef4-441b-9b8c-1cf2640f55fb/diskattachments/66427294-7482-4bd3-9b9f-2aa46da0193c" id="66427294-7482-4bd3-9b9f-2aa46da0193c">
<active>true</active>
<bootable>true</bootable>
<interface>virtio</interface>
<pass_discard>false</pass_discard>
<read_only>false</read_only>
<uses_scsi_reservation>false</uses_scsi_reservation>
<disk href= "/ovirt-engine/api/disks/66427294-7482-4bd3-9b9f-2aa46da0193c" id="66427294-7482-4bd3-9b9f-2aa46da0193c"/>
<vm href= "/ovirt-engine/api/vms/fc36af68-bef4-441b-9b8c-1cf2640f55fb" id="fc36af68-bef4-441b-9b8c-1cf2640f55fb"/>
</disk_attachment>
</disk_attachments>

No logical name is shown.
My engine is 4.2 (ovirt-engine-4.2.0-0.0.master.20171116212005.git61ffb5f.el7.centos.noarch) 

If you need it, let me know and I'll give you access to my environment, even while the test is running, so you can see it live.
Comment 35 Arik 2017-11-19 09:00:41 EST
(In reply to Lilach Zitnitski from comment #34)
Lilach, not receiving information from the guest agent in engine 4.2 is something we are aware of and are working on fixing. You don't get IPs either, right?

I would be more interested in looking into engine 4.1.8, because there it is supposed to work.
Comment 36 Arik 2017-11-19 09:01:10 EST
(In reply to Arik from comment #35)
> (In reply to Lilach Zitnitski from comment #34)
> Lilach, not receiving information from the guest-agent in engine 4.2 is
> something we aware of and working on fixing. You don't get IPs as well,
> right?
> 
> I would be more interested in looking into engine 4.1.8, because there it is
> supposed to work.

Unless the cluster version is 4.1...
Comment 37 Arik 2017-11-19 09:26:45 EST
Note that the fix for getting guest-agent's data in 4.2 is not available in VDSM 4.20.7-34.
Comment 38 Lilach Zitnitski 2017-11-19 09:32:42 EST
(In reply to Arik from comment #35)
> (In reply to Lilach Zitnitski from comment #34)
> Lilach, not receiving information from the guest-agent in engine 4.2 is
> something we aware of and working on fixing. You don't get IPs as well,
> right?
> 

The IP bug is already fixed in this version. 

> I would be more interested in looking into engine 4.1.8, because there it is
> supposed to work.
Comment 39 Arik 2017-11-21 09:48:56 EST
I was unable to reproduce it on my environment, and on Lilach's environment we saw that the disk attachments of LUN disks were missing while others were reported OK, which seems like a different issue (unrelated to virt monitoring) and is also not necessarily a regression. Without being able to reproduce it, there's not much we can do.
