Bug 1320112 - Call to getImagesList on NFS on host without connected storage pool but with SD returns {'status': {'message': 'OK', 'code': 0}, 'imageslist': []} even when we have images
Summary: Call to getImagesList on NFS on host without connected storage pool but with ...
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.17.23.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-4.1.2
Assignee: Idan Shaby
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-03-22 10:34 UTC by Yaniv Lavi
Modified: 2017-03-08 23:00 UTC
CC List: 12 users

Fixed In Version:
Clone Of: 1319721
Environment:
Last Closed: 2017-03-06 13:39:23 UTC
oVirt Team: Storage
Embargoed:
amureini: ovirt-4.1?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments


Links
System        ID     Private  Priority  Status  Summary  Last Updated
oVirt gerrit  54982  0        None      None    None     2016-03-22 10:34:41 UTC
oVirt gerrit  55001  0        None      None    None     2016-03-22 10:34:41 UTC

Description Yaniv Lavi 2016-03-22 10:34:41 UTC
Opening to track storage review of the fix and proper solution.

+++ This bug was initially created as a clone of Bug #1319721 +++

Description of problem:
A call to getImagesList on a host without a connected storage pool, but with an SD, returns {'status': {'message': 'OK', 'code': 0}, 'imageslist': []} even when images exist on the storage domain.

Version-Release number of selected component (if applicable):
vdsm-4.17.23.1-0.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Run the following script ('SD_UUID' is a placeholder for the storage domain's UUID):

from vdsm import vdscli  # vdsm's bundled client; import path assumed for this vdsm version

sdUUID = 'SD_UUID'  # replace with the actual storage domain UUID
cli = vdscli.connect(timeout=60)
result = cli.getImagesList(sdUUID)
print(result)
result = cli.getConnectedStoragePoolsList()
print(result)

Actual results:
That prints:
 {'status': {'message': 'OK', 'code': 0}, 'imageslist': []}
 {'status': {'message': 'OK', 'code': 0}, 'poollist': []}

Expected results:
result = cli.getImagesList(sdUUID)
print(result)
{'status': {'message': 'list index out of range', 'code': 100}}

Additional info:
The images are available on the NFS share:

[root@alma07 images]# pwd
/rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_alukiano__HE__upgrade/3ac831d6-6124-4b42-a060-f89c64be09a1/images
[root@alma07 images]# ls -l
total 16
drwxr-xr-x. 2 vdsm kvm 4096 17 mar 15.36 4d1915e1-a9f7-4bca-b666-0997adec5ef4
drwxr-xr-x. 2 vdsm kvm 4096 18 mar 00.52 995171f0-1abb-488b-9b18-3e17aad0c3de
drwxr-xr-x. 2 vdsm kvm 4096 20 mar 15.40 9ecd4e5f-bb24-4fd6-8c20-c442425b59b6
drwxr-xr-x. 2 vdsm kvm 4096 18 mar 00.52 b6b637a4-37be-48e9-aacb-e3d4a6be29cc

but VDSM is not reporting them.
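
For reference, the discrepancy can be checked with a short script. This is only a sketch: it reuses the mount point and SD UUID shown above and the same vdscli binding as the reproduction script, so adjust the values for any other environment.

import os
from vdsm import vdscli  # same client binding used in the reproduction script

# Values taken from this report; adjust for the environment under test.
MNT_POINT = '/rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_alukiano__HE__upgrade'
SD_UUID = '3ac831d6-6124-4b42-a060-f89c64be09a1'

# Image UUID directories that actually exist on the NFS share.
on_disk = sorted(os.listdir(os.path.join(MNT_POINT, SD_UUID, 'images')))

# Image UUIDs that VDSM reports for the same storage domain.
cli = vdscli.connect(timeout=60)
reported = sorted(cli.getImagesList(SD_UUID).get('imageslist', []))

print('on disk: ', on_disk)
print('reported:', reported)
print('missing from VDSM:', sorted(set(on_disk) - set(reported)))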

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-03-21 14:12:27 IST ---

This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

--- Additional comment from Simone Tiraboschi on 2016-03-21 15:03 IST ---



--- Additional comment from Allon Mureinik on 2016-03-21 15:05:34 IST ---

I don't understand. Isn't this just a duplicate of bug 1274622 ?

--- Additional comment from Simone Tiraboschi on 2016-03-21 15:11:12 IST ---

We can add another simple workaround on the hosted-engine side (as in https://gerrit.ovirt.org/#/c/54982/), but, if possible, I'd prefer to get this properly fixed on the VDSM side to prevent further surprises from other behavior changes.

--- Additional comment from Simone Tiraboschi on 2016-03-21 15:17:04 IST ---

(In reply to Allon Mureinik from comment #3)
> I don't understand. Isn't this just a duplicate of bug 1274622 ?

No, unfortunately it's not: in bug 1274622 that call was failing with:
 {'status': {'message': 'list index out of range', 'code': 100}}
and we implemented a workaround for that by globbing directly under /rhev/data-center/mnt/{mnt_point}/{sduuid}/images to find the images on the NFS share.

Now it's even worse, since VDSM returns
 {'status': {'message': 'OK', 'code': 0}, 'imageslist': []}
(which is wrong, since the images are there!), so our workaround doesn't trigger anymore.
We could add yet another workaround on top of that, but if possible I'd prefer to get it properly fixed.
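
To illustrate the kind of hosted-engine-side workaround described here, a minimal sketch follows. The helper names are hypothetical and this is not the code from the linked gerrit change; it only shows the glob-and-fall-back idea.

import glob
import os

def list_images_from_share(sd_uuid, mnt_root='/rhev/data-center/mnt'):
    # Fallback: find image UUID directories directly on the NFS share by
    # globbing /rhev/data-center/mnt/{mnt_point}/{sd_uuid}/images/*.
    pattern = os.path.join(mnt_root, '*', sd_uuid, 'images', '*')
    return [os.path.basename(p) for p in glob.glob(pattern) if os.path.isdir(p)]

def get_images(cli, sd_uuid):
    # Ask VDSM first; fall back to the share when the call fails or returns
    # an empty list even though images may exist (the case in this bug).
    result = cli.getImagesList(sd_uuid)
    images = result.get('imageslist', [])
    if result['status']['code'] != 0 or not images:
        images = list_images_from_share(sd_uuid)
    return images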

--- Additional comment from Sandro Bonazzola on 2016-03-21 15:23:08 IST ---

Workaround has been posted here: https://gerrit.ovirt.org/#/c/54982/1

--- Additional comment from Sandro Bonazzola on 2016-03-21 15:31:24 IST ---

Moving this bug to ovirt-hosted-engine-ha since it has been decided to use the workaround instead of a proper fix in vdsm. Allon, please consider scheduling a proper fix for 3.6.5.

--- Additional comment from Allon Mureinik on 2016-03-21 17:25:59 IST ---

(In reply to Sandro Bonazzola from comment #7)
> Moving this bug to ovirt-hosted-engine-ha since it has been decided to use
> the workaround instead of a proper fix in vdsm. Allon, please consider
> scheduling a proper fix for 3.6.5.
Please clone it to track that initiative, although, atm, I can't commit to such a fix.

Comment 1 Red Hat Bugzilla Rules Engine 2016-03-22 10:35:48 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 2 Sandro Bonazzola 2016-05-02 10:08:55 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.

Comment 3 Yaniv Lavi 2016-05-23 13:24:24 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 4 Yaniv Lavi 2016-05-23 13:25:49 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 5 Tal Nisan 2016-12-21 13:46:33 UTC
Raz, I see that the patches are merged, can you try and reproduce and let us know if it's still an open issue?

Comment 6 Lilach Zitnitski 2016-12-22 12:32:59 UTC
(In reply to Tal Nisan from comment #5)
> Raz, I see that the patches are merged, can you try and reproduce and let us
> know if it's still an open issue?


cli.getConnectedStoragePoolsList() returns:
{'status': {'message': 'OK', 'code': 0}, 'poollist': []}
cli.getStorageDomainsList() returns:
{'status': {'message': 'OK', 'code': 0}, 'domlist': ['3f433c06-5057-4a13-afc7-8d70953e34b5']}
and the cli.getImagesList('3f433c06-5057-4a13-afc7-8d70953e34b5') returns:
{'status': {'message': 'OK', 'code': 0}, 'imageslist': ['e8b3eb91-3892-4e74-8962-b73405295fbe', '25c6e941-12ed-4ea7-a334-b71fed8ab0e7', 'a535c196-eeda-4f3f-8593-1380bd8a7cb1', '374c37d9-2387-48c6-86ae-4ef3d64c6ee8']}

I'm not sure if these are the expected results, because I couldn't really understand why the expected result is 'list index out of range'.

vdsm version:
vdsm-4.18.999-1216.git34aa313.el7.centos.x86_64

Comment 7 Raz Tamir 2016-12-27 07:01:39 UTC
Tal,
Please reply to comment #6

Comment 8 Yaniv Lavi 2016-12-28 09:01:25 UTC
The patches attached are workarounds for the storage function issue. We want a solution that HE will not need to work around anymore.

Comment 9 Idan Shaby 2017-02-02 17:23:21 UTC
I don't understand the flow here.

"Call to getImagesList on host without connected storage pool, but with SD" - what is a host without a storage pool, but with a storage domain?
Is it a host in maintenance status with a storage domain whose status is up?
Is it an activated host with a detached storage domain?

Can you please explain in more detail what the flow is here, and who calls getImages and when?

Thanks!

Comment 10 Artyom 2017-02-05 08:45:34 UTC
I believe we are talking about the scenario where we did the HE deployment but still did not add the master storage domain to the engine.
I am not sure if the bug is still relevant for 4.1, so maybe Simone can clarify the situation.

Comment 11 Simone Tiraboschi 2017-02-08 11:40:48 UTC
(In reply to Artyom from comment #10)
> I believe we are talking about the scenario where we did the HE deployment
> but still did not add the master storage domain to the engine.
> I am not sure if the bug is still relevant for 4.1, so maybe Simone can
> clarify the situation.

Yes, correct, it's still worth checking.

Comment 12 Yaniv Lavi 2017-02-23 11:25:47 UTC
Moving out all non-blockers/exceptions.

Comment 13 Nir Soffer 2017-03-05 10:23:05 UTC
We do not support anything when you are not connected to a storage pool, except
starting/stopping monitoring on external domains.

This sounds like an RFE for a future version, and does not fit a bug fix for 4.1.

For 4.1 we should now accept only critical bug fixes in existing features, not
add features we do not have. Anything else risks the stability of 4.1.
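
As a rough illustration of the only flow comment 13 describes as supported without a connected pool, here is a minimal sketch using the same vdscli binding as the reproduction script. The startMonitoringDomain/stopMonitoringDomain verb names and arguments are an assumption based on this comment, not taken from this bug's patches.

from vdsm import vdscli  # same client binding as in the reproduction script

SD_UUID = 'SD_UUID'  # placeholder, as in the reproduction script
HOST_ID = 1          # host id used for monitoring; value is illustrative

cli = vdscli.connect(timeout=60)
# Start monitoring an external (e.g. hosted-engine) domain with no pool connected.
print(cli.startMonitoringDomain(SD_UUID, HOST_ID))
# ... domain monitoring runs; other storage verbs still require a connected pool ...
print(cli.stopMonitoringDomain(SD_UUID))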

Comment 14 Idan Shaby 2017-03-06 13:39:23 UTC
I installed HE without adding a storage domain.
I ran "vdsm-client StorageDomain getImages storagedomainID=<uuid>" with the id of the NFS domain that I created during the installation, and I got 4 images:

[root@localhost data-center]# vdsm-client StorageDomain getImages storagedomainID=74683202-0d17-4992-86a0-4f865d59f4cd
[
    "9dd2b426-87f0-44b7-b5f8-48f3ed3c266f",
    "fa67d28c-44d8-4e3d-b435-26935ead0d61",
    "80bfe663-7b94-49ee-8a63-42a25fb78272",
    "ddebc6b7-57c8-4978-aa94-4a48b6c1faad"
]

These are config and regular images of hosted engine.

Closing as works for me, as it seems to be the right behavior.

