Bug 1274622 - getImagesList fails if called on a file based storageDomain which is not connected to any storage pool
Status: CLOSED CURRENTRELEASE
Product: vdsm
Classification: oVirt
Component: Bindings-API
Version: 4.17.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.0.0-alpha
Target Release: 4.17.999
Assigned To: Fred Rolland
QA Contact: Elad
Duplicates: 1331503
Depends On: 1325657
Blocks: Gluster-HC-2
 
Reported: 2015-10-23 03:13 EDT by Simone Tiraboschi
Modified: 2016-08-01 08:28 EDT
CC List: 14 users

See Also:
Fixed In Version: ovirt 4.0.0 alpha1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-01 08:28:34 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-4.0.0+
rule-engine: planning_ack+
rule-engine: devel_ack+
acanan: testing_ack+


Attachments
vdsm.log storagedomaindoesnotexist (93.78 KB, application/x-xz)
2016-07-12 09:05 EDT, Carlos Mestre González


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 47676 master ABANDONED storage: file: removing SP assumtion on getImagesList Never
oVirt gerrit 49684 None None None 2016-02-03 03:38 EST

Description Simone Tiraboschi 2015-10-23 03:13:47 EDT
Description of problem:
If we call getImagesList on a file-based storageDomain which is not connected to any storage pool, it fails with:
 {'status': {'message': 'list index out of range', 'code': 100}}
because it assumes that one and only one SP is always there.
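
A minimal, self-contained sketch of the suspected pattern (hypothetical names and paths, not the literal vdsm code):

def images_path(pools, sd_uuid):
    # buggy assumption: one and only one storage pool is always present
    return '/rhev/data-center/%s/%s/images' % (pools[0], sd_uuid)

try:
    images_path([], 'some-sd-uuid')  # unconnected domain: empty pool list
except IndexError as e:
    # vdsm's dispatcher wraps the exception as a code-100 error
    print({'status': {'message': str(e), 'code': 100}})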

Version-Release number of selected component (if applicable):
4.17.10-2.gitdbbc5a4.el7

How reproducible:
100%

Steps to Reproduce:
1. Call getImagesList on an unconnected file-based storage domain

Actual results:
{'status': {'message': 'list index out of range', 'code': 100}}

Expected results:
The images list


Additional info:
Comment 1 Allon Mureinik 2015-10-26 09:27:17 EDT
Nir, can you review the attached patch please?
Comment 2 Yaniv Lavi (Dary) 2015-10-29 08:51:40 EDT
In oVirt, testing is done on a single release by default; therefore I'm removing the 4.0 flag. If you think this bug must be tested in 4.0 as well, please re-add the flag. Please note we might not have the testing resources to handle the 4.0 clone.
Comment 3 Nir Soffer 2015-11-09 09:25:01 EST
In 3.6 you must be connected to the pool. We are not going to support anything else. This will be possible in 4.0.
Comment 4 Red Hat Bugzilla Rules Engine 2015-11-17 04:26:28 EST
This bug is marked for z-stream, yet the milestone is for a major version, therefore the milestone has been reset.
Please set the correct milestone or drop the z stream flag.
Comment 5 Fred Rolland 2016-01-14 08:04:57 EST
Simone hi,

Can you test on master to check whether this bug is still relevant?

Adam changed the logic to not use the pool in this patch:

https://gerrit.ovirt.org/#/c/49684/3

Thanks
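
For context, the idea behind dropping the pool dependency on a file domain can be sketched like this (an illustration assuming the standard file-domain layout under /rhev/data-center/mnt, not the actual patch):

import os

def list_images_pool_free(domain_dir):
    # domain_dir is the domain's own mount point, e.g.
    # /rhev/data-center/mnt/<server:_export>/<sdUUID>
    # Enumerate image UUIDs directly instead of going through the
    # pool-scoped /rhev/data-center/<spUUID>/... symlink tree.
    images_dir = os.path.join(domain_dir, 'images')
    return [name for name in os.listdir(images_dir)
            if os.path.isdir(os.path.join(images_dir, name))]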
Comment 6 Simone Tiraboschi 2016-01-26 05:37:14 EST
It looks OK with vdsm.noarch 4.17.999-536.git433b527.el7.centos from master.
Comment 7 Simone Tiraboschi 2016-02-01 03:50:05 EST
Can you please backport patch 49684 to 3.6 too?
Comment 8 Tal Nisan 2016-02-02 06:49:49 EST
Simone, this is not a question for Freddy but rather for the VDSM maintainers and product managers. In general, this late into 3.6 I think it's too risky; we can propose it for 3.6.z if you'd like.
Comment 9 Simone Tiraboschi 2016-02-02 08:02:01 EST
Thanks Tal,
I'm asking this for bug 1303316.
Just to summarize the issue: hosted-engine-setup deploys the engine VM, but the OVF_STORE volumes are created only about an hour after the data center goes up (which requires the user to manually add the first regular storage domain), so hosted-engine-setup cannot know the OVF_STORE uuid.

After a reboot, ovirt-ha-agent has to scan the OVF_STORE to get the latest configuration for the engine VM.
The issue here is that we are not calling prepareImage, since ovirt-ha-agent doesn't know the OVF_STORE uuid, and we cannot call getImagesList, since we are not connected to any storage pool at that point.
In general it, strangely, seems to work even without prepareImage on the OVF_STORE, but in bug 1303316 we have a report about the OVF_STORE being inaccessible on FC since its LV is still down.

Is there another way to prepare all the images on a storage domain without knowing their UUID?
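
For illustration, the flow the agent needs could be sketched as follows; the client object and the predicate here are stand-ins, not real vdsm API signatures:

def scan_ovf_store(cli, sp_uuid, sd_uuid):
    # Hypothetical sketch: without knowing the OVF_STORE uuid up front,
    # the agent can only enumerate all images on the domain and prepare
    # each candidate before scanning it.
    images = cli.getImagesList(sd_uuid)['imageslist']  # the call that fails with no pool
    for img_uuid in images:
        cli.prepareImage(sp_uuid, sd_uuid, img_uuid)   # stand-in signature
        if is_ovf_store(sd_uuid, img_uuid):            # stand-in predicate
            return img_uuid
    return None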
Comment 10 Tal Nisan 2016-02-02 08:05:48 EST
Not sure, Nir, any idea?
Comment 11 Allon Mureinik 2016-02-03 04:15:38 EST
(In reply to Tal Nisan from comment #8)
> Simone, this is not a question for Freddy but rather for the VDSM
> maintainers and product managers. In general, this late into 3.6 I think
> it's too risky; we can propose it for 3.6.z if you'd like.
This patch is part of the storage domain manifest in 4.0.
It won't be backported.
Comment 12 Nir Soffer 2016-02-11 03:29:02 EST
(In reply to Allon Mureinik from comment #11)
> (In reply to Tal Nisan from comment #8)
> > Simone, this is not a question for Freddy but rather for the VDSM
> > maintainers and product managers. In general, this late into 3.6 I think
> > it's too risky; we can propose it for 3.6.z if you'd like.
> This patch is part of the storage domain manifest in 4.0.
> It won't be backported.

It should be possible to backport this patch; the dependency on the pool is
not needed in this code path.
Comment 13 Simone Tiraboschi 2016-02-23 12:12:20 EST
getImagesList seems to fail also on iSCSI with vdsm.noarch 4.17.21-0.el7ev @rhev-3.6.3-3

Thread-5472::DEBUG::2016-02-23 19:07:26,796::task::595::Storage.TaskManager.Task::(_updateState) Task=`d364b006-34f5-429d-95c7-a4725f449763`::moving from state init -> state preparing
Thread-5472::INFO::2016-02-23 19:07:26,797::logUtils::48::dispatcher::(wrapper) Run and protect: getImagesList(sdUUID={'status': {'message': 'OK', 'code': 0}, 'imageslist': ['2e9a5dc5-5c83-43bd-8f32-79825194c368', 'aa908bcb-16f0-48e8-a9cc-bcc35bf4a258', '9dae0125-7467-4a53-b975-f022e32d15ac', '7d8c0950-399c-4b0f-aa90-317381da1c6e']}, options=None)
Thread-5472::WARNING::2016-02-23 19:07:26,797::resourceManager::834::Storage.ResourceManager.Owner::(acquire) Unexpected exception caught while owner 'd364b006-34f5-429d-95c7-a4725f449763' tried to acquire 'Storage.{'status': {'message': 'OK', 'code': 0}, 'imageslist': ['2e9a5dc5-5c83-43bd-8f32-79825194c368', 'aa908bcb-16f0-48e8-a9cc-bcc35bf4a258', '9dae0125-7467-4a53-b975-f022e32d15ac', '7d8c0950-399c-4b0f-aa90-317381da1c6e']}'
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/resourceManager.py", line 814, in acquire
    locktype, timeout)
  File "/usr/share/vdsm/storage/resourceManager.py", line 514, in acquireResource
    request = self.registerResource(namespace, name, lockType, callback)
  File "/usr/share/vdsm/storage/resourceManager.py", line 538, in registerResource
    if not self._resourceNameValidator.match(name):
TypeError: expected string or buffer
Thread-5472::ERROR::2016-02-23 19:07:26,797::task::866::Storage.TaskManager.Task::(_setError) Task=`d364b006-34f5-429d-95c7-a4725f449763`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3307, in getImagesList
    vars.task.getSharedLock(STORAGE, sdUUID)
  File "/usr/share/vdsm/storage/task.py", line 1375, in getSharedLock
    timeout)
  File "/usr/share/vdsm/storage/resourceManager.py", line 835, in acquire
    raise se.ResourceException(fullName)
ResourceException: General Exception, UUID: "Storage.{'status': {'message': 'OK', 'code': 0}, 'imageslist': ['2e9a5dc5-5c83-43bd-8f32-79825194c368', 'aa908bcb-16f0-48e8-a9cc-bcc35bf4a258', '9dae0125-7467-4a53-b975-f022e32d15ac', '7d8c0950-399c-4b0f-aa90-317381da1c6e']}"
Thread-5472::DEBUG::2016-02-23 19:07:26,797::task::885::Storage.TaskManager.Task::(_run) Task=`d364b006-34f5-429d-95c7-a4725f449763`::Task._run: d364b006-34f5-429d-95c7-a4725f449763 ({'status': {'message': 'OK', 'code': 0}, 'imageslist': ['2e9a5dc5-5c83-43bd-8f32-79825194c368', 'aa908bcb-16f0-48e8-a9cc-bcc35bf4a258', '9dae0125-7467-4a53-b975-f022e32d15ac', '7d8c0950-399c-4b0f-aa90-317381da1c6e']},) {} failed - stopping task
Thread-5472::DEBUG::2016-02-23 19:07:26,797::task::1246::Storage.TaskManager.Task::(stop) Task=`d364b006-34f5-429d-95c7-a4725f449763`::stopping in state preparing (force False)
Thread-5472::DEBUG::2016-02-23 19:07:26,798::task::993::Storage.TaskManager.Task::(_decref) Task=`d364b006-34f5-429d-95c7-a4725f449763`::ref 1 aborting True
Thread-5472::INFO::2016-02-23 19:07:26,798::task::1171::Storage.TaskManager.Task::(prepare) Task=`d364b006-34f5-429d-95c7-a4725f449763`::aborting: Task is aborted: u'General Exception, UUID: "Storage.{\'status\': {\'message\': \'OK\', \'code\': 0}, \'imageslist\': [\'2e9a5dc5-5c83-43bd-8f32-79825194c368\', \'aa908bcb-16f0-48e8-a9cc-bcc35bf4a258\', \'9dae0125-7467-4a53-b975-f022e32d15ac\', \'7d8c0950-399c-4b0f-aa90-317381da1c6e\']}"' - code 100
Thread-5472::DEBUG::2016-02-23 19:07:26,798::task::1176::Storage.TaskManager.Task::(prepare) Task=`d364b006-34f5-429d-95c7-a4725f449763`::Prepare: aborted: General Exception, UUID: "Storage.{'status': {'message': 'OK', 'code': 0}, 'imageslist': ['2e9a5dc5-5c83-43bd-8f32-79825194c368', 'aa908bcb-16f0-48e8-a9cc-bcc35bf4a258', '9dae0125-7467-4a53-b975-f022e32d15ac', '7d8c0950-399c-4b0f-aa90-317381da1c6e']}"
Thread-5472::DEBUG::2016-02-23 19:07:26,798::task::993::Storage.TaskManager.Task::(_decref) Task=`d364b006-34f5-429d-95c7-a4725f449763`::ref 0 aborting True
Thread-5472::DEBUG::2016-02-23 19:07:26,798::task::928::Storage.TaskManager.Task::(_doAbort) Task=`d364b006-34f5-429d-95c7-a4725f449763`::Task._doAbort: force False
Comment 14 Simone Tiraboschi 2016-02-23 12:29:28 EST
(In reply to Simone Tiraboschi from comment #13)
> getImagesList seems to fail also on iSCSI with vdsm.noarch
> 4.17.21-0.el7ev @rhev-3.6.3-3
> 
> Thread-5472::DEBUG::2016-02-23
> 19:07:26,796::task::595::Storage.TaskManager.Task::(_updateState)
> Task=`d364b006-34f5-429d-95c7-a4725f449763`::moving from state init -> state
> preparing
> Thread-5472::INFO::2016-02-23
> 19:07:26,797::logUtils::48::dispatcher::(wrapper) Run and protect:
> getImagesList(sdUUID={'status': {'message': 'OK', 'code': 0}, 'imageslist':
> ['2e9a5dc5-5c83-43bd-8f32-79825194c368',
> 'aa908bcb-16f0-48e8-a9cc-bcc35bf4a258',
> '9dae0125-7467-4a53-b975-f022e32d15ac',
> '7d8c0950-399c-4b0f-aa90-317381da1c6e']}, options=None)

Sorry, found it: it's just the result of a bad request. Stupid bug!
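
The trace in comment 13 shows the whole response of a previous call being passed as sdUUID; a hedged reconstruction of the client-side mistake (hypothetical client object):

def reproduce_bad_request(cli, sd_uuid):
    resp = cli.getImagesList(sd_uuid)  # {'status': {...}, 'imageslist': [...]}
    cli.getImagesList(resp)            # bug: whole response dict passed as sdUUID

def correct_chaining(cli, sd_uuid):
    resp = cli.getImagesList(sd_uuid)
    return resp['imageslist']          # extract the payload before reusing it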
Comment 15 Simone Tiraboschi 2016-05-20 08:20:14 EDT
*** Bug 1331503 has been marked as a duplicate of this bug. ***
Comment 17 Carlos Mestre González 2016-07-12 07:19:33 EDT
Hi Fred,

Can I ask you about the way to test this? I'm using a deactivated storage domain (in oVirt) and testing with vdsClient getImagesList, but I'm getting:

Storage domain does not exist: ('928a25e0-2489-4ea2-bc55-520b112c0920',)

Should I test by checking the logs when importing a domain (does getImagesList get triggered in that case)?
Comment 18 Fred Rolland 2016-07-12 07:39:32 EDT
Hi,
Can you provide the vdsm log?

Thanks,

Fred
Comment 19 Carlos Mestre González 2016-07-12 09:05 EDT
Created attachment 1178904 [details]
vdsm.log storagedomaindoesnotexist

After deactivating the domain I called vdsClient getImagesList and got the StorageDomainDoesNotExist exception:

Thread-90274::ERROR::2016-07-12 15:59:50,668::task::868::Storage.TaskManager.Task::(_setError) Task=`b70c9b2b-d341-42e5-a757-6932d248972b`::Unexpected error
Comment 20 Fred Rolland 2016-07-12 10:32:44 EDT
I think you should ask Simone how to test this, as it is a hosted-engine flow.

The exception is not about the pool, which is what this bug is about.
Comment 21 Carlos Mestre González 2016-07-12 10:58:30 EDT
Simone, can you provide the steps on HE to verify this?
Comment 22 Simone Tiraboschi 2016-07-18 04:37:42 EDT
(In reply to Carlos Mestre González from comment #21)
> Simone, can you provide the steps on HE to verify this?

1. Deploy hosted-engine (please try it once on NFS and once on iSCSI); add your first regular storage domain and ensure that the engine imports the hosted-engine storage domain
2. Set global maintenance mode with hosted-engine --set-maintenance --mode=global
3. Cleanly shut down the engine with hosted-engine --vm-shutdown
4. Reboot the host
5. Get the hosted-engine storage domain uuid from /etc/ovirt-hosted-engine/hosted-engine.conf (sdUUID)
6. Run 'vdsClient -s 0 getImagesList <sdUUID>' (steps 5-6 are scripted in the sketch below)
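
Steps 5-6 as a small script (a sketch assuming the conf file uses plain 'sdUUID=<uuid>' lines):

import subprocess

def get_sd_uuid(conf='/etc/ovirt-hosted-engine/hosted-engine.conf'):
    # pull the hosted-engine storage domain uuid out of the conf file
    with open(conf) as f:
        for line in f:
            if line.startswith('sdUUID='):
                return line.split('=', 1)[1].strip()
    raise RuntimeError('sdUUID not found in %s' % conf)

if __name__ == '__main__':
    # equivalent to: vdsClient -s 0 getImagesList <sdUUID>
    print(subprocess.check_output(
        ['vdsClient', '-s', '0', 'getImagesList', get_sd_uuid()]))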
Comment 23 Elad 2016-07-19 08:04:52 EDT
getImagesList returns the images list (4 in total) under the hosted_storage storage domain after performing the scenario described in comment #22.

Steps as described in comment #22, deployed twice, iSCSI and NFS (over a clean server - freshly installed OS before deployment).



[root@blond-vdsf ~]# vdsClient -s 0 getImagesList 028649d6-d048-453a-b7aa-e96881cf0443
62ee53ee-5220-4b62-8ad8-f6c536e18070
a400f71f-fd52-49ca-9efd-3e5345303675
fd64b0a4-a1d2-439d-8649-cf8d6e3765c6
9e02d366-c926-485d-9870-dacfafabf5bc


Verified using:

ovirt-vmconsole-host-1.0.3-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.0-1.el7ev.noarch
ovirt-host-deploy-1.5.0-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.0.2-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-vmconsole-1.0.3-1.el7ev.noarch
libgovirt-0.3.3-1.el7_2.4.x86_64
ovirt-imageio-common-0.3.0-0.el7ev.noarch
ovirt-engine-sdk-python-3.6.7.0-1.el7ev.noarch
ovirt-imageio-daemon-0.3.0-0.el7ev.noarch
vdsm-api-4.18.5.1-1.el7ev.noarch
vdsm-infra-4.18.5.1-1.el7ev.noarch
vdsm-xmlrpc-4.18.5.1-1.el7ev.noarch
vdsm-jsonrpc-4.18.5.1-1.el7ev.noarch
vdsm-4.18.5.1-1.el7ev.x86_64
vdsm-python-4.18.5.1-1.el7ev.noarch
vdsm-yajsonrpc-4.18.5.1-1.el7ev.noarch
vdsm-cli-4.18.5.1-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.18.5.1-1.el7ev.noarch
fence-agents-rhevm-4.0.11-27.el7_2.8.x86_64
rhevm-appliance-20160623.0-1.el7ev.noarch
libvirt-daemon-1.2.17-13.el7_2.5.x86_64
