Bug 1274622

Summary: getImagesList fails if called on a file based storageDomain which is not connected to any storage pool
Product: [oVirt] vdsm
Reporter: Simone Tiraboschi <stirabos>
Component: Bindings-API
Assignee: Fred Rolland <frolland>
Status: CLOSED CURRENTRELEASE
QA Contact: Elad <ebenahar>
Severity: high
Priority: high
Version: 4.17.10
CC: acanan, amureini, bugs, cmestreg, dfediuck, ebenahar, frolland, gklein, nsoffer, sasundar, sbonazzo, stirabos, tnisan, ylavi
Target Milestone: ovirt-4.0.0-alpha
Flags: rule-engine: ovirt-4.0.0+, rule-engine: planning_ack+, rule-engine: devel_ack+, acanan: testing_ack+
Target Release: 4.17.999
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ovirt 4.0.0 alpha1
Doc Type: Bug Fix
Last Closed: 2016-08-01 12:28:34 UTC
Type: Bug
oVirt Team: Storage
Bug Depends On: 1325657
Bug Blocks: 1277939
Attachments:
vdsm.log storagedomaindoesnotexist (flags: none)

Description Simone Tiraboschi 2015-10-23 07:13:47 UTC
Description of problem:
If getImagesList is called on a file-based storage domain that is not connected to any storage pool, it fails with:
{'status': {'message': 'list index out of range', 'code': 100}}
because it assumes that exactly one storage pool (SP) is always connected.
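A minimal sketch of this failure mode, assuming (hypothetically) that the code reached the domain's images through a list of connected pools and took the first entry. The function names and the `domain` dict layout are illustrative only, not VDSM's actual code.

```python
from typing import Dict, List


def get_images_list_buggy(domain: Dict) -> List[str]:
    # Hypothetical: assumes exactly one storage pool is always connected.
    # With no connected pool, "pools" is empty and [0] raises IndexError
    # ("list index out of range"), which the API reports as code 100.
    pool = domain["pools"][0]
    return list(pool["domains"][domain["uuid"]])


def get_images_list_fixed(domain: Dict) -> List[str]:
    # Hypothetical fix: read the image list from the domain itself,
    # so this code path no longer needs a pool connection.
    return list(domain["images"])
```

For example, with `domain = {"uuid": "sd-1", "pools": [], "images": ["img-a", "img-b"]}`, the buggy version raises `IndexError` while the fixed version returns both image names.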

Version-Release number of selected component (if applicable):
4.17.10-2.gitdbbc5a4.el7

How reproducible:
100%

Steps to Reproduce:
1. Call getImagesList on a file-based storage domain that is not connected to any storage pool.

Actual results:
{'status': {'message': 'list index out of range', 'code': 100}}

Expected results:
The images list


Additional info:

Comment 1 Allon Mureinik 2015-10-26 13:27:17 UTC
Nir, can you review the attached patch please?

Comment 2 Yaniv Lavi 2015-10-29 12:51:40 UTC
In oVirt, testing is done on a single release by default. Therefore I'm removing the 4.0 flag. If you think this bug must be tested in 4.0 as well, please re-add the flag. Please note we might not have testing resources to handle the 4.0 clone.

Comment 3 Nir Soffer 2015-11-09 14:25:01 UTC
In 3.6 you must be connected to the pool. We are not going to support anything else. This will be possible in 4.0.

Comment 4 Red Hat Bugzilla Rules Engine 2015-11-17 09:26:28 UTC
This bug is marked for z-stream, yet the milestone is for a major version, therefore the milestone has been reset.
Please set the correct milestone or drop the z stream flag.

Comment 5 Fred Rolland 2016-01-14 13:04:57 UTC
Simone hi,

Can you test on master whether this bug is still relevant?

Adam changed the logic to not use the pool in this patch:

https://gerrit.ovirt.org/#/c/49684/3

Thanks

Comment 6 Simone Tiraboschi 2016-01-26 10:37:14 UTC
It looks OK with vdsm.noarch 4.17.999-536.git433b527.el7.centos from master.

Comment 7 Simone Tiraboschi 2016-02-01 08:50:05 UTC
Can you please backport patch 49684 to 3.6 too?

Comment 8 Tal Nisan 2016-02-02 11:49:49 UTC
Simone, this is not a question for Freddy but rather for the VDSM maintainers and product managers. In general, this late into 3.6 I think it's too risky; we can propose it for 3.6.z if you'd like.

Comment 9 Simone Tiraboschi 2016-02-02 13:02:01 UTC
Thanks Tal,
I'm asking this for bug 1303316.
Just to summarize the issue: hosted-engine-setup deploys the engine VM, but the OVF_STORE volumes are created only after the data center comes up (which requires the user to manually add the first regular storage domain), and only after an hour. So hosted-engine-setup cannot know the OVF_STORE UUID.

After a reboot, ovirt-ha-agent has to scan the OVF_STORE to get the latest configuration for the engine VM.
The issue here is that we are not calling prepareImage since ovirt-ha-agent doesn't know the OVF_STORE uuid and we cannot call getImagesList since we are not connected to any storage pool at that point.
In general it seems, strangely, to work even without calling prepareImage on the OVF_STORE, but in bug 1303316 we have a report of the OVF_STORE not being accessible on FC because its LV is still down.

Is there another way to prepare all the images on a storage domain without knowing their UUIDs?
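The approach being asked about, listing the domain's images and preparing each one, can be sketched as follows. It only works once getImagesList no longer requires a connected pool, which is what this bug is about. `StorageClient`, `get_images_list`, `prepare_image`, and `FakeClient` are hypothetical names; the real VDSM verbs take different arguments.

```python
from typing import List, Protocol


class StorageClient(Protocol):
    # Hypothetical interface, not VDSM's real API signatures.
    def get_images_list(self, sd_uuid: str) -> List[str]: ...
    def prepare_image(self, sd_uuid: str, img_uuid: str) -> None: ...


def prepare_all_images(client: StorageClient, sd_uuid: str) -> List[str]:
    """Prepare every image on a domain without knowing the UUIDs up front."""
    prepared = []
    for img_uuid in client.get_images_list(sd_uuid):
        client.prepare_image(sd_uuid, img_uuid)
        prepared.append(img_uuid)
    return prepared


class FakeClient:
    # In-memory stand-in used only to exercise the sketch.
    def __init__(self, images):
        self.images = images
        self.prepared = []

    def get_images_list(self, sd_uuid):
        return list(self.images.get(sd_uuid, []))

    def prepare_image(self, sd_uuid, img_uuid):
        self.prepared.append((sd_uuid, img_uuid))
```

Since the caller learns the UUIDs from the listing itself, the OVF_STORE would be prepared along with everything else on the domain.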

Comment 10 Tal Nisan 2016-02-02 13:05:48 UTC
Not sure, Nir, any idea?

Comment 11 Allon Mureinik 2016-02-03 09:15:38 UTC
(In reply to Tal Nisan from comment #8)
> Simone, this is not a question to Freddy but rather to VDSM maintainers and
> product managers, in general this late into 3.6 I think it's too risky, we
> can propose it to 3.6.z if you'd like
This patch is part of the storage domain manifest in 4.0.
It won't be backported.

Comment 12 Nir Soffer 2016-02-11 08:29:02 UTC
(In reply to Allon Mureinik from comment #11)
> (In reply to Tal Nisan from comment #8)
> > Simone, this is not a question to Freddy but rather to VDSM maintainers and
> > product managers, in general this late into 3.6 I think it's too risky, we
> > can propose it to 3.6.z if you'd like
> This patch is part of the storage domain manifest in 4.0.
> It won't be backported.

It should be possible to backport this patch; the dependency on the pool is not needed in this code path.

Comment 13 Simone Tiraboschi 2016-02-23 17:12:20 UTC
getImagesList seems to fail on iSCSI as well, with vdsm.noarch 4.17.21-0.el7ev (@rhev-3.6.3-3):

Thread-5472::DEBUG::2016-02-23 19:07:26,796::task::595::Storage.TaskManager.Task::(_updateState) Task=`d364b006-34f5-429d-95c7-a4725f449763`::moving from state init -> state preparing
Thread-5472::INFO::2016-02-23 19:07:26,797::logUtils::48::dispatcher::(wrapper) Run and protect: getImagesList(sdUUID={'status': {'message': 'OK', 'code': 0}, 'imageslist': ['2e9a5dc5-5c83-43bd-8f32-79825194c368', 'aa908bcb-16f0-48e8-a9cc-bcc35bf4a258', '9dae0125-7467-4a53-b975-f022e32d15ac', '7d8c0950-399c-4b0f-aa90-317381da1c6e']}, options=None)
Thread-5472::WARNING::2016-02-23 19:07:26,797::resourceManager::834::Storage.ResourceManager.Owner::(acquire) Unexpected exception caught while owner 'd364b006-34f5-429d-95c7-a4725f449763' tried to acquire 'Storage.{'status': {'message': 'OK', 'code': 0}, 'imageslist': ['2e9a5dc5-5c83-43bd-8f32-79825194c368', 'aa908bcb-16f0-48e8-a9cc-bcc35bf4a258', '9dae0125-7467-4a53-b975-f022e32d15ac', '7d8c0950-399c-4b0f-aa90-317381da1c6e']}'
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/resourceManager.py", line 814, in acquire
    locktype, timeout)
  File "/usr/share/vdsm/storage/resourceManager.py", line 514, in acquireResource
    request = self.registerResource(namespace, name, lockType, callback)
  File "/usr/share/vdsm/storage/resourceManager.py", line 538, in registerResource
    if not self._resourceNameValidator.match(name):
TypeError: expected string or buffer
Thread-5472::ERROR::2016-02-23 19:07:26,797::task::866::Storage.TaskManager.Task::(_setError) Task=`d364b006-34f5-429d-95c7-a4725f449763`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3307, in getImagesList
    vars.task.getSharedLock(STORAGE, sdUUID)
  File "/usr/share/vdsm/storage/task.py", line 1375, in getSharedLock
    timeout)
  File "/usr/share/vdsm/storage/resourceManager.py", line 835, in acquire
    raise se.ResourceException(fullName)
ResourceException: General Exception, UUID: "Storage.{'status': {'message': 'OK', 'code': 0}, 'imageslist': ['2e9a5dc5-5c83-43bd-8f32-79825194c368', 'aa908bcb-16f0-48e8-a9cc-bcc35bf4a258', '9dae0125-7467-4a53-b975-f022e32d15ac', '7d8c0950-399c-4b0f-aa90-317381da1c6e']}"
Thread-5472::DEBUG::2016-02-23 19:07:26,797::task::885::Storage.TaskManager.Task::(_run) Task=`d364b006-34f5-429d-95c7-a4725f449763`::Task._run: d364b006-34f5-429d-95c7-a4725f449763 ({'status': {'message': 'OK', 'code': 0}, 'imageslist': ['2e9a5dc5-5c83-43bd-8f32-79825194c368', 'aa908bcb-16f0-48e8-a9cc-bcc35bf4a258', '9dae0125-7467-4a53-b975-f022e32d15ac', '7d8c0950-399c-4b0f-aa90-317381da1c6e']},) {} failed - stopping task
Thread-5472::DEBUG::2016-02-23 19:07:26,797::task::1246::Storage.TaskManager.Task::(stop) Task=`d364b006-34f5-429d-95c7-a4725f449763`::stopping in state preparing (force False)
Thread-5472::DEBUG::2016-02-23 19:07:26,798::task::993::Storage.TaskManager.Task::(_decref) Task=`d364b006-34f5-429d-95c7-a4725f449763`::ref 1 aborting True
Thread-5472::INFO::2016-02-23 19:07:26,798::task::1171::Storage.TaskManager.Task::(prepare) Task=`d364b006-34f5-429d-95c7-a4725f449763`::aborting: Task is aborted: u'General Exception, UUID: "Storage.{\'status\': {\'message\': \'OK\', \'code\': 0}, \'imageslist\': [\'2e9a5dc5-5c83-43bd-8f32-79825194c368\', \'aa908bcb-16f0-48e8-a9cc-bcc35bf4a258\', \'9dae0125-7467-4a53-b975-f022e32d15ac\', \'7d8c0950-399c-4b0f-aa90-317381da1c6e\']}"' - code 100
Thread-5472::DEBUG::2016-02-23 19:07:26,798::task::1176::Storage.TaskManager.Task::(prepare) Task=`d364b006-34f5-429d-95c7-a4725f449763`::Prepare: aborted: General Exception, UUID: "Storage.{'status': {'message': 'OK', 'code': 0}, 'imageslist': ['2e9a5dc5-5c83-43bd-8f32-79825194c368', 'aa908bcb-16f0-48e8-a9cc-bcc35bf4a258', '9dae0125-7467-4a53-b975-f022e32d15ac', '7d8c0950-399c-4b0f-aa90-317381da1c6e']}"
Thread-5472::DEBUG::2016-02-23 19:07:26,798::task::993::Storage.TaskManager.Task::(_decref) Task=`d364b006-34f5-429d-95c7-a4725f449763`::ref 0 aborting True
Thread-5472::DEBUG::2016-02-23 19:07:26,798::task::928::Storage.TaskManager.Task::(_doAbort) Task=`d364b006-34f5-429d-95c7-a4725f449763`::Task._doAbort: force False

Comment 14 Simone Tiraboschi 2016-02-23 17:29:28 UTC
(In reply to Simone Tiraboschi from comment #13)
> getImagesList seams to fails also on iSCSI on vdsm.noarch                   
> 4.17.21-0.el7ev          @rhev-3.6.3-3
> 
> Thread-5472::DEBUG::2016-02-23
> 19:07:26,796::task::595::Storage.TaskManager.Task::(_updateState)
> Task=`d364b006-34f5-429d-95c7-a4725f449763`::moving from state init -> state
> preparing
> Thread-5472::INFO::2016-02-23
> 19:07:26,797::logUtils::48::dispatcher::(wrapper) Run and protect:
> getImagesList(sdUUID={'status': {'message': 'OK', 'code': 0}, 'imageslist':
> ['2e9a5dc5-5c83-43bd-8f32-79825194c368',
> 'aa908bcb-16f0-48e8-a9cc-bcc35bf4a258',
> '9dae0125-7467-4a53-b975-f022e32d15ac',
> '7d8c0950-399c-4b0f-aa90-317381da1c6e']}, options=None)

Sorry, found it: it's just the result of a bad request (the entire getImagesList response was passed as sdUUID). Stupid bug!
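In the log above, the whole response dict reaches the server as sdUUID, and the resource manager's name validator rejects it (`TypeError: expected string or buffer`). A minimal client-side sketch that avoids this, assuming the response shape shown in the log; `images_from_response` and `check_sd_uuid` are hypothetical helpers.

```python
import uuid
from typing import Dict, List


def images_from_response(response: Dict) -> List[str]:
    # Response shape taken from the log above:
    # {'status': {'message': 'OK', 'code': 0}, 'imageslist': [...]}
    status = response["status"]
    if status["code"] != 0:
        raise RuntimeError("getImagesList failed: %s" % status["message"])
    return list(response["imageslist"])


def check_sd_uuid(sd_uuid) -> str:
    # Reject anything that is not a UUID string (for example, a whole
    # response dict passed by mistake) before it is sent as sdUUID.
    if not isinstance(sd_uuid, str):
        raise TypeError("sdUUID must be a string, got %s" % type(sd_uuid).__name__)
    return str(uuid.UUID(sd_uuid))  # raises ValueError if malformed
```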

Comment 15 Simone Tiraboschi 2016-05-20 12:20:14 UTC
*** Bug 1331503 has been marked as a duplicate of this bug. ***

Comment 17 Carlos Mestre González 2016-07-12 11:19:33 UTC
Hi Fred,

Can I ask you how to test this? I'm using a deactivated storage domain (in oVirt) and testing with vdsClient getImagesList, but I'm getting:

Storage domain does not exist: ('928a25e0-2489-4ea2-bc55-520b112c0920',)

Should I test by checking the logs when importing a domain (is getImagesList triggered in that case)?

Comment 18 Fred Rolland 2016-07-12 11:39:32 UTC
Hi,
Can you provide the vdsm log ?

Thanks,

Fred

Comment 19 Carlos Mestre González 2016-07-12 13:05:03 UTC
Created attachment 1178904 [details]
vdsm.log storagedomaindoesnotexist

After deactivating the domain, I called vdsClient getImagesList and got the StorageDomainDoesNotExist exception:

Thread-90274::ERROR::2016-07-12 15:59:50,668::task::868::Storage.TaskManager.Task::(_setError) Task=`b70c9b2b-d341-42e5-a757-6932d248972b`::Unexpected error

Comment 20 Fred Rolland 2016-07-12 14:32:44 UTC
I think you should ask Simone how to test this, as it is a hosted-engine flow.

The exception is not related to the pool, which is what this bug is about.

Comment 21 Carlos Mestre González 2016-07-12 14:58:30 UTC
Simone, can you provide the steps on HE to verify this?

Comment 22 Simone Tiraboschi 2016-07-18 08:37:42 UTC
(In reply to Carlos Mestre González from comment #21)
> Simone, can you provide the steps on HE to verify this?

1. Deploy hosted-engine (please try it once on NFS and once on iSCSI); add your first regular storage domain and ensure that the engine imports the hosted-engine storage domain.
2. Set global maintenance mode with: hosted-engine --set-maintenance --mode=global
3. Cleanly shut down the engine with: hosted-engine --vm-shutdown
4. Reboot the host.
5. Get the hosted-engine storage domain UUID (sdUUID) from /etc/ovirt-hosted-engine/hosted-engine.conf.
6. Run: vdsClient -s 0 getImagesList <sdUUID>
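The last two steps above can be scripted. This is a sketch that assumes hosted-engine.conf uses simple `key=value` lines; `parse_conf` and `get_images_list_cmd` are hypothetical helpers.

```python
from typing import Dict, List

CONF_PATH = "/etc/ovirt-hosted-engine/hosted-engine.conf"


def parse_conf(text: str) -> Dict[str, str]:
    # Assumes simple key=value lines; blank lines and '#' comments are skipped.
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def get_images_list_cmd(sd_uuid: str) -> List[str]:
    # Argument vector for the vdsClient call, e.g. for subprocess.run(cmd).
    return ["vdsClient", "-s", "0", "getImagesList", sd_uuid]
```

On the host, the value from `parse_conf(open(CONF_PATH).read())["sdUUID"]` would be passed to `get_images_list_cmd`.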

Comment 23 Elad 2016-07-19 12:04:52 UTC
getImagesList returns the images list (4 in total) under the hosted_storage storage domain after performing the scenario described in comment #22.

Followed the steps in comment #22, deploying twice, once on iSCSI and once on NFS (each on a clean server with a freshly installed OS).



[root@blond-vdsf ~]# vdsClient -s 0 getImagesList 028649d6-d048-453a-b7aa-e96881cf0443
62ee53ee-5220-4b62-8ad8-f6c536e18070
a400f71f-fd52-49ca-9efd-3e5345303675
fd64b0a4-a1d2-439d-8649-cf8d6e3765c6
9e02d366-c926-485d-9870-dacfafabf5bc
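To automate this check, the output can be parsed. The sketch below assumes vdsClient prints one image UUID per line, as in the output above; `parse_images_list_output` is a hypothetical helper.

```python
import uuid
from typing import List


def parse_images_list_output(output: str) -> List[str]:
    # One image UUID per line, as in the vdsClient output above.
    images = []
    for line in output.splitlines():
        line = line.strip()
        if line:
            images.append(str(uuid.UUID(line)))  # raises ValueError if not a UUID
    return images


OUTPUT = """\
62ee53ee-5220-4b62-8ad8-f6c536e18070
a400f71f-fd52-49ca-9efd-3e5345303675
fd64b0a4-a1d2-439d-8649-cf8d6e3765c6
9e02d366-c926-485d-9870-dacfafabf5bc
"""
```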


Verified using:

ovirt-vmconsole-host-1.0.3-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.0-1.el7ev.noarch
ovirt-host-deploy-1.5.0-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.0.2-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-vmconsole-1.0.3-1.el7ev.noarch
libgovirt-0.3.3-1.el7_2.4.x86_64
ovirt-imageio-common-0.3.0-0.el7ev.noarch
ovirt-engine-sdk-python-3.6.7.0-1.el7ev.noarch
ovirt-imageio-daemon-0.3.0-0.el7ev.noarch
vdsm-api-4.18.5.1-1.el7ev.noarch
vdsm-infra-4.18.5.1-1.el7ev.noarch
vdsm-xmlrpc-4.18.5.1-1.el7ev.noarch
vdsm-jsonrpc-4.18.5.1-1.el7ev.noarch
vdsm-4.18.5.1-1.el7ev.x86_64
vdsm-python-4.18.5.1-1.el7ev.noarch
vdsm-yajsonrpc-4.18.5.1-1.el7ev.noarch
vdsm-cli-4.18.5.1-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.18.5.1-1.el7ev.noarch
fence-agents-rhevm-4.0.11-27.el7_2.8.x86_64
rhevm-appliance-20160623.0-1.el7ev.noarch
libvirt-daemon-1.2.17-13.el7_2.5.x86_64