Bug 996130 - [vdsm] SecureError during getAllTasksInfo [NEEDINFO]
[vdsm] SecureError during getAllTasksInfo
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm (Show other bugs)
3.3.0
x86_64 Unspecified
unspecified Severity high
: ---
: 3.3.0
Assigned To: Eduardo Warszawski
storage
: Triaged
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-08-12 09:25 EDT by Elad
Modified: 2016-02-10 11:50 EST (History)
14 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-02 11:36:19 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
gwatson: needinfo? (ewarszaw)
scohen: Triaged+


Attachments (Terms of Use)
logs (3.89 MB, application/x-gzip)
2013-08-12 09:25 EDT, Elad
no flags Details

  None (edit)
Description Elad 2013-08-12 09:25:44 EDT
Created attachment 785684 [details]
logs

Description of problem:
vdsm report an error during getAllTasksInfo command:

Thread-23563::ERROR::2013-08-12 15:55:40,534::task::850::TaskManager.Task::(_setError) Task=`f39d9f3b-d4e1-42c8-ad38-b8a8ef3e65d5`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2150, in getAllTasksInfo
    allTasksInfo = sp.getAllTasksInfo()
  File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper
    raise SecureError()
SecureError


Version-Release number of selected component (if applicable):
vdsm-4.12.0-52.gitce029ba.el6ev.x86_64
is9.1

How reproducible:
unknown



Additional info: logs
Comment 1 Barak 2013-08-15 05:15:38 EDT
Was it the SPM ?
Comment 2 Yaniv Bronhaim 2013-08-15 08:11:35 EDT
Must be, might be a race between setting the securable parameter and setting the SPM_ROLE . In any case, this exception is equivalent to NOT SPM exception and engine should treat both in the same way. 

This is not a bug.. unless engine treats it differently, what is the actual behavior here?
Comment 3 Yaniv Bronhaim 2013-08-15 09:06:32 EDT
Eduardo, after checking again the code, if you recall - we decided to do :

try:                                                                    
 sp = self.pools.values()[0]                                         
except IndexError:                                                      
 raise se.SpmStatusError()                                           
allTasksStatus = sp.getAllTasksStatuses()

The secure error is definitely raised if the host is not set to be SPM. But, we should get to  sp.getAllTasksStatuses() only if we're not running as SPM, and the check we do is reading the first pool.  

The SecureError raises more than once, even a lot more.. is it possible that after host loose its SPM, it doesn't clean the pools dictionary as expected?

This question is in additional to the above question about the engine's behavior when receiving such exception
Comment 4 Barak 2013-08-15 11:11:21 EDT
per comment #3 moving this bug to storage
Comment 5 Eduardo Warszawski 2013-08-22 00:32:33 EDT
(In reply to Yaniv Bronhaim from comment #3)
> Eduardo, after checking again the code, if you recall - we decided to do :
> 
> try:                                                                    
>  sp = self.pools.values()[0]                                         
> except IndexError:                                                      
>  raise se.SpmStatusError()                                           
> allTasksStatus = sp.getAllTasksStatuses()
> 
> The secure error is definitely raised if the host is not set to be SPM. But,
> we should get to  sp.getAllTasksStatuses() only if we're not running as SPM,
> and the check we do is reading the first pool.  
> 
> The SecureError raises more than once, even a lot more.. is it possible that
> after host loose its SPM, it doesn't clean the pools dictionary as expected?
> 
> This question is in additional to the above question about the engine's
> behavior when receiving such exception

Lost of the spm condition does not lead an should no lead to pool disconnection.
But the opposite is true: being connected to the pool is a sine equa non condition in order to to be spm. You are affirming the consequent.

In the snippet you provided, the exception is valid, for sure host is not SPM if  there is no pool object. 
But calling sp.getAllTasksStatuses() based in the SP object existence is not sufficient.
All the sp.getAllTasksStatuses() should be revised.
Comment 6 Yaniv Bronhaim 2013-08-22 10:42:06 EDT
If having valid value in self.pools.values()[0] doesn't mean that the host runs as SPM we should fix the condition and use the validateSPM function as I suggested in the first place.

Please fix that issue, although SECURE_ERROR and NOT_SPM as the same error code, and currently we don't feel it in engine's side, we should not handle them in the same way. 
SECURE_ERROR means something wrong happened because VDSM itself thinks that it runs as SPM and called sp function that was not supposed to be called (bug).
Comment 7 Ayal Baron 2013-09-02 11:36:19 EDT
SECURE_ERROR *is* NOT_SPM (it is translated before being sent to engine).
Getting this exception means that the host is simply not the spm.
There is no bug here.(In reply to Yaniv Bronhaim from comment #6)

> If having valid value in self.pools.values()[0] doesn't mean that the host

It doesn't.

> runs as SPM we should fix the condition and use the validateSPM function as
> I suggested in the first place.

No, your conclusion is incorrect.  The operation is an SPM op, any 'if' you perform would be a race.  Engine sends this command under the expectation that the host is the spm, if it is not, it should raise an error (as it does here).
Closing.
Comment 8 Yaniv Bronhaim 2013-09-03 04:02:36 EDT
I honestly don't agree Ayal, my conclusion is perfectly right.
If your saying is right about "self.pools.values()[0] doesn't mean that the host runs as SPM", the following part of code

try:                                                                    
  sp = self.pools.values()[0]                                         
except IndexError:                                                      
  raise se.SpmStatusError()

Doesn't do anything. Only if the host never ran as SPM.

That sounds odd to me. Please explain why did we pick this solution instead of just forward the request directly to sp.py and get secureError?

I recall your saying that if an host doesn't run as SPM it doesn't hold a storagePool instance in its internal pools list.

If that's not true, please remove this part of code. Moving the getAllTasksList to sp.py was enough
Comment 9 Yaniv Bronhaim 2013-09-08 11:15:15 EDT
Anyway, the try scope is redundant. please consider http://gerrit.ovirt.org/#/c/18843/

thanks

Note You need to log in before you can comment on or make changes to this bug.