Bug 996130

Summary: [vdsm] SecureError during getAllTasksInfo
Product: Red Hat Enterprise Virtualization Manager Reporter: Elad <ebenahar>
Component: vdsmAssignee: Eduardo Warszawski <ewarszaw>
Status: CLOSED NOTABUG QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.3.0CC: abaron, acanan, acathrow, bazulay, ebenahar, ewarszaw, gwatson, hateya, iheim, lpeer, scohen, ybronhei, yeylon, ykaplan
Target Milestone: ---Keywords: Triaged
Target Release: 3.3.0Flags: scohen: Triaged+
Hardware: x86_64   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-09-02 15:36:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Elad 2013-08-12 13:25:44 UTC
Created attachment 785684 [details]
logs

Description of problem:
vdsm report an error during getAllTasksInfo command:

Thread-23563::ERROR::2013-08-12 15:55:40,534::task::850::TaskManager.Task::(_setError) Task=`f39d9f3b-d4e1-42c8-ad38-b8a8ef3e65d5`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2150, in getAllTasksInfo
    allTasksInfo = sp.getAllTasksInfo()
  File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper
    raise SecureError()
SecureError


Version-Release number of selected component (if applicable):
vdsm-4.12.0-52.gitce029ba.el6ev.x86_64
is9.1

How reproducible:
unknown



Additional info: logs

Comment 1 Barak 2013-08-15 09:15:38 UTC
Was it the SPM ?

Comment 2 Yaniv Bronhaim 2013-08-15 12:11:35 UTC
Must be, might be a race between setting the securable parameter and setting the SPM_ROLE . In any case, this exception is equivalent to NOT SPM exception and engine should treat both in the same way. 

This is not a bug.. unless engine treats it differently, what is the actual behavior here?

Comment 3 Yaniv Bronhaim 2013-08-15 13:06:32 UTC
Eduardo, after checking again the code, if you recall - we decided to do :

try:                                                                    
 sp = self.pools.values()[0]                                         
except IndexError:                                                      
 raise se.SpmStatusError()                                           
allTasksStatus = sp.getAllTasksStatuses()

The secure error is definitely raised if the host is not set to be SPM. But, we should get to  sp.getAllTasksStatuses() only if we're not running as SPM, and the check we do is reading the first pool.  

The SecureError raises more than once, even a lot more.. is it possible that after host loose its SPM, it doesn't clean the pools dictionary as expected?

This question is in additional to the above question about the engine's behavior when receiving such exception

Comment 4 Barak 2013-08-15 15:11:21 UTC
per comment #3 moving this bug to storage

Comment 5 Eduardo Warszawski 2013-08-22 04:32:33 UTC
(In reply to Yaniv Bronhaim from comment #3)
> Eduardo, after checking again the code, if you recall - we decided to do :
> 
> try:                                                                    
>  sp = self.pools.values()[0]                                         
> except IndexError:                                                      
>  raise se.SpmStatusError()                                           
> allTasksStatus = sp.getAllTasksStatuses()
> 
> The secure error is definitely raised if the host is not set to be SPM. But,
> we should get to  sp.getAllTasksStatuses() only if we're not running as SPM,
> and the check we do is reading the first pool.  
> 
> The SecureError raises more than once, even a lot more.. is it possible that
> after host loose its SPM, it doesn't clean the pools dictionary as expected?
> 
> This question is in additional to the above question about the engine's
> behavior when receiving such exception

Lost of the spm condition does not lead an should no lead to pool disconnection.
But the opposite is true: being connected to the pool is a sine equa non condition in order to to be spm. You are affirming the consequent.

In the snippet you provided, the exception is valid, for sure host is not SPM if  there is no pool object. 
But calling sp.getAllTasksStatuses() based in the SP object existence is not sufficient.
All the sp.getAllTasksStatuses() should be revised.

Comment 6 Yaniv Bronhaim 2013-08-22 14:42:06 UTC
If having valid value in self.pools.values()[0] doesn't mean that the host runs as SPM we should fix the condition and use the validateSPM function as I suggested in the first place.

Please fix that issue, although SECURE_ERROR and NOT_SPM as the same error code, and currently we don't feel it in engine's side, we should not handle them in the same way. 
SECURE_ERROR means something wrong happened because VDSM itself thinks that it runs as SPM and called sp function that was not supposed to be called (bug).

Comment 7 Ayal Baron 2013-09-02 15:36:19 UTC
SECURE_ERROR *is* NOT_SPM (it is translated before being sent to engine).
Getting this exception means that the host is simply not the spm.
There is no bug here.(In reply to Yaniv Bronhaim from comment #6)

> If having valid value in self.pools.values()[0] doesn't mean that the host

It doesn't.

> runs as SPM we should fix the condition and use the validateSPM function as
> I suggested in the first place.

No, your conclusion is incorrect.  The operation is an SPM op, any 'if' you perform would be a race.  Engine sends this command under the expectation that the host is the spm, if it is not, it should raise an error (as it does here).
Closing.

Comment 8 Yaniv Bronhaim 2013-09-03 08:02:36 UTC
I honestly don't agree Ayal, my conclusion is perfectly right.
If your saying is right about "self.pools.values()[0] doesn't mean that the host runs as SPM", the following part of code

try:                                                                    
  sp = self.pools.values()[0]                                         
except IndexError:                                                      
  raise se.SpmStatusError()

Doesn't do anything. Only if the host never ran as SPM.

That sounds odd to me. Please explain why did we pick this solution instead of just forward the request directly to sp.py and get secureError?

I recall your saying that if an host doesn't run as SPM it doesn't hold a storagePool instance in its internal pools list.

If that's not true, please remove this part of code. Moving the getAllTasksList to sp.py was enough

Comment 9 Yaniv Bronhaim 2013-09-08 15:15:15 UTC
Anyway, the try scope is redundant. please consider http://gerrit.ovirt.org/#/c/18843/

thanks

Comment 11 Red Hat Bugzilla 2023-09-14 01:49:05 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days