Bug 1023145

Summary: Storage and dc are up although there is no host
Product: Red Hat Enterprise Virtualization Manager
Reporter: Ohad Basan <obasan>
Component: ovirt-engine
Assignee: Martin Perina <mperina>
Status: CLOSED CURRENTRELEASE
QA Contact: Tareq Alayan <talayan>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.3.0
CC: aberezin, acathrow, amureini, bazulay, ebenahar, eedri, emesika, iheim, laravot, lpeer, pstehlik, Rhev-m-bugs, scohen, yeylon
Target Milestone: ---
Keywords: ZStream
Target Release: 3.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Fixed In Version: ovirt-3.4.0-alpha1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1023730 1051890
Environment:
Last Closed: 2013-11-03 13:12:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1051890, 1078909, 1142926

Description Ohad Basan 2013-10-24 17:27:44 UTC
Description of problem:
My engine is in a state where there is an iSCSI DC in UP state and an iSCSI storage in UP state, but no active host.
What I did: I set up a full iSCSI DC (DC, host, storage, VM), started the VM, created a snapshot and exported a template. Then I moved the host to maintenance and moved it to a different DC (NFS). The host is now up in the new DC, but the previous iSCSI DC and storage are still up, which is an impossible state.

Comment 3 Ohad Basan 2013-10-24 17:54:59 UTC
In addition, even though the storage was up in the new DC (NFS), I encountered a failure when I tried to connect an NFS storage to it.

Comment 4 Liron Aravot 2013-10-27 13:10:26 UTC
In bug https://bugzilla.redhat.com/show_bug.cgi?id=1023730 there are tasks that are never polled. The host is being moved to maintenance, and although SPM stop (the VDSM verb, not the engine command) isn't being called because there are tasks on the host, the host still moves to maintenance while the engine cache still marks it as the SPM (which causes the pool to remain in status "UP").

Comment 5 Allon Mureinik 2013-10-27 16:33:12 UTC
(In reply to Liron Aravot from comment #4)
> In bug https://bugzilla.redhat.com/show_bug.cgi?id=1023730 there are tasks
> that are never polled. The host is being moved to maintenance, and although
> SPM stop (the VDSM verb, not the engine command) isn't being called because
> there are tasks on the host, the host still moves to maintenance while the
> engine cache still marks it as the SPM (which causes the pool to remain in
> status "UP").
So what's the action item here? Fail maintenance?

Comment 6 Liron Aravot 2013-10-28 07:49:11 UTC
Yep, the host shouldn't move to maintenance while it's the SPM.

Comment 7 Ayal Baron 2013-10-28 13:59:05 UTC
This is a host life cycle issue.

Comment 8 Barak 2013-11-03 13:12:18 UTC

*** This bug has been marked as a duplicate of bug 975742 ***

Comment 9 Barak 2013-11-06 06:44:55 UTC
(In reply to Liron Aravot from comment #4)
> In bug https://bugzilla.redhat.com/show_bug.cgi?id=1023730 there are tasks
> that are never polled. The host is being moved to maintenance, and although
> SPM stop (the VDSM verb, not the engine command) isn't being called because
> there are tasks on the host, the host still moves to maintenance while the
> engine cache still marks it as the SPM (which causes the pool to remain in
> status "UP").

Why doesn't the engine detect that the host is not the SPM anymore?
Shouldn't the engine be polling for spmStatus?

Comment 10 Liron Aravot 2013-11-06 06:59:01 UTC
That's exactly the issue: the engine cache that records which host is the SPM isn't cleared because SPM stop hasn't been performed, yet the host still moves to maintenance/prepare for maintenance. If SPM stop wasn't executed for some reason, the host shouldn't move to maintenance.

Comment 11 Barak 2013-11-07 18:59:03 UTC
Please refresh my memory:
shouldn't stopSPM be called on the move to maintenance?

Comment 12 Liron Aravot 2013-11-10 07:48:40 UTC
It is called, but because there are tasks running on the host, SPM stop doesn't do anything.
The problem is that even though the SPM stop VDSM verb wasn't executed on the host and the engine "cache" wasn't cleared, the host still moves to maintenance.

Comment 13 Liron Aravot 2013-11-10 12:46:40 UTC
Something similar happened to me upstream:


2013-11-10 14:43:07,128 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-8) Host encounter a problem moving to maintenance mode, probably error during disconnecting it from pool VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to DisconnectStoragePoolVDS, error = Operation not allowed while SPM is active: ('69676911-8344-42c2-adc5-033676b88a09',) (Failed with error IsSpm and code 656). The Host will stay in Maintenance

Comment 14 Barak 2013-11-10 18:58:31 UTC
(In reply to Liron Aravot from comment #13)
> Something similar happened to me upstream:
> 
> 
> 2013-11-10 14:43:07,128 ERROR
> [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
> (DefaultQuartzScheduler_Worker-8) Host encounter a problem moving to
> maintenance mode, probably error during disconnecting it from pool
> VdcBLLException:
> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException:
> VDSGenericException: VDSErrorException: Failed to DisconnectStoragePoolVDS,
> error = Operation not allowed while SPM is active:
> ('69676911-8344-42c2-adc5-033676b88a09',) (Failed with error IsSpm and code
> 656). The Host will stay in Maintenance

I don't understand: the move to maintenance should fail and be reflected in the host status. Above, you stated "The Host will stay in Maintenance"?

Comment 15 Liron Aravot 2013-11-11 07:22:25 UTC
Barak, that's exactly the issue: the host still moves to maintenance.
That quote isn't something I stated; it's also copied from the log.

Comment 16 Barak 2013-11-11 12:12:04 UTC
Summary of the problem:

- When one tries to move a host to maintenance, the engine starts by moving the
  host to prepare for maintenance.
- During that phase a stopSPM call is made to VDSM (if the host is the SPM), but
  VDSM will fail if it has async tasks running.
- In such a case (move to maintenance) the engine will ignore the error and
  continue with the state transition, and will not even clear the irsBroker
  reference in use.
- Hence the user can move the host (SPM of DC X) to another DC (Y) while it is
  still the SPM of DC X.
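The sequence in the summary above can be modeled with a small Python sketch (the engine itself is Java). All names here (`Host`, `Engine`, `spm_cache`, `vdsm_spm_stop`, `pool_reported_up`, `move_to_maintenance_buggy`) are hypothetical and are not ovirt-engine's actual classes; the dictionary only stands in for the irsBroker reference. It shows how ignoring the failed SPM stop leaves the old pool looking up after the host has gone to maintenance.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    pool: str             # data center / storage pool the host belongs to
    async_tasks: int = 0  # async tasks VDSM reports as still running
    status: str = "Up"


@dataclass
class Engine:
    # Hypothetical stand-in for the engine-side irsBroker reference:
    # maps a pool name to the host the engine believes is its SPM.
    spm_cache: dict = field(default_factory=dict)

    def vdsm_spm_stop(self, host: Host) -> bool:
        """Models the VDSM verb: it refuses to stop SPM while tasks are running."""
        return host.async_tasks == 0

    def pool_reported_up(self, pool: str) -> bool:
        # The pool keeps being reported as up while the cache still names an SPM.
        return pool in self.spm_cache

    def move_to_maintenance_buggy(self, host: Host) -> None:
        host.status = "PreparingForMaintenance"
        if self.spm_cache.get(host.pool) == host.name:
            if self.vdsm_spm_stop(host):
                del self.spm_cache[host.pool]
            # Bug being modeled: a failed SPM stop is ignored and the cache entry stays.
        host.status = "Maintenance"


engine = Engine(spm_cache={"iscsi-dc": "host1"})
host = Host("host1", pool="iscsi-dc", async_tasks=1)
engine.move_to_maintenance_buggy(host)
host.pool = "nfs-dc"  # nothing stops the user from moving the host to another DC
print(host.status, engine.pool_reported_up("iscsi-dc"))  # Maintenance True
```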
 
Either we:
1. fail the move to maintenance in such a case,
or
2. fail the move to another DC.

I personally prefer number 1 as it is clearer to the user.
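Option 1 could look roughly like this minimal Python sketch, assuming the same hypothetical model of the engine (`Engine`, `spm_cache`, `vdsm_spm_stop` and `MaintenanceFailed` are illustrative names, not ovirt-engine code). The transition is aborted when the SPM stop fails, and the engine's record of the SPM is cleared only after a successful stop.

```python
from dataclasses import dataclass, field


class MaintenanceFailed(Exception):
    """Raised when the move to maintenance is refused."""


@dataclass
class Host:
    name: str
    pool: str
    async_tasks: int = 0
    status: str = "Up"


@dataclass
class Engine:
    spm_cache: dict = field(default_factory=dict)  # hypothetical: pool -> SPM host name

    def vdsm_spm_stop(self, host: Host) -> bool:
        # Models the VDSM verb refusing to stop SPM while async tasks run.
        return host.async_tasks == 0

    def move_to_maintenance(self, host: Host) -> None:
        previous_status = host.status
        host.status = "PreparingForMaintenance"
        if self.spm_cache.get(host.pool) == host.name:
            if not self.vdsm_spm_stop(host):
                # Option 1: fail the transition and keep it visible in the host status.
                host.status = previous_status
                raise MaintenanceFailed(
                    f"{host.name} is still SPM of {host.pool}; async tasks are running"
                )
            # Only a successful stop clears the engine's record of the SPM.
            del self.spm_cache[host.pool]
        host.status = "Maintenance"


engine = Engine(spm_cache={"iscsi-dc": "host1"})
host = Host("host1", pool="iscsi-dc", async_tasks=1)
try:
    engine.move_to_maintenance(host)
except MaintenanceFailed as err:
    print("refused:", err)
print(host.status)  # Up
```

Rolling the status back to its previous value is an assumption of this sketch; it simply keeps the failure reflected in the host status rather than silently reaching Maintenance.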

Comment 17 Barak 2013-11-11 12:15:30 UTC
This is

Comment 18 Barak 2013-11-11 12:16:51 UTC
Ayal, Arthur, what do you think?

Comment 19 Ayal Baron 2013-11-21 13:05:39 UTC
(In reply to Barak from comment #16)
> Summary of the problem:
> 
> - When one tries to move a host to maintenance, the engine starts by moving
>   the host to prepare for maintenance.
> - During that phase a stopSPM call is made to VDSM (if the host is the SPM),
>   but VDSM will fail if it has async tasks running.
> - In such a case (move to maintenance) the engine will ignore the error and
>   continue with the state transition, and will not even clear the irsBroker
>   reference in use.
> - Hence the user can move the host (SPM of DC X) to another DC (Y) while it
>   is still the SPM of DC X.
>  
> Either we:
> 1. fail the move to maintenance in such a case,
> or
> 2. fail the move to another DC.
> 
> I personally prefer number 1 as it is clearer to the user.

Ack, but the user should have a way to fence the host so that the move to maintenance would work.

Comment 20 Ayal Baron 2013-11-21 13:06:44 UTC
*** Bug 1032972 has been marked as a duplicate of this bug. ***

Comment 21 Barak 2013-11-21 17:15:08 UTC
(In reply to Ayal Baron from comment #19)
> (In reply to Barak from comment #16)

....

> >  
> > Either we:
> > 1. fail the move to maintenance in such a case,
> > or
> > 2. fail the move to another DC.
> > 
> > I personally prefer number 1 as it is clearer to the user.
> 
> Ack, but user should have a way to fence the host so move to maintenance
> would work.

It is possible to manually fence (restart) a host even when it is up.

Arthur ?

Comment 22 Arthur Berezin 2013-11-28 18:11:18 UTC
If VDSM has async tasks, shouldn't it wait for all async tasks to finish, and only then execute the stopSPM call and move the host to maintenance mode?

If the user doesn't want to wait for all async tasks to finish, they could fence the host.

Comment 25 Sandro Bonazzola 2014-01-14 08:42:55 UTC
oVirt 3.4.0 alpha has been released.

Comment 26 Tareq Alayan 2014-02-17 12:56:54 UTC
If the host has async tasks, the following message appears:
Error while executing action: Cannot switch Host to Maintenance mode. Host has asynchronous running tasks,
wait for operation to complete and retry.
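The verified behavior amounts to a validation that runs before the state transition starts, so the host never enters prepare for maintenance while tasks are pending. A minimal Python sketch, assuming a hypothetical `validate_maintenance` check and `Host` type (only the message text comes from the comment above); it is not the engine's actual validation code.

```python
from dataclasses import dataclass
from typing import List

# Message text as quoted in the comment above; everything else is a hypothetical sketch.
ASYNC_TASKS_MESSAGE = (
    "Cannot switch Host to Maintenance mode. Host has asynchronous running tasks,\n"
    "wait for operation to complete and retry."
)


@dataclass
class Host:
    name: str
    async_tasks: int  # async tasks still running on the host


def validate_maintenance(host: Host) -> List[str]:
    """Return validation errors; an empty list means maintenance may start."""
    errors = []
    if host.async_tasks > 0:
        # Refuse up front instead of entering prepare for maintenance and
        # then failing to stop the SPM.
        errors.append(ASYNC_TASKS_MESSAGE)
    return errors


for error in validate_maintenance(Host("host1", async_tasks=2)):
    print("Error while executing action: " + error)
```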


Verified on ovirt-engine-3.4.0-0.7.beta2.el6.noarch

Comment 28 Itamar Heim 2014-06-12 14:08:48 UTC
Closing as part of 3.4.0