Bug 1023145
Summary: | Storage and dc are up although there is no host | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Ohad Basan <obasan> | |
Component: | ovirt-engine | Assignee: | Martin Perina <mperina> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Tareq Alayan <talayan> | |
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 3.3.0 | CC: | aberezin, acathrow, amureini, bazulay, ebenahar, eedri, emesika, iheim, laravot, lpeer, pstehlik, Rhev-m-bugs, scohen, yeylon | |
Target Milestone: | --- | Keywords: | ZStream | |
Target Release: | 3.4.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | infra | |||
Fixed In Version: | ovirt-3.4.0-alpha1 | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
Clones: | 1023730 1051890 | Environment: | ||
Last Closed: | 2013-11-03 13:12:18 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1051890, 1078909, 1142926 |
Description
Ohad Basan
2013-10-24 17:27:44 UTC
In addition, even though the storage was up in the new DC (NFS), I encountered a failure when I tried to connect NFS storage to it.

In bug https://bugzilla.redhat.com/show_bug.cgi?id=1023730, there are tasks that are never polled, and the host is being moved to maintenance. Although spm stop (the vdsm verb, not the engine command) isn't being called because there are tasks on the host, the host still moves to maintenance, while in the engine cache the host is still marked as the SPM (which causes the pool to remain in status "UP").

(In reply to Liron Aravot from comment #4)
> in bug https://bugzilla.redhat.com/show_bug.cgi?id=1023730 , there are never polled tasks, the host is being moved to maintenance - although spm stop (vdsm verb, not the engine command) isn't being called as there are tasks on the host the host still moves to maintenance, while in the engine cache the host is still marked as the spm (which cause the pool to remain in status "UP").

So what's the action item here? Fail maintenance?

Yep, the host shouldn't move to maintenance while it's the SPM.

This is a host life cycle issue.

*** This bug has been marked as a duplicate of bug 975742 ***

(In reply to Liron Aravot from comment #4)
> in bug https://bugzilla.redhat.com/show_bug.cgi?id=1023730 , there are never polled tasks, the host is being moved to maintenance - although spm stop (vdsm verb, not the engine command) isn't being called as there are tasks on the host the host still moves to maintenance, while in the engine cache the host is still marked as the spm (which cause the pool to remain in status "UP").

Why doesn't the engine detect that the host is not the SPM anymore? Shouldn't the engine be polling spmStatus?

That's exactly the issue: the engine cache that records which host is the SPM isn't cleared because spm stop hasn't been performed, yet the host still moves to maintenance/prepare for maintenance. If spm stop wasn't executed for some reason, the host shouldn't move to maintenance.
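A toy model may make the failure mode described in comment #4 easier to follow. The sketch below is a minimal Python illustration, not ovirt-engine code: `Host`, `StoragePool`, `spm_stop` and `move_to_maintenance_as_reported` are hypothetical names, and the pool status is assumed (as a simplification) to be derived only from the engine's cached SPM pointer. It shows how ignoring a spm stop that did nothing leaves the host in maintenance while the pool still reports "Up".

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class HostStatus(Enum):
    UP = "Up"
    PREPARING_FOR_MAINTENANCE = "PreparingForMaintenance"
    MAINTENANCE = "Maintenance"


@dataclass
class Host:
    name: str
    status: HostStatus = HostStatus.UP
    # Async tasks still running on the host (hypothetical representation).
    running_tasks: Set[str] = field(default_factory=set)

    def spm_stop(self) -> bool:
        """Stand-in for the vdsm spm stop verb: does nothing while tasks run."""
        return not self.running_tasks


@dataclass
class StoragePool:
    name: str
    # Engine-side cache of which host is currently the SPM.
    cached_spm: Optional[Host] = None

    @property
    def status(self) -> str:
        # Simplified: status comes from the cache, not from asking the host.
        return "Up" if self.cached_spm is not None else "NonResponsive"


def move_to_maintenance_as_reported(pool: StoragePool, host: Host) -> None:
    """Models the reported flow, in which a failed spm stop is ignored."""
    host.status = HostStatus.PREPARING_FOR_MAINTENANCE
    if pool.cached_spm is host and host.spm_stop():
        pool.cached_spm = None  # the cache is cleared only if spm stop succeeded
    # Whether or not spm stop did anything, the transition continues.
    host.status = HostStatus.MAINTENANCE


busy = Host("host1", running_tasks={"task-1"})
pool = StoragePool("dc1", cached_spm=busy)
move_to_maintenance_as_reported(pool, busy)
print(busy.status.value, pool.status)  # Maintenance Up
```

The mismatch in the last line (host in maintenance, pool still "Up") is the symptom in the bug title.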
Please refresh my memory: shouldn't stopSPM be called on the move to maintenance?

It is called, but because there are tasks running on the host, spm stop doesn't do anything. The problem is that even though the spm stop vdsm verb wasn't executed on the host and the engine "cache" wasn't cleared, the host still moves to maintenance.

Something similar happened to me upstream:

2013-11-10 14:43:07,128 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-8) Host encounter a problem moving to maintenance mode, probably error during disconnecting it from pool VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to DisconnectStoragePoolVDS, error = Operation not allowed while SPM is active: ('69676911-8344-42c2-adc5-033676b88a09',) (Failed with error IsSpm and code 656). The Host will stay in Maintenance

(In reply to Liron Aravot from comment #13)
> Something similar happend to me upstream:
>
>
> 2013-11-10 14:43:07,128 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-8) Host encounter a problem moving to maintenance mode, probably error during disconnecting it from pool VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to DisconnectStoragePoolVDS, error = Operation not allowed while SPM is active: ('69676911-8344-42c2-adc5-033676b88a09',) (Failed with error IsSpm and code 656). The Host will stay in Maintenance

I don't understand: the move to maintenance should fail, and that should be reflected in the host status. Above, you stated "The Host will stay in Maintenance"?

Barak, that's exactly the issue: the host still moves to maintenance. That quote isn't something I stated; it's also copied from the log.

Summary of the problem:
- When one tries to move a host to maintenance, the engine starts by moving the host to prepare for maintenance.
- During that phase a stopSPM call is made to vdsm (in case the host is the SPM), but vdsm will fail if it has existing async tasks running.
- In such a case (move to maintenance) the engine ignores the error and continues with the state transition, and does not even clear the irsBroker reference in use.
- Hence the user can move the host (the SPM of DC X) to another DC (Y) while it is still the SPM of DC X.

Either we:
1. fail the move to maintenance in such a case, or
2. fail the move to another DC.

I personally prefer number 1 as it is clearer to the user. This is Ayal, Arthur - what do you think?

(In reply to Barak from comment #16)
> Summary of the problem:
>
> - when one tries to move a host to maintenance the engine start by moving the host to prepare for maintenance.
> - during that phase a stopSPM call is made to the vdsm (in case it is SPM), but vdsm will fail if it has existing async tasks running.
> - The engine in such a case (move to maintenance) will ignore the error and continue with the sate transition, and will not even clear the irsBroker reference in use.
> - hence this enables the user to move the host (SPM of DC X) to another DC (Y) while being the SPM of DC X.
>
> Either we:
> 1. fail the move to maintenance in such a case,
> or
> 2. fail the move to another DC.
>
> I personally prefer number 1 as it is clearer to the user.

Ack, but the user should have a way to fence the host so that the move to maintenance would work.

*** Bug 1032972 has been marked as a duplicate of this bug. ***

(In reply to Ayal Baron from comment #19)
> (In reply to Barak from comment #16)
....
> >
> > Either we:
> > 1. fail the move to maintenance in such a case,
> > or
> > 2. fail the move to another DC.
> >
> > I personally prefer number 1 as it is clearer to the user.
>
> Ack, but user should have a way to fence the host so move to maintenance would work.

It is possible to fence (restart) a host manually even when it is up. Arthur?
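Option 2 from the summary in comment #16 can be sketched as a guard on the "move to another DC" action. This is a hypothetical Python model, not the engine's API: `Engine`, `spm_cache`, `host_pool`, `spm_pool_of` and `ActionError` are invented for illustration, and the per-pool SPM cache stands in loosely for the irsBroker reference mentioned in the summary.

```python
from typing import Dict, Optional


class ActionError(Exception):
    """Raised when a validation check rejects a user action (hypothetical)."""


class Engine:
    """Toy engine state: which host each pool's SPM cache points at."""

    def __init__(self) -> None:
        self.spm_cache: Dict[str, Optional[str]] = {}  # pool -> cached SPM host
        self.host_pool: Dict[str, str] = {}  # host -> pool (DC) it belongs to

    def spm_pool_of(self, host: str) -> Optional[str]:
        """Return the pool whose cached SPM is `host`, or None."""
        for pool, spm in self.spm_cache.items():
            if spm == host:
                return pool
        return None

    def move_host_to_pool(self, host: str, target: str) -> None:
        # Option 2: refuse the move while the engine still records the host
        # as the SPM of a different pool.
        owner = self.spm_pool_of(host)
        if owner is not None and owner != target:
            raise ActionError(f"{host} is still the SPM of {owner}; cannot move it to {target}")
        self.host_pool[host] = target


engine = Engine()
engine.host_pool["host1"] = "DC-X"
engine.spm_cache["DC-X"] = "host1"  # stale entry left by the ignored stopSPM failure
try:
    engine.move_host_to_pool("host1", "DC-Y")
except ActionError as err:
    print(err)  # host1 is still the SPM of DC-X; cannot move it to DC-Y
```

A guard like this only catches the problem at the DC-move step; in this model the stale cache entry would remain while the host sits in maintenance.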
If vdsm has async tasks, shouldn't it wait for all async tasks to finish, and only then execute the stopSPM call and move the host to maintenance mode? If the user doesn't want to wait for all async tasks to finish, he could fence the host.

oVirt 3.4.0 alpha has been released.

If the host has async tasks, the following message appears:
Error while executing action: Cannot switch Host to Maintenance mode. Host has asynchronous running tasks, wait for operation to complete and retry.
Verified on ovirt-engine-3.4.0-0.7.beta2.el6.noarch

Closing as part of 3.4.0
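The behaviour verified above, where the maintenance request is rejected before any state change when the host has async tasks, can be sketched as a pre-check. This is a hypothetical Python model, not the actual fix: `validate_maintenance` and `move_to_maintenance` are invented names, the status strings are simplified, and only the error message text is taken from the verification comment.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Message text as quoted in the verification comment.
ASYNC_TASKS_MSG = (
    "Cannot switch Host to Maintenance mode. Host has asynchronous running "
    "tasks, wait for operation to complete and retry."
)


@dataclass
class Host:
    name: str
    status: str = "Up"
    running_tasks: Set[str] = field(default_factory=set)


def validate_maintenance(host: Host) -> List[str]:
    """Checks run before any state change; an empty list means 'allowed'."""
    errors: List[str] = []
    if host.running_tasks:
        errors.append(ASYNC_TASKS_MSG)
    return errors


def move_to_maintenance(host: Host) -> List[str]:
    errors = validate_maintenance(host)
    if errors:
        return errors  # rejected up front: the host status is not touched
    host.status = "PreparingForMaintenance"
    # Hypothetical: stopSPM and disconnecting from the pool would run here.
    host.status = "Maintenance"
    return []


busy = Host("host1", running_tasks={"task-1"})
print(len(move_to_maintenance(busy)), busy.status)  # 1 Up
idle = Host("host2")
print(move_to_maintenance(idle), idle.status)  # [] Maintenance
```

As noted in the discussion, a user who does not want to wait for the tasks can still fence (restart) the host manually; the sketch does not model that path.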