Bug 1140430
Summary: | Failure to Attach ISO domain causes SPM failover | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Marina Kalinin <mkalinin> | |
Component: | ovirt-engine | Assignee: | Federico Simoncelli <fsimonce> | |
Status: | CLOSED ERRATA | QA Contact: | Aharon Canan <acanan> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 3.4.0 | CC: | acanan, adahms, amureini, bkorren, ecohen, fsimonce, gklein, gwatson, iheim, laravot, lpeer, rbalakri, Rhev-m-bugs, scohen, tnisan, yeylon | |
Target Milestone: | --- | Keywords: | ZStream | |
Target Release: | 3.5.0 | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | storage | |||
Fixed In Version: | org.ovirt.engine-root-3.5.0-18 | Doc Type: | Bug Fix | |
Doc Text: |
Previously, failing to attach an ISO domain caused the storage pool manager to fail over and start an attempt to select a new storage pool manager. This behavior could cause a failover storm if the domain itself was corrupted, leaving the system without a storage pool manager for a prolonged time. With this update, a failed attempt to attach an ISO domain to a data center triggers an error message, but does not cause the storage pool manager to fail over.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1157212 | Environment: | |
Last Closed: | 2015-02-11 18:09:03 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1157212 |
Description
Marina Kalinin
2014-09-11 00:41:53 UTC
This happens only when "acquire" succeeds and "renew" fails. The positive flow is usually quite fast (a few seconds), but if the ISO domains become unreachable between acquire and renew, then the host is fenced. So if you try to attach an ISO domain that is not reachable, you usually don't get fenced (unless, as I said, the first write is successful).

Federico and all,

This is an engine bug (probably). It is IRSBroker that initiates the failover. However, why is the exception returned by vdsm an IRSNoMasterDomainException?

This is what is happening in engine.log:
~~~
2014-09-15 16:36:28,408 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.AttachStorageDomainVDSCommand] (org.ovirt.thread.pool-4-thread-27) [1df625e] Failed in AttachStorageDomainVDS method
2014-09-15 16:36:28,497 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-4-thread-27) [1df625e] IrsBroker::Failed::AttachStorageDomainVDS due to: IRSErrorException: IRSGenericException: IRSErrorException: Failed to AttachStorageDomainVDS, error = Storage domain does not exist: ('de6fe43b-e85c-48e2-82f8-38963d42fe4c',), code = 358
2014-09-15 16:36:28,550 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-4-thread-27) [1df625e] START, SpmStopVDSCommand(HostName = rhel34-mku, HostId = 49b33266-8770-45e7-904d-bd704a648e72, storagePoolId = 00000002-0002-0002-0002-00000000018a), log id: 2c06f55d
2014-09-15 16:36:28,627 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-4-thread-27) [1df625e] SpmStopVDSCommand::Stopping SPM on vds rhel34-mku, pool id 00000002-0002-0002-0002-00000000018a
2014-09-15 16:36:28,748 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-4-thread-27) [1df625e] FINISH, SpmStopVDSCommand, log id: 2c06f55d
2014-09-15 16:36:28,748 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-4-thread-27) [1df625e] Irs placed on server 49b33266-8770-45e7-904d-bd704a648e72 failed. Proceed Failover
~~~

And this is the part of the IrsBroker exception handler that runs here:
~~~
catch (IRSNoMasterDomainException ex) {
    getVDSReturnValue().setExceptionString(ex.toString());
    getVDSReturnValue().setExceptionObject(ex);
    getVDSReturnValue().setVdsError(ex.getVdsError());
    log.errorFormat("IrsBroker::Failed::{0}", getCommandName());
    log.errorFormat("Exception: {0}", ex.getMessage());
    if ((ex.getVdsError() == null || ex.getVdsError().getCode() != VdcBllErrors.StoragePoolWrongMaster)
            && getCurrentIrsProxyData().getHasVdssForSpmSelection()) {
        failover();
    } else {
        isStartReconst
~~~

Looking into AttachStorageDomainVDSCommand.java, we see how the exception is formed:
~~~
@Override
protected VDSExceptionBase createDefaultConcreteException(String errorMessage) {
    StorageDomain domainFromDb = DbFacade.getInstance().getStorageDomainDao().get(getParameters().getStorageDomainId());
    if (domainFromDb == null || domainFromDb.getStorageDomainType() == StorageDomainType.ImportExport) {
        return new IrsOperationFailedNoFailoverException(errorMessage);
    }
    return super.createDefaultConcreteException(errorMessage);
}
~~~

Why aren't we checking for the ISO domain type as well? ISO exists as an option in businessentities/StorageDomainType.java:
~~~
public enum StorageDomainType {
    Master, Data, ISO, ImportExport, Image, Unknown;
    ...
~~~

To me this still sounds like a backend bug. There is no reason to proceed with SPM failover if an ISO domain is inaccessible.

Verified using vt8: no SPM failover. From the event log: "Failed to attach Storage Domain rhevm-3-iso-lion to Data Center Default. (User: admin)"

The doc text was copied from the 3.4.4 clone, bug #1157212.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2015-0158.html
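
For illustration, the check questioned in the description (treating ISO like ImportExport when deciding whether an attach failure may trigger SPM failover) could be sketched roughly as below. This is not the patch shipped in the advisory. `AttachFailurePolicy`, `FailoverAllowedException`, `NO_FAILOVER_TYPES` and `createAttachException` are hypothetical names, and the enum and `IrsOperationFailedNoFailoverException` are simplified stand-ins for the engine classes quoted in the description, so the example compiles on its own.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative only: every type here is a simplified stand-in, not ovirt-engine code.
class AttachFailurePolicy {

    // Mirrors the enum constants quoted in the bug description.
    enum StorageDomainType { Master, Data, ISO, ImportExport, Image, Unknown }

    // Hypothetical stand-in for the default exception, which allows SPM failover.
    static class FailoverAllowedException extends RuntimeException {
        FailoverAllowedException(String message) { super(message); }
    }

    // Stand-in named after the exception used in the quoted snippet.
    static class IrsOperationFailedNoFailoverException extends RuntimeException {
        IrsOperationFailedNoFailoverException(String message) { super(message); }
    }

    // Assumption: failures on these domain types should never trigger failover.
    private static final Set<StorageDomainType> NO_FAILOVER_TYPES =
            EnumSet.of(StorageDomainType.ISO, StorageDomainType.ImportExport);

    // typeFromDb is null when the domain is not found in the database,
    // matching the domainFromDb == null branch of the quoted method.
    static RuntimeException createAttachException(StorageDomainType typeFromDb,
                                                  String errorMessage) {
        if (typeFromDb == null || NO_FAILOVER_TYPES.contains(typeFromDb)) {
            return new IrsOperationFailedNoFailoverException(errorMessage);
        }
        return new FailoverAllowedException(errorMessage);
    }
}
```

Keeping the non-failover types in one `EnumSet` means a further domain type could be exempted by editing a single line rather than growing the `if` condition.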