Bug 965197
Summary: | engine: when trying to manually activate a domain when the storage is unknown and the storage is still unavailable the host becomes non-operational | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Dafna Ron <dron> | ||||
Component: | ovirt-engine | Assignee: | Liron Aravot <laravot> | ||||
Status: | CLOSED NOTABUG | QA Contact: | Dafna Ron <dron> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 3.2.0 | CC: | abaron, acanan, acathrow, amureini, dron, hateya, iheim, jkt, laravot, lpeer, Rhev-m-bugs, scohen, yeylon | ||||
Target Milestone: | --- | Keywords: | Reopened, Triaged | ||||
Target Release: | 3.3.0 | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | storage | ||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2013-07-09 15:53:10 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
what's the end result? i.e. what happens after 5 minutes?

The behaviour seems to be here as expected -
as soon as the storage is blocked, the spm should fence itself - as during the initVdsOnUp flow, the domain status is still Locked the host moves to non operational as expected after failing to connect to the pool - on the next run, as the domain status is compensated back to UNKNOWN - the host moves to status UP.
that's the expected behaviour atm.

When we attempt to activate the domain it's unknown - it's being locked
2013-05-20 19:31:31,623 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ActivateStorageDomainVDSCommand] (pool-4-thread-43) [3400d2de] START, ActivateStorageDomainVDSCommand( storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = 38755249-4bb3-4841-bf5b-05f4a521514d), log id: 6e3a4d55
-------------------------------------------------
2013-05-20 19:31:47,243 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-39) ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds = 4497d431-7c5e-4924-96e0-3f9cdbf826e5 : cougar01, VDS Network Error, continuing.
java.net.ConnectException: Connection refused
---------------------------------------------------
InitVdsOnUp failure, as the master domain is currently locked (by the activation) the host doesn't proceed with the flow.
----------------------------------------------------
2013-05-20 19:32:02,684 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-4-thread-44) START, ConnectStoragePoolVDSCommand(HostName = cougar01, HostId = 4497d431-7c5e-4924-96e0-3f9cdbf826e5, storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, vds_spm_id = 1, masterDomainId = 38755249-4bb3-4841-bf5b-05f4a521514d, masterVersion = 523), log id: 1fe6a986
2013-05-20 19:32:14,173 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (pool-4-thread-44) Could not connect host cougar01 to pool iSCSI
2013-05-20 19:32:14,187 INFO [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-47) [3662bbf2] Running command: SetNonOperationalVdsCommand internal: true. Entities affected : ID: 4497d431-7c5e-4924-96e0-3f9cdbf826e5 Type: VDS
2013-05-20 19:32:14,190 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-47) [3662bbf2] START, SetVdsStatusVDSCommand(HostName = cougar01, HostId = 4497d431-7c5e-4924-96e0-3f9cdbf826e5, status=NonOperational, nonOperationalReason=STORAGE_DOMAIN_UNREACHABLE), log id: c6d68a1
2013-05-20 19:32:14,192 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-47) [3662bbf2] FINISH, SetVdsStatusVDSCommand, log id:
----------------------------------------------------
After the activation fails, the domain status is returned to UNKNOWN
-----------------------------------------------------
2013-05-20 19:32:21,232 ERROR [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand] (pool-4-thread-43) [3400d2de] Command org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: Cannot allocate IRS server
2013-05-20 19:32:21,235 INFO [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand] (pool-4-thread-43) [3400d2de] Command [id=10446986-610d-4d80-84f9-0e663b75ec7a]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot [id=storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, storageId = 38755249-4bb3-4841-bf5b-05f4a521514d, status=Unknown].
-----------------------------------------------------
Activation doesn't fail, although connecting to the pool fails, because the domain status is unknown/inactive
-----------------------------------------------------
2013-05-20 19:35:02,686 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-4-thread-43) START, ConnectStoragePoolVDSCommand(HostName = cougar01, HostId = 4497d431-7c5e-4924-96e0-3f9cdbf826e5, storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, vds_spm_id = 1, masterDomainId = 38755249-4bb3-4841-bf5b-05f4a521514d, masterVersion = 523), log id: 59847605
2013-05-20 19:35:05,182 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-43) Command org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand return value StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=304, mMessage=Cannot find master domain: 'spUUID=7fd33b43-a9f4-4eb7-a885-e9583a929ceb, msdUUID=38755249-4bb3-4841-bf5b-05f4a521514d']]
2013-05-20 19:35:05,182 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-43) HostName = cougar01
2013-05-20 19:35:05,182 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-4-thread-43) Command ConnectStoragePoolVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: 'spUUID=7fd33b43-a9f4-4eb7-a885-e9583a929ceb, msdUUID=38755249-4bb3-4841-bf5b-05f4a521514d'
2013-05-20 19:35:05,182 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-4-thread-43) FINISH, ConnectStoragePoolVDSCommand, log id: 59847605
2013-05-20 19:35:05,182 INFO [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (pool-4-thread-43) Could not connect host cougar01 to pool iSCSI, as the master domain is in inactive/unknown status - not failing the operation
---------------------------------------------------------------------

Allon, seems to me like this one can be closed.

Closing based on comment 3.
QA guys - if I'm missing something, please reopen and elaborate.

hosts should stay active unless only one of them can't see the storage.
This is the behaviour that was decided by devel.
if the manual activation by the user changes the flow then a particular behaviour in which a user activates a domain manually was not handled correctly.
reopening this bug

(In reply to Dafna Ron from comment #5)
> hosts should stay active unless only one of them can't see the storage.
> This is the behaviour that was decided by devel.
> if the manual activation by the user changes the flow then a particular
> behaviour in which a user activates a domain manually was not handled
> correctly.
> reopening this bug

This is not correct.
The host (cougar01) was spm and lost its lease so killed vdsm:
2013-05-20 19:31:58+0300 117008 [6758]: s27 kill 29988 sig 9 count 41

When coming back up it failed to connect to the pool

Thread-17::ERROR::2013-05-20 19:32:18,452::task::850::TaskManager.Task::...
StoragePoolMasterNotFound: Cannot find master domain: 'spUUID=7fd33b43-a9f4-4eb7-a885-e9583a929ceb, msdUUID=38755249-4bb3-4841-bf5b-05f4a521514d'

Hence it moves to non-op until the domain changes state.
That is the correct behaviour.

(In reply to Ayal Baron from comment #6)
> (In reply to Dafna Ron from comment #5)
> > hosts should stay active unless only one of them can't see the storage.
> > This is the behaviour that was decided by devel.
> > if the manual activation by the user changes the flow then a particular
> > behaviour in which a user activates a domain manually was not handled
> > correctly.
> > reopening this bug
>
> This is not correct.
> The host (cougar01) was spm and lost its lease so killed vdsm:
> 2013-05-20 19:31:58+0300 117008 [6758]: s27 kill 29988 sig 9 count 41
>
> When coming back up it failed to connect to the pool
>
> Thread-17::ERROR::2013-05-20 19:32:18,452::task::850::TaskManager.Task::...
> StoragePoolMasterNotFound: Cannot find master domain:
> 'spUUID=7fd33b43-a9f4-4eb7-a885-e9583a929ceb,
> msdUUID=38755249-4bb3-4841-bf5b-05f4a521514d'
>
> Hence it moves to non-op until the domain changes state.
> That is the correct behaviour.

spm no longer needs to change state to nonop when it fails to connect to pool.
That was the whole flow change

(In reply to Dafna Ron from comment #7)
> (In reply to Ayal Baron from comment #6)
> > (In reply to Dafna Ron from comment #5)
> > > hosts should stay active unless only one of them can't see the storage.
> > > This is the behaviour that was decided by devel.
> > > if the manual activation by the user changes the flow then a particular
> > > behaviour in which a user activates a domain manually was not handled
> > > correctly.
> > > reopening this bug
> >
> > This is not correct.
> > The host (cougar01) was spm and lost its lease so killed vdsm:
> > 2013-05-20 19:31:58+0300 117008 [6758]: s27 kill 29988 sig 9 count 41
> >
> > When coming back up it failed to connect to the pool
> >
> > Thread-17::ERROR::2013-05-20 19:32:18,452::task::850::TaskManager.Task::...
> > StoragePoolMasterNotFound: Cannot find master domain:
> > 'spUUID=7fd33b43-a9f4-4eb7-a885-e9583a929ceb,
> > msdUUID=38755249-4bb3-4841-bf5b-05f4a521514d'
> >
> > Hence it moves to non-op until the domain changes state.
> > That is the correct behaviour.
>
> spm no longer needs to change state to nonop when it fails to connect to
> pool.
> That was the whole flow change

this has nothing to do with spm.
vdsm starts up not connected to the pool, engine runs initvdsonup which calls connectStoragePool, that failed, so host moves to non-op until the domain changes state to inactive. |
Created attachment 750655 [details]
logs

Description of problem:
If you block the storage and then manually activate the domain once it becomes unknown, the host becomes non-operational.

Version-Release number of selected component (if applicable):
sf17

How reproducible:
100%

Steps to Reproduce:
1. In a 3-host cluster with 2 iSCSI domains, block connectivity to both domains from all hosts
2. When the storage becomes unknown, activate the master domain manually
3.

Actual results:
One of the hosts is set as non-operational

Expected results:
Hosts should always remain in the Up state unless only one of them cannot see the storage.

Additional info:
logs
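
For readers following the thread, the host-status outcome discussed in the comments can be summarised with a small sketch. This is not the ovirt-engine code: the enum names and the function `host_status_after_init` are hypothetical, and the rule is inferred only from the engine log lines quoted in this bug (host set to NonOperational while the master domain was Locked; "not failing the operation" once it was back to inactive/unknown). The handling of an Active master domain is an assumption.

```python
from enum import Enum


class DomainStatus(Enum):
    # Hypothetical names mirroring the statuses mentioned in this bug.
    ACTIVE = "Active"
    INACTIVE = "Inactive"
    UNKNOWN = "Unknown"
    LOCKED = "Locked"


class HostStatus(Enum):
    UP = "Up"
    NON_OPERATIONAL = "NonOperational"


def host_status_after_init(connect_succeeded: bool,
                           master_status: DomainStatus) -> HostStatus:
    """Sketch of the outcome of the InitVdsOnUp flow as described in the comments."""
    if connect_succeeded:
        return HostStatus.UP
    # Per the 19:35:05 log line: a failed connect is tolerated when the
    # master domain is already known to be inactive/unknown.
    if master_status in (DomainStatus.INACTIVE, DomainStatus.UNKNOWN):
        return HostStatus.UP
    # Per the 19:32:14 log lines: with the domain Locked by the manual
    # activation, the failed connect moves the host to NonOperational.
    # Treating ACTIVE the same way is an assumption of this sketch.
    return HostStatus.NON_OPERATIONAL


if __name__ == "__main__":
    for status in DomainStatus:
        print(status.value, "->", host_status_after_init(False, status).value)
```

Under this reading, the host in the report went non-operational only because InitVdsOnUp ran while the manual activation still held the domain in Locked; once the status was compensated back to Unknown, the next run left the host Up.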