Created attachment 777728 [details]
Logs from jenkins job

Description of problem:
I have one host and two storage domains in the DC. When the MDT_MASTER_VERSION tag on iSCSI is set to a wrong value (-1) on the master domain, rhevm is supposed to perform a reconstruct on the second domain, but instead it moves the host to non-operational, although both domains are still accessible.

2013-07-23 01:45:57,103 - MainThread - plmanagement.error_fetcher - ERROR - Errors fetched from VDC(jenkins-vm-10.scl.lab.tlv.redhat.com):
2013-07-23 01:40:11,365 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler_Worker-83) Command SpmStatusVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Error validating master storage domain: ('MD read error',)
2013-07-23 01:40:11,371 INFO [org.ovirt.engine.core.bll.storage.SetStoragePoolStatusCommand] (DefaultQuartzScheduler_Worker-83) Running command: SetStoragePoolStatusCommand internal: true. Entities affected : ID: 1fafd73f-a2c5-4aeb-9ac8-3e23e15b0b34 Type: StoragePool
2013-07-23 01:40:11,941 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler_Worker-83) Command SpmStatusVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Error validating master storage domain: ('MD read error',)
2013-07-23 01:40:11,941 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-83) hostFromVds::selectedVds - 10.35.160.45, spmStatus returned null!
2013-07-23 01:40:11,942 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-83) IrsBroker::Failed::GetStoragePoolInfoVDS
2013-07-23 01:40:11,942 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-83) Exception: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Error validating master storage domain: ('MD read error',)
2013-07-23 01:40:12,036 INFO [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand] (pool-5-thread-47) Running command: ReconstructMasterDomainCommand internal: true. Entities affected : ID: 90dc9fae-a454-403b-b4cd-c56b85ab0465 Type: Storage
2013-07-23 01:40:15,945 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (pool-5-thread-47) Command DisconnectStoragePoolVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-07-23 01:40:15,945 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (pool-5-thread-47) FINISH, DisconnectStoragePoolVDSCommand, log id: 3ad6eda8
2013-07-23 01:40:15,947 ERROR [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand] (pool-5-thread-47) Command org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.net.ConnectException: Connection refused (Failed with VDSM error VDS_NETWORK_ERROR and code 5022)
2013-07-23 01:40:15,963 INFO [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand] (pool-5-thread-47) Command [id=d9d6cbfd-3b3d-4253-8298-4ca199a68d33]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot [id=storagePoolId = 1fafd73f-a2c5-4aeb-9ac8-3e23e15b0b34, storageId = 90dc9fae-a454-403b-b4cd-c56b85ab0465, status=Unknown].
2013-07-23 01:40:18,369 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand] (DefaultQuartzScheduler_Worker-90) Command ListVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-07-23 01:40:18,378 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-90) Failed to refresh VDS , vds = d2909cc7-75f8-4b5a-be40-a23610b55074 : 10.35.160.45, VDS Network Error, continuing.
2013-07-23 01:40:50,175 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-50) Domain 90dc9fae-a454-403b-b4cd-c56b85ab0465 is not seen by Host
2013-07-23 01:40:50,175 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-50) Domain 37a29881-6e4d-4092-ae4f-05047953a7a4 is not seen by Host
2013-07-23 01:40:50,176 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (pool-5-thread-50) One of the Storage Domains of host 10.35.160.45 in pool datacenter_storage_spm_negative is problematic
2013-07-23 01:40:50,185 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-5-thread-50) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host 10.35.160.45 reports about one of the Active Storage Domains as Problematic.
2013-07-23 01:40:50,368 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (pool-5-thread-50) ResourceManager::vdsMaintenance - Failed migrating desktop vm_CorruptMetadata
2013-07-23 01:45:00,017 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (DefaultQuartzScheduler_Worker-88) Autorecovering 1 hosts
2013-07-23 01:45:19,923 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-47) Domain 90dc9fae-a454-403b-b4cd-c56b85ab0465 is not seen by Host
2013-07-23 01:45:19,923 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-47) Domain 37a29881-6e4d-4092-ae4f-05047953a7a4 is not seen by Host
2013-07-23 01:45:19,923 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (pool-5-thread-47) One of the Storage Domains of host 10.35.160.45 in pool datacenter_storage_spm_negative is problematic
2013-07-23 01:45:19,936 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-5-thread-47) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host 10.35.160.45 reports about one of the Active Storage Domains as Problematic.
2013-07-23 01:45:20,097 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (pool-5-thread-47) ResourceManager::vdsMaintenance - Failed migrating desktop vm_CorruptMetadata
2013-07-23 01:45:26,247 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-48) domain 90dc9fae-a454-403b-b4cd-c56b85ab0465:master_domain in problem. vds: 10.35.160.45
2013-07-23 01:45:50,319 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-47) Domain 37a29881-6e4d-4092-ae4f-05047953a7a4:360060160f4a0300028bb362b33f3e211_0 was reported by all hosts in status UP as problematic. Moving the domain to NonOperational.
2013-07-23 01:45:50,344 INFO [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (pool-5-thread-47) Lock Acquired to object EngineLock [exclusiveLocks= key: 37a29881-6e4d-4092-ae4f-05047953a7a4 value: STORAGE

Version-Release number of selected component (if applicable):
rhevm-3.3.0-0.9.master.el6ev.noarch
vdsm-4.12.0-rc1.12.git8ee6885.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Have one host and two storage domains in a DC which is up
2. Corrupt the metadata value MDT_MASTER_VERSION by setting it to -1
3.

Actual results:
Host is moved to non-operational

Expected results:
Reconstruct is performed on the second domain, which becomes the master

Additional info:
Works in 3.2
MDT was corrupted at about 01:33:34,124
Logs are attached
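
For triage, the engine log excerpt above can be summarized by counting ERROR entries per command class. The following is only a minimal sketch: the regular expression is an assumption based on the line layout visible in this report (timestamp, level, logger in brackets, thread in parentheses, message), and the names `parse_line`, `errors_by_command` and `SAMPLE` are hypothetical helpers, not part of rhevm or vdsm.

```python
import re
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional

# Assumed layout, taken from the lines quoted in this report:
# <date> <time>,<ms> <LEVEL> [<logger>] (<thread>) <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    ts: str
    level: str
    logger: str
    thread: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None if the line does not match the assumed layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return LogEntry(**match.groupdict())


def errors_by_command(lines: Iterable[str]) -> Dict[str, int]:
    """Count ERROR entries, keyed by the short class name of the logger."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level == "ERROR":
            counts[entry.logger.rsplit(".", 1)[-1]] += 1
    return dict(counts)


# Three lines copied from the excerpt in this report.
SAMPLE = [
    "2013-07-23 01:40:18,369 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand] "
    "(DefaultQuartzScheduler_Worker-90) Command ListVDS execution failed. Exception: "
    "VDSNetworkException: java.net.ConnectException: Connection refused",
    "2013-07-23 01:45:00,017 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] "
    "(DefaultQuartzScheduler_Worker-88) Autorecovering 1 hosts",
    "2013-07-23 01:45:20,097 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] "
    "(pool-5-thread-47) ResourceManager::vdsMaintenance - Failed migrating desktop vm_CorruptMetadata",
]

if __name__ == "__main__":
    for command, count in sorted(errors_by_command(SAMPLE).items()):
        print(command, count)
```

Lines that do not follow the assumed layout, such as the plmanagement.error_fetcher header line, are skipped rather than reported as errors.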
Allon, can we confirm this is indeed a regression?
*** This bug has been marked as a duplicate of bug 967749 ***