Bug 987874 - Wrong MDT_MASTER_VERSION value doesn't lead to reconstruct on second domain
Status: CLOSED DUPLICATE of bug 967749
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Allon Mureinik
QA Contact: Aharon Canan
Whiteboard: storage
Keywords: Regression, TestBlocker, Triaged
Depends On:
Blocks:
Reported: 2013-07-24 06:57 EDT by Jakub Libosvar
Modified: 2016-02-10 13:29 EST

Doc Type: Bug Fix
Last Closed: 2013-08-05 07:20:47 EDT
Type: Bug
oVirt Team: Storage


Attachments
Logs from jenkins job (3.26 MB, application/x-bzip)
2013-07-24 06:57 EDT, Jakub Libosvar
Description Jakub Libosvar 2013-07-24 06:57:56 EDT
Created attachment 777728
Logs from jenkins job

Description of problem:
I have one host and two storage domains in the DC. When the MDT_MASTER_VERSION tag on the iSCSI master domain is set to a wrong value (-1), RHEV-M is supposed to reconstruct the master on the second domain. Instead, it moves the host to non-operational, although both domains are still accessible.

2013-07-23 01:45:57,103 - MainThread - plmanagement.error_fetcher - ERROR - Errors fetched from VDC(jenkins-vm-10.scl.lab.tlv.redhat.com): 2013-07-23 01:40:11,365 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler_Worker-83) Command SpmStatusVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Error validating master storage domain: ('MD read error',)
2013-07-23 01:40:11,371 INFO  [org.ovirt.engine.core.bll.storage.SetStoragePoolStatusCommand] (DefaultQuartzScheduler_Worker-83) Running command: SetStoragePoolStatusCommand internal: true. Entities affected :  ID: 1fafd73f-a2c5-4aeb-9ac8-3e23e15b0b34 Type: StoragePool
2013-07-23 01:40:11,941 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler_Worker-83) Command SpmStatusVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Error validating master storage domain: ('MD read error',)
2013-07-23 01:40:11,941 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-83) hostFromVds::selectedVds - 10.35.160.45, spmStatus returned null!
2013-07-23 01:40:11,942 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-83) IrsBroker::Failed::GetStoragePoolInfoVDS
2013-07-23 01:40:11,942 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-83) Exception: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Error validating master storage domain: ('MD read error',)
2013-07-23 01:40:12,036 INFO  [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand] (pool-5-thread-47) Running command: ReconstructMasterDomainCommand internal: true. Entities affected :  ID: 90dc9fae-a454-403b-b4cd-c56b85ab0465 Type: Storage
2013-07-23 01:40:15,945 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (pool-5-thread-47) Command DisconnectStoragePoolVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-07-23 01:40:15,945 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (pool-5-thread-47) FINISH, DisconnectStoragePoolVDSCommand, log id: 3ad6eda8
2013-07-23 01:40:15,947 ERROR [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand] (pool-5-thread-47) Command org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.net.ConnectException: Connection refused (Failed with VDSM error VDS_NETWORK_ERROR and code 5022)
2013-07-23 01:40:15,963 INFO  [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand] (pool-5-thread-47) Command [id=d9d6cbfd-3b3d-4253-8298-4ca199a68d33]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot [id=storagePoolId = 1fafd73f-a2c5-4aeb-9ac8-3e23e15b0b34, storageId = 90dc9fae-a454-403b-b4cd-c56b85ab0465, status=Unknown].
2013-07-23 01:40:18,369 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand] (DefaultQuartzScheduler_Worker-90) Command ListVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-07-23 01:40:18,378 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-90) Failed to refresh VDS , vds = d2909cc7-75f8-4b5a-be40-a23610b55074 : 10.35.160.45, VDS Network Error, continuing.
2013-07-23 01:40:50,175 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-50) Domain 90dc9fae-a454-403b-b4cd-c56b85ab0465 is not seen by Host
2013-07-23 01:40:50,175 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-50) Domain 37a29881-6e4d-4092-ae4f-05047953a7a4 is not seen by Host
2013-07-23 01:40:50,176 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (pool-5-thread-50) One of the Storage Domains of host 10.35.160.45 in pool datacenter_storage_spm_negative is problematic
2013-07-23 01:40:50,185 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-5-thread-50) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host 10.35.160.45 reports about one of the Active Storage Domains as Problematic.
2013-07-23 01:40:50,368 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (pool-5-thread-50) ResourceManager::vdsMaintenance - Failed migrating desktop vm_CorruptMetadata
2013-07-23 01:45:00,017 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (DefaultQuartzScheduler_Worker-88) Autorecovering 1 hosts
2013-07-23 01:45:19,923 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-47) Domain 90dc9fae-a454-403b-b4cd-c56b85ab0465 is not seen by Host
2013-07-23 01:45:19,923 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-47) Domain 37a29881-6e4d-4092-ae4f-05047953a7a4 is not seen by Host
2013-07-23 01:45:19,923 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (pool-5-thread-47) One of the Storage Domains of host 10.35.160.45 in pool datacenter_storage_spm_negative is problematic
2013-07-23 01:45:19,936 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-5-thread-47) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host 10.35.160.45 reports about one of the Active Storage Domains as Problematic.
2013-07-23 01:45:20,097 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (pool-5-thread-47) ResourceManager::vdsMaintenance - Failed migrating desktop vm_CorruptMetadata
2013-07-23 01:45:26,247 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-48) domain 90dc9fae-a454-403b-b4cd-c56b85ab0465:master_domain in problem. vds: 10.35.160.45
2013-07-23 01:45:50,319 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-47) Domain 37a29881-6e4d-4092-ae4f-05047953a7a4:360060160f4a0300028bb362b33f3e211_0 was reported by all hosts in status UP as problematic. Moving the domain to NonOperational.
2013-07-23 01:45:50,344 INFO  [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (pool-5-thread-47) Lock Acquired to object EngineLock [exclusiveLocks= key: 37a29881-6e4d-4092-ae4f-05047953a7a4 value: STORAGE
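
To make the sequence in an excerpt like the one above easier to follow, the ERROR entries can be counted per engine command. The following is a minimal sketch, not part of the original report: it assumes only the line layout visible in the excerpt (timestamp, level, logger in square brackets, thread in parentheses, message), and the names parse_line and errors_by_command are illustrative. Lines with a different prefix, such as the first one coming from the test framework's error fetcher, are simply skipped.

```python
import re
from collections import Counter

# Layout taken from the excerpt:
# "<date> <time>,<ms> <LEVEL> [<logger>] (<thread>) <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one engine log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def errors_by_command(lines):
    """Count ERROR entries per logger class name (last dotted component)."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] == "ERROR":
            counts[entry["logger"].rsplit(".", 1)[-1]] += 1
    return counts


# Three lines copied from the excerpt above.
SAMPLE = [
    "2013-07-23 01:40:11,942 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] "
    "(DefaultQuartzScheduler_Worker-83) IrsBroker::Failed::GetStoragePoolInfoVDS",
    "2013-07-23 01:40:50,175 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] "
    "(pool-5-thread-50) Domain 90dc9fae-a454-403b-b4cd-c56b85ab0465 is not seen by Host",
    "2013-07-23 01:45:00,017 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] "
    "(DefaultQuartzScheduler_Worker-88) Autorecovering 1 hosts",
]

if __name__ == "__main__":
    for command, count in errors_by_command(SAMPLE).items():
        print(command, count)
```

Run against the full engine.log from the attachment, the same functions would show which commands fail before and after the ReconstructMasterDomainCommand attempt.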

Version-Release number of selected component (if applicable):
rhevm-3.3.0-0.9.master.el6ev.noarch
vdsm-4.12.0-rc1.12.git8ee6885.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Have one host and two storage domains in a DC that is up
2. Corrupt the MDT_MASTER_VERSION metadata value by setting it to -1
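
The report does not say how the tag was corrupted. As a hedged sketch only: assuming the MDT_ metadata of a block (iSCSI) domain is stored as LVM tags on the domain's volume group, the corruption could be expressed as a vgchange call that swaps the tag. The helper name, the VG name "example_vg" and the current version 1 are hypothetical placeholders. The function only builds the argument list and does not run anything, since applying it breaks the domain on purpose.

```python
def build_corrupt_master_version_cmd(vg_name, current_version, bad_version=-1):
    """Build (but do not run) a vgchange command that replaces the
    MDT_MASTER_VERSION tag on a volume group.

    Assumption: the metadata lives in VG tags of the form KEY=VALUE.
    vg_name and current_version must be looked up on the real setup.
    """
    old_tag = f"MDT_MASTER_VERSION={current_version}"
    new_tag = f"MDT_MASTER_VERSION={bad_version}"
    # --deltag and --addtag are standard vgchange options.
    return ["vgchange", "--deltag", old_tag, "--addtag", new_tag, vg_name]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(" ".join(build_corrupt_master_version_cmd("example_vg", 1)))
```

On a test host the printed command would be run as root against the master domain's volume group. This is destructive and only meant for a disposable environment.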

Actual results:
Host is moved to non-operational

Expected results:
Reconstruct is performed on the second domain, which becomes the master
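
The expected behavior can be written down as a small decision function. This is an illustrative model of what the report asks for, not the engine's actual code: the Domain class, its fields, and the returned strings are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Domain:
    name: str
    is_master: bool
    accessible: bool       # the host can still reach the storage
    metadata_valid: bool   # e.g. MDT_MASTER_VERSION holds a sane value


def pick_new_master(domains: List[Domain]) -> Optional[Domain]:
    """Return the first healthy non-master domain, if any."""
    for domain in domains:
        if not domain.is_master and domain.accessible and domain.metadata_valid:
            return domain
    return None


def expected_action(domains: List[Domain]) -> str:
    """Describe the action the reporter expects for the given pool state."""
    master = next((d for d in domains if d.is_master), None)
    if master and master.accessible and master.metadata_valid:
        return "no action"
    candidate = pick_new_master(domains)
    if candidate is not None:
        return f"reconstruct master on {candidate.name}"
    return "no reconstruct candidate"


if __name__ == "__main__":
    # The scenario from this report: master metadata corrupted,
    # second domain healthy, both domains still accessible.
    pool = [
        Domain("master_domain", True, True, False),
        Domain("second_domain", False, True, True),
    ]
    print(expected_action(pool))
```

In the reported scenario this model selects the second domain, whereas the engine moved the host to non-operational.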

Additional info:
Works in 3.2
The MDT values were corrupted at about 01:33:34,124
Logs are attached
Comment 4 Sean Cohen 2013-07-31 09:02:37 EDT
Allon, can we confirm this is indeed a regression?
Comment 6 Ayal Baron 2013-08-05 07:20:47 EDT

*** This bug has been marked as a duplicate of bug 967749 ***
