ovirt-engine-backend: Auto-Recovery should check whether getVdsStats returns 'lastCheck<60' before it starts to recover the host. Description: *************** (50 iSCSI SD's environment) rhevm-engine sends Host to 'Non-Operational' when getVdsStats returns value for 'lastCheck > 60' for one of the SD's. Auto-Recovery that runs every 5 minutes does not check that same criteria (lastCheck>60) and just sends ActivateVdsCommand, which succeeds. Results: ********* Since the criteria that caused engine to set host to 'Non-Operational' is different from the criteria that caused auto-recovery to set the host back to 'Up' --> the host will switch to 'Up' (for few minutes) and switched back to 'Non-Operational' vdsClient -s 0 getVdsStats (lastCheck': '72279.2): *************************************************** st'0.0259990692139', 'lastCheck': '11.9', 'code': 0, 'valid': True}, '4c84bb36-96fa-4b73-89e1-90f81c394483': {'delay': '0.0061411857605', 'lastCheck': '72496.9', 'code': 0, 'valid': True}, '768b1fcd-e79a-42d5-a85e-4f794a20d67f': {'delay': '0.168473005295', 'lastCheck': '8.7', 'code': 0, 'valid': True}, 'd23aa631-5a64-43d2-9dbe-ff83dc16be39': {'delay': '0.791082143784', 'lastCheck': '72279.2', 'code': 0, 'valid': True}, Engine.log: *********** 2012-07-30 16:35:55,250 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-51) [65234c09] Domain f02a705c-aaaa-43cc-9b0c-3730358d2 8c8 has recovered from problem. No active host in the DC is reporting it as problematic, so clearing the domain recovery timer. 2012-07-30 16:40:00,001 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-2) [35b9bc67] Checking autorecoverable hosts 2012-07-30 16:40:00,018 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-2) [35b9bc67] Autorecovering 3 hosts 2012-07-30 16:40:00,019 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-2) [35b9bc67] Autorecovering hosts: 8a9ab47c-d97e-11e1-ad7a-43fef5f1 b388 2012-07-30 16:40:00,029 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-2) [38500a41] Lock Acquired to object EngineLock [exclusiveLocks= key : 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 value: VDS , sharedLocks= ] 2012-07-30 16:40:00,029 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-2) [38500a41] Running command: ActivateVdsCommand internal: true. Ent ities affected : ID: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 Type: VDS 2012-07-30 16:40:00,037 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-2) [38500a41] START, SetVdsStatusVDSCommand(vdsId = 8a9ab47 c-d97e-11e1-ad7a-43fef5f1b388, status=Unassigned, nonOperationalReason=NONE), log id: 6f619968 2012-07-30 16:40:00,058 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-2) [38500a41] FINISH, SetVdsStatusVDSCommand, log id: 6f619 968 2012-07-30 16:40:00,076 INFO [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-2) [38500a41] START, ActivateVdsVDSCommand(vdsId = 8a9ab47c- d97e-11e1-ad7a-43fef5f1b388), log id: 274e8c8e 2012-07-30 16:40:00,238 INFO [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (QuartzScheduler_Worker-2) [46911aa8] Running command: HandleVdsCpuFlag sOrClusterChangedCommand internal: true. Entities affected : ID: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 Type: VDS 2012-07-30 16:40:00,249 INFO [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (QuartzScheduler_Worker-2) [6ba4f799] Running command: HandleVdsVersionCommand internal: true. Entities affected : ID: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 Type: VDS 2012-07-30 16:40:00,252 INFO [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-2) [6ba4f799] FINISH, ActivateVdsVDSCommand, log id: 274e8c8 e 2012-07-30 16:40:00,309 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-2) [6ba4f799] Lock freed to object EngineLock [exclusiveLocks= key: 8 a9ab47c-d97e-11e1-ad7a-43fef5f1b388 value: VDS , sharedLocks= ] 2012-07-30 16:40:00,311 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-2) [6ba4f799] Autorecovering hosts: 8d11dd3e-d97e-11e1-a92e-831f3c8a 7b9d 2012-07-30 16:40:00,320 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-2) [3fdba36b] Lock Acquired to object EngineLock [exclusiveLocks= key : 8d11dd3e-d97e-11e1-a92e-831f3c8a7b9d value: VDS , sharedLocks= ] 2012-07-30 16:40:00,320 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-2) [3fdba36b] Running command: ActivateVdsCommand internal: true. Entities affected : ID: 8d11dd3e-d97e-11e1-a92e-831f3c8a7b9d Type: VDS 2012-07-30 16:40:00,326 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-2) [3fdba36b] START, SetVdsStatusVDSCommand(vdsId = 8d11dd3e-d97e-11e1-a92e-831f3c8a7b9d, status=Unassigned, nonOperationalReason=NONE), log id: 6ef236b5 2012-07-30 16:40:00,347 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-2) [3fdba36b] FINISH, SetVdsStatusVDSCommand, log id: 6ef236b5 2012-07-30 16:40:00,359 INFO [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-2) [3fdba36b] START, ActivateVdsVDSCommand(vdsId = 8d11dd3e-d97e-11e1-a92e-831f3c8a7b9d), log id: d4593cc 2012-07-30 16:40:00,537 INFO [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (QuartzScheduler_Worker-2) [2f4b15c9] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected : ID: 8d11dd3e-d97e-11e1-a92e-831f3c8a7b9d Type: VDS 2012-07-30 16:40:00,549 INFO [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (QuartzScheduler_Worker-2) [6fe99d5e] Running command: HandleVdsVersionCommand internal: true. Entities affected : ID: 8d11dd3e-d97e-11e1-a92e-831f3c8a7b9d Type: VDS 2012-07-30 16:40:00,552 INFO [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-2) [6fe99d5e] FINISH, ActivateVdsVDSCommand, log id: d4593cc 2012-07-30 16:40:00,590 INFO [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (QuartzScheduler_Worker-90) [294fa788] Running command: InitVdsOnUpCommand internal: true. 2012-07-30 16:40:00,610 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-2) [6fe99d5e] Lock freed to object EngineLock [exclusiveLocks= key: 8d11dd3e-d97e-11e1-a92e-831f3c8a7b9d value: VDS , sharedLocks= ] 2012-07-30 16:40:00,611 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-2) [6fe99d5e] Autorecovering hosts: 8dc369be-d97e-11e1-8f21-cffe7252f68f 2012-07-30 16:40:00,614 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-2) [114af736] Lock Acquired to object EngineLock [exclusiveLocks= key: 8dc369be-d97e-11e1-8f21-cffe7252f68f value: VDS , sharedLocks= ] 2012-07-30 16:40:00,614 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-2) [114af736] Running command: ActivateVdsCommand internal: true. Entities affected : ID: 8dc369be-d97e-11e1-8f21-cffe7252f68f Type: VDS 2012-07-30 16:40:00,621 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-2) [114af736] START, SetVdsStatusVDSCommand(vdsId = 8dc369be-d97e-11e1-8f21-cffe7252f68f, status=Unassigned, nonOperationalReason=NONE), log id: 4152639a 2012-07-30 16:40:00,628 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-2) [114af736] FINISH, SetVdsStatusVDSCommand, log id: 4152639a 2012-07-30 16:40:00,635 INFO [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-2) [114af736] START, ActivateVdsVDSCommand(vdsId = 8dc369be-d97e-11e1-8f21-cffe7252f68f), log id: 4ddb72a2 2012-07-30 16:40:00,807 INFO [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (QuartzScheduler_Worker-2) [4a4eb9b7] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected : ID: 8dc369be-d97e-11e1-8f21-cffe7252f68f Type: VDS 2012-07-30 16:40:00,812 INFO [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (QuartzScheduler_Worker-2) [64452010] Running command: HandleVdsVersionCommand internal: true. Entities affected : ID: 8dc369be-d97e-11e1-8f21-cffe7252f68f Type: VDS 2012-07-30 16:40:00,815 INFO [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-2) [64452010] FINISH, ActivateVdsVDSCommand, log id: 4ddb72a2 2012-07-30 16:40:00,843 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ValidateStorageServerConnectionVDSCommand] (QuartzScheduler_Worker-90) [47fb7571] START, ValidateStorageServerConnectionVDSCommand(vdsId = 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388, storagePoolId = 4f5ab747-38da-4693-ac0f-5531ce70945d, storageType = ISCSI, connectionList = [{ id: 235bd7e9-1d28-4dd7-b298-b40c2afff190, connection: 10.35.160.107 };]), log id: 4446cf6f 2012-07-30 16:40:00,858 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ValidateStorageServerConnectionVDSCommand] (QuartzScheduler_Worker-90) [47fb7571] FINISH, ValidateStorageServerConnectionVDSCommand, return: {235bd7e9-1d28-4dd7-b298-b40c2afff190=0}, log id: 4446cf6f 2012-07-30 16:40:00,858 INFO [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-90) [47fb7571] Running command: ConnectHostToStoragePoolServersCommand internal: true. Entities affected : ID: 4f5ab747-38da-4693-ac0f-5531ce70945d Type: StoragePool 2012-07-30 16:40:00,868 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-90) [47fb7571] START, ConnectStorageServerVDSCommand(vdsId = 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388, storagePoolId = 4f5ab747-38da-4693-ac0f-5531ce70945d, storageType = ISCSI, connectionList = [{ id: 235bd7e9-1d28-4dd7-b298-b40c2afff190, connection: 10.35.160.107 };]), log id: 1ed2d5cb 2012-07-30 16:40:00,869 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-2) [64452010] Lock freed to object EngineLock [exclusiveLocks= key: 8dc369be-d97e-11e1-8f21-cffe7252f68f value: VDS , sharedLocks= ] 2012-07-30 16:40:00,870 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-2) [64452010] Checking autorecoverable hosts done 2012-07-30 16:40:00,870 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-2) [64452010] Checking autorecoverable storage domains 2012-07-30 16:40:00,871 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-2) [64452010] Autorecovering 0 storage domains 2012-07-30 16:40:00,871 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-2) [64452010] Checking autorecoverable storage domains done 2012-07-30 16:40:02,347 INFO [org.ovirt.engine.core.bll.DbUserCacheManager] (QuartzScheduler_Worker-87) [28a20b91] DbUserCacheManager::refreshAllUserData() - entered 2012-07-30 16:40:07,813 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-90) [47fb7571] FINISH, ConnectStorageServerVDSCommand, return: {235bd7e9-1d28-4dd7-b298-b40c2afff190=0}, log id: 1ed2d5cb 2012-07-30 16:40:07,813 INFO [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-90) [47fb7571] Host puma05 storage connection was succeeded 2012-07-30 16:40:07,894 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-90) [47fb7571] START, ConnectStoragePoolVDSCommand(vdsId = 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388, storagePoolId = 4f5ab747-38da-4693-ac0f-5531ce70945d, vds_spm_id = 2, masterDomainId = a4b36945-3f41-4f2a-871b-dd70d7ca7ead, masterVersion = 1), log id: 748f7fbf 2012-07-30 16:40:10,683 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-98) [7529e477] starting ProcessDomainRecovery for domain b7491d9b-a0fb-4323-b787-b782e4695872 2012-07-30 16:40:10,713 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-98) [7529e477] vds puma05 reported domain b7491d9b-a0-4323-b787-b782e4695872:3600601601282300032f94e5b78cbe111 as in problem, moving the vds to status NonOperational 2012-07-30 16:40:10,730 WARN [org.ovirt.engine.core.dal.job.ExecutionMessageDirector] (QuartzScheduler_Worker-98) [d7f9295] The message key SetNonOperationalVds is missing from bundles/ExecutionMessages 2012-07-30 16:40:10,769 INFO [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-98) [d7f9295] Running command: SetNonOperationalVdsCommand internal: true. Entities affected : ID: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 Type: VDS 2012-07-30 16:40:10,769 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-98) [d7f9295] START, SetVdsStatusVDSCommand(vdsId = 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388, status=NonOperational, nonOperationalReason=STORAGE_DOMAIN_UNREACHABLE), log id: 7621f900
Created attachment 601303 [details] engine.log
I talked with Michael Kublin about this, Seems like at current time frame, solving this on existing auto recovery code is problematic. Suggesting this to rhevm-future.
I can not decide during initVdsOnUp if host is ok or not or problem is in storage. At case described in bug the problem was in vdsm so it looks easy, but if a problem in storage - I just can not distinct between two cases.
After discussion with Ayal and Barak, the following solution will be implement: I will add getVdsStats to InitVdsOnUp and will not move vdsm to Up if it has one of the domain in problem. Problem: if we have some domain in problem which is actually in problem but meanwhile was not moved to status InActive we can not move a vdsm to status UP. Holpfully will be moved to status up by auto recovery when domain will switch it status to InActive
http://gerrit.ovirt.org/#/c/8131/
https://gerrit.eng.lab.tlv.redhat.com/#/c/2262 Downstream
Still reproduces - RHEVM Build SI23 : Scenario: ********** Environment: multiple Hosts against one Storage machine. A) block connection between one HSM host to Storage Machine : 'iptables -A INPUT -s <Storage Machine IP> -j DROP' B) watch -d 'vdsClient -s 0 getVdsStats |grep 'lastCheck'' storageDomains = {'2376bde4-0b67-4e65-9acf-dc9c19a32d13': {'delay': '0', 'lastCheck': '215.3', 'code': 358, 'valid': False}, '4d9f133d-fdd7-4839-9bb6-e766db74f517 ': {'delay': '0', 'lastCheck': '216.0', ' Results: ********* 1) After about 6 minutes lastCheck = '350' the host will switch to non-operational. 2) about 5 minutes later hostAutoRecovery will attemt to activate the host althogh it not connected to the storage . 3) Event message will display a)"Detected new Host puma35. Host state was set to Up." b)"Failed to connect Host puma35 to Storage Pool iscsi_dc" c)"Host puma35 cannot access one of the Storage Domains attached to the Data Center iscsi_dc. Setting Host state to Non-Operational." 4) Host status will be change to 'Unasigned' then switch back to 'Non-Operational'. 5) Steps will repeat every 5 minutes. Engine.log (attaching full engine.log): ************************************* 2012-11-04 15:30:00,000 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-21) [3fb08a50] Checking autorecoverable hosts 2012-11-04 15:30:00,013 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-21) [3fb08a50] Autorecovering 1 hosts 2012-11-04 15:30:00,013 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-21) [3fb08a50] Autorecovering hosts: 8107e470-2663-11e2-9efd-441ea17 336ee 2012-11-04 15:30:00,027 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-21) [3c804998] Lock Acquired to object EngineLock [exclusiveLocks= ke y: 8107e470-2663-11e2-9efd-441ea17336ee value: VDS , sharedLocks= ] 2012-11-04 15:30:00,028 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-21) [3c804998] Running command: ActivateVdsCommand internal: true. En tities affected : ID: 8107e470-2663-11e2-9efd-441ea17336ee Type: VDS 2012-11-04 15:30:00,043 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-21) [3c804998] START, SetVdsStatusVDSCommand(HostName = pum a35, HostId = 8107e470-2663-11e2-9efd-441ea17336ee, status=Unassigned, nonOperationalReason=NONE), log id: 2dfee627 2012-11-04 15:30:00,057 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-21) [3c804998] FINISH, SetVdsStatusVDSCommand, log id: 2dfe e627 2012-11-04 15:30:00,086 INFO [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-21) [3c804998] START, ActivateVdsVDSCommand(HostName = puma3 5, HostId = 8107e470-2663-11e2-9efd-441ea17336ee), log id: 7bf823c9 2012-11-04 15:30:00,337 INFO [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (QuartzScheduler_Worker-21) [42811bd4] Running command: HandleVdsCpuFla gsOrClusterChangedCommand internal: true. Entities affected : ID: 8107e470-2663-11e2-9efd-441ea17336ee Type: VDS 2012-11-04 15:30:00,351 INFO [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (QuartzScheduler_Worker-21) [54f5c5c5] Running command: HandleVdsVersionCommand internal : true. Entities affected : ID: 8107e470-2663-11e2-9efd-441ea17336ee Type: VDS 2012-11-04 15:30:00,353 INFO [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-21) [54f5c5c5] FINISH, ActivateVdsVDSCommand, log id: 7bf823 c9 2012-11-04 15:30:00,890 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-21) [54f5c5c5] Lock freed to object EngineLock [exclusiveLocks= key: 8107e470-2663-11e2-9efd-441ea17336ee value: VDS , sharedLocks= ] 2012-11-04 15:30:00,891 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-21) [54f5c5c5] Checking autorecoverable hosts done 2012-11-04 15:30:00,892 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-21) [54f5c5c5] Autorecovering storage domains is disabled, skipping 2012-11-04 15:30:01,166 INFO [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (QuartzScheduler_Worker-76) [5b35229d] Running command: InitVdsOnUpCommand internal: true. 2012-11-04 15:30:01,503 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ValidateStorageServerConnectionVDSCommand] (QuartzScheduler_Worker-76) [53942145] START, ValidateStorageServerConnectionVDSCommand(HostName = puma35, HostId = 8107e470-2663-11e2-9efd-441ea17336ee, storagePoolId = 020d9b34-265d-11e2-8865-441ea17336ee, storageType = ISCSI, connectionList = [{ id: 93e1cadd-7f7d-4a7e-9216-baa62053352e, connection: 10.35.160.107, iqn: iqn.1992-04.com.emc:cx.ckm00121000438.b7, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 415b13cd 2012-11-04 15:30:01,524 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ValidateStorageServerConnectionVDSCommand] (QuartzScheduler_Worker-76) [53942145] FINISH, ValidateStorageServerConnectionVDSCommand, return: {93e1cadd-7f7d-4a7e-9216-baa62053352e=0}, log id: 415b13cd 2012-11-04 15:30:01,524 INFO [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-76) [53942145] Running command: ConnectHostToStoragePoolServersCommand internal: true. Entities affected : ID: 020d9b34-265d-11e2-8865-441ea17336ee Type: StoragePool 2012-11-04 15:30:01,537 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-76) [53942145] START, ConnectStorageServerVDSCommand(HostName = puma35, HostId = 8107e470-2663-11e2-9efd-441ea17336ee, storagePoolId = 020d9b34-265d-11e2-8865-441ea17336ee, storageType = ISCSI, connectionList = [{ id: 93e1cadd-7f7d-4a7e-9216-baa62053352e, connection: 10.35.160.107, iqn: iqn.1992-04.com.emc:cx.ckm00121000438.b7, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 5de88663 2012-11-04 15:30:01,596 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-76) [53942145] FINISH, ConnectStorageServerVDSCommand, return: {93e1cadd-7f7d-4a7e-9216-baa62053352e=0}, log id: 5de88663 2012-11-04 15:30:01,596 INFO [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-76) [53942145] Host puma35 storage connection was succeeded 2012-11-04 15:30:01,632 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-76) [53942145] START, ConnectStoragePoolVDSCommand(HostName = puma35, HostId = 8107e470-2663-11e2-9efd-441ea17336ee, storagePoolId = 020d9b34-265d-11e2-8865-441ea17336ee, vds_spm_id = 32, masterDomainId = cc015740-7710-4985-ab63-08f8b90c054a, masterVersion = 1), log id: 47311fdb 2012-11-04 15:30:02,638 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-98) Failed to decrypt Data must not be longer than 256 bytes 2012-11-04 15:30:02,638 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-84) Failed to decrypt Data must not be longer than 256 bytes 2012-11-04 15:30:06,689 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-18) Failed to decrypt Data must not be longer than 256 bytes 2012-11-04 15:30:06,689 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-26) Failed to decrypt Data must not be longer than 256 bytes 2012-11-04 15:30:16,851 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-46) Failed to decrypt Data must not be longer than 256 bytes 2012-11-04 15:30:16,851 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-97) Failed to decrypt Data must not be longer than 256 bytes 2012-11-04 15:30:45,491 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-45) Failed to decrypt Data must not be longer than 256 bytes 2012-11-04 15:30:45,491 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-27) Failed to decrypt Data must not be longer than 256 bytes 2012-11-04 15:30:54,393 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-82) Failed to decrypt Data must not be longer than 256 bytes 2012-11-04 15:30:54,404 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-7) Failed to decrypt Data must start with zero 2012-11-04 15:31:55,878 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-71) Failed to decrypt Data must not be longer than 256 bytes 2012-11-04 15:31:55,891 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-40) Failed to decrypt Data must start with zero 2012-11-04 15:32:28,785 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-85) Failed to decrypt Data must not be longer than 256 bytes 2012-11-04 15:32:28,796 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-96) Failed to decrypt Data must start with zero 2012-11-04 15:32:36,401 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-28) Failed to decrypt Data must not be longer than 256 bytes 2012-11-04 15:32:48,333 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-80) Failed to decrypt Data must not be longer than 256 bytes 2012-11-04 15:32:48,333 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-21) Failed to decrypt Data must not be longer than 256 bytes 2012-11-04 15:33:01,633 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-76) [53942145] XML RPC error in command ConnectStoragePoolVDS ( HostName = puma35 ), the error was: java.util.concurrent.TimeoutException, TimeoutException: 2012-11-04 15:33:01,633 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-76) [53942145] FINISH, ConnectStoragePoolVDSCommand, log id: 47311fdb 2012-11-04 15:33:01,633 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (QuartzScheduler_Worker-76) [53942145] Could not connect host puma35 to pool iscsi_dc 2012-11-04 15:33:01,676 INFO [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-76) [278152df] Running command: SetNonOperationalVdsCommand internal: true. Entities affected : ID: 8107e470-2663-11e2-9efd-441ea17336ee Type: VDS 2012-11-04 15:33:01,689 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-76) [278152df] START, SetVdsStatusVDSCommand(HostName = puma35, HostId = 8107e470-2663-11e2-9efd-441ea17336ee, status=NonOperational, nonOperationalReason=STORAGE_DOMAIN_UNREACHABLE), log id: 5e7b01cd 2012-11-04 15:33:01,702 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-76) [278152df] FINISH, SetVdsStatusVDSCommand, log id: 5e7b01cd 2012-11-04 15:33:01,714 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (QuartzScheduler_Worker-76) [278152df] Try to add duplicate values with same name. Type: VDS_SET_NONOPERATIONAL_DOMAIN. Value: storagepoolname 2012-11-04 15:33:01,761 INFO [org.ovirt.engine.core.bll.MultipleActionsRunner] (pool-4-thread-578) [278152df] MultipleActionsRunner of type MigrateVmToServer invoked with no actions 2012-11-04 15:33:01,787 INFO [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (QuartzScheduler_Worker-76) [19ca30fb] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected : ID: 8107e470-2663-11e2-9efd-441ea17336ee Type: VDS 2012-11-04 15:33:01,806 INFO [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (QuartzScheduler_Worker-76) [532e3704] Running command: HandleVdsVersionCommand internal: true. Entities affected : ID: 8107e470-2663-11e2-9efd-441ea17336ee Type: VDS 2012-11-04 15:33:01,808 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-76) [532e3704] Host 8107e470-2663-11e2-9efd-441ea17336ee : puma35 is already in NonOperational status. SetNonOperationalVds command is skipped. 2012-11-04 15:33:19,101 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-70) Failed to decrypt Data must start with zero 2012-11-04 15:33:19,101 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-90) Failed to decrypt Data must start with zero
Created attachment 638062 [details] new_engine_log
These is works as design: 4) Host status will be change to 'Unasigned' then switch back to 'Non-Operational'. 5) Steps will repeat every 5 minutes. During activation the first status of host is Unasigned, if we failed to Activate host it will be moved to Non-Operational. Agreed that these annoying, but these is flow. Host was not moved to Up, it means no one can use it or run VM. Moving back to QA.
Verified- RHEVM Build SI23: In case host having a storage issue the following behavior will take place: 1) After about 5-6 minutes the Host will be switch to Non-Operational. 2) After 5 more minutes HostAutoRecovery will attempt to activate the host. Event message will display: a)"Detected new Host puma35. Host state was set to Up." b) Host status will be change to 'Unasigned'. 3 )After 2-3 minutes host will switch back to Non-Operational. Event message will display: a)"Failed to connect Host puma35 to Storage Pool iscsi_dc" b)"Host puma35 cannot access one of the Storage Domains attached to the Data" 4) Steps (1-3) repeat every 5 minutes (untill storage issue is solved).
Moving back to ON_QA for verification. Omri, per comment #10 it looks like it was already VERIFIED, please move the BZ to it's write status.