Bug 991729

Summary: [engine-backend] host cannot be activated after it had been updated to maintenance in DB, while engine has never got the response for DisconnectStoragePool
Product: Red Hat Enterprise Virtualization Manager
Reporter: Elad <ebenahar>
Component: ovirt-engine
Assignee: Martin Perina <mperina>
Status: CLOSED CURRENTRELEASE
QA Contact: Tareq Alayan <talayan>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.3.0
CC: acathrow, bazulay, iheim, lpeer, pstehlik, Rhev-m-bugs, yeylon, yzaslavs
Target Milestone: ---
Keywords: Regression
Target Release: 3.3.0
Hardware: x86_64
OS: Unspecified
Whiteboard: infra
Fixed In Version: is13
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 994608
Environment:
Last Closed: 2014-01-21 22:18:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 994608
Attachments:
Description Flags
logs none

Description Elad 2013-08-04 06:28:38 UTC
Created attachment 782415
logs

Description of problem:
The engine cannot activate a host once the host has been set to Maintenance status in the DB while the engine never received a response to its DisconnectStoragePool request.

Version-Release number of selected component (if applicable):
rhevm-3.3.0-0.11.master.el6ev.noarch
vdsm-4.12.0-rc3.12.git139ec2f.el6ev.x86_64


How reproducible:
100%

Steps to Reproduce:
On a 2-host cluster with an active storage pool:
- Set the SPM host to maintenance
- Block connectivity between the host and RHEVM with iptables right after the engine sets the host to maintenance in the DB
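
The report does not record the exact iptables rules that were used. As a rough sketch only, the blocking step could look like the following, assuming the rules are added on the host and that `192.0.2.10` stands in for the RHEVM address (both are assumptions for illustration, not details from this bug). The helper `build_block_rules` is hypothetical and only builds the command lines; it does not execute them.

```python
from typing import List


def build_block_rules(engine_ip: str) -> List[List[str]]:
    """Return iptables command lines that drop all traffic to and from engine_ip.

    Hypothetical helper: the rules actually used in the report are not known.
    """
    return [
        # Drop packets arriving from the engine.
        ["iptables", "-I", "INPUT", "-s", engine_ip, "-j", "DROP"],
        # Drop packets sent to the engine.
        ["iptables", "-I", "OUTPUT", "-d", engine_ip, "-j", "DROP"],
    ]


# 192.0.2.10 is a placeholder (documentation address range), not the real RHEVM IP.
rules = build_block_rules("192.0.2.10")
for rule in rules:
    print(" ".join(rule))
```

The printed commands would need to be run as root on the host, immediately after the engine log shows the status change to Maintenance.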


Actual results:

engine sets host to maintenance in DB:

2013-08-03 17:28:03,299 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-6) Updated vds status from Preparing for Maintenance to Maintenance in database,  vds = 223b05cc-4797-4a4f-9f2a-c4be0fa232eb : nott-vds2


DisconnectStoragePool is requested:

2013-08-03 17:28:03,307 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (DefaultQuartzScheduler_Worker-6) START, DisconnectStoragePoolVDSCommand(HostName = nott-vds2, HostId = 223b05cc-4797-4a4f-9f2a-c4be0fa232eb, storagePoolId = aa047779-f7a9-4888-bd9c-fcf9d2f76e7e, vds_spm_id = 1), log id: 6ea687fd
2013-08-03 17:31:03,308 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (DefaultQuartzScheduler_Worker-6) Command DisconnectStoragePoolVDS execution failed. Exception: VDSNetworkException: java.util.concurrent.TimeoutException
2013-08-03 17:31:03,308 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (DefaultQuartzScheduler_Worker-6) FINISH, DisconnectStoragePoolVDSCommand, log id: 6ea687fd
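
To see how long the engine waited before giving up, the timestamps of the START and ERROR entries above can be subtracted. The snippet below is a small sketch of that arithmetic; the log lines are shortened to their timestamp and level, and the helper name `parse_ts` is made up for this example.

```python
from datetime import datetime

# Engine log timestamps look like "2013-08-03 17:28:03,307" (milliseconds after the comma).
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(line: str) -> datetime:
    """Parse the 23-character timestamp at the start of an engine log line."""
    return datetime.strptime(line[:23], LOG_TS_FORMAT)


# Shortened copies of the START and ERROR lines from the excerpt above.
start_line = "2013-08-03 17:28:03,307 INFO  START, DisconnectStoragePoolVDSCommand"
error_line = "2013-08-03 17:31:03,308 ERROR Command DisconnectStoragePoolVDS execution failed"

elapsed = (parse_ts(error_line) - parse_ts(start_line)).total_seconds()
print(f"Seconds between START and ERROR: {elapsed}")
```

The gap is almost exactly three minutes, which matches the TimeoutException reported for the request.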


engine reports a problem with DisconnectStoragePool:

2013-08-03 17:31:03,333 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-6) Host encounter a problem moving to maintenance mode, probably error during disconnecting it from pool org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.util.concurrent.TimeoutException (Failed with VDSM error VDS_NETWORK_ERROR and code 5022). The Host will stay in Maintenance

engine fails to activate host:

2013-08-03 17:32:02,152 INFO  [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (pool-5-thread-42) [5bdce1af] START, ActivateVdsVDSCommand(HostName = nott-vds2, HostId = 223b05cc-4797-4a4f-9f2a-c4be0fa232eb), log id: 5afe05a3
2013-08-03 17:32:02,152 INFO  [org.ovirt.engine.core.vdsbroker.VdsManager] (pool-5-thread-42) [5bdce1af] Failed to activate VDS = 223b05cc-4797-4a4f-9f2a-c4be0fa232eb with error: null.


engine fails to set host to maintenance because it is already updated as maintenance in DB:

2013-08-03 17:35:19,675 WARN  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (ajp-/127.0.0.1:8702-10) [7052eb92] CanDoAction of action MaintenanceNumberOfVdss failed. Reasons:VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_VDS_IS_IN_MAINTENANCE




Additional info: logs

Comment 2 Elad 2013-08-04 12:54:53 UTC
The host is stuck in the 'Unassigned' state. There is nothing the user can do to activate or remove the host.

Comment 4 Tareq Alayan 2013-09-03 11:57:41 UTC
Verified in is12.
The host is back Up again.

Comment 5 Itamar Heim 2014-01-21 22:18:30 UTC
Closing - RHEV 3.3 Released

Comment 6 Itamar Heim 2014-01-21 22:24:53 UTC
Closing - RHEV 3.3 Released