Bug 1392996

Summary: Add new host action fails and makes the host non-responsive
Product: [oVirt] ovirt-engine Reporter: Mor <mkalfon>
Component: BLL.Infra Assignee: Oved Ourfali <oourfali>
Status: CLOSED CURRENTRELEASE QA Contact: Petr Matyáš <pmatyas>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.1.0 CC: bugs, danken, gklein, lsvaty, mburman, mperina, pkliczew
Target Milestone: ovirt-4.1.0-alpha Keywords: Automation, AutomationBlocker, Regression
Target Release: --- Flags: rule-engine: ovirt-4.1+
gklein: blocker+
rule-engine: planning_ack+
oourfali: devel_ack+
lsvaty: testing_ack+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-02-15 15:01:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
host vdsm log none
engine supervdsm vdsm logs none

Description Mor 2016-11-08 16:21:15 UTC
Description of problem:
When installing a new host or reinstalling an existing one, the installation completes but the host becomes "non-responsive".

Version-Release number of selected component (if applicable):
oVirt: 4.1.0-0.0.master.20161107231322.git50adf28.el7.centos
Vdsm: vdsm-4.18.999-798.git7a306d6.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Add a new host or reinstall an existing one
2. Verify that all host components are installed from the recent oVirt snapshot
3. Host becomes non-responsive and communication errors appear in the log

Actual results:
Host becomes non-responsive

Expected results:
Host should be added to the engine and become responsive

Additional info:

2016-11-08 18:05:03,319 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM host_mixed_2 command failed: Message timeout which can be caused by communication issues
2016-11-08 18:05:03,319 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler1) [] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand' return value 'org.ovirt.engine.core.vdsbroker.vdsbroker.VDSInfoReturnForXmlRpc@6d559e70'
2016-11-08 18:05:03,320 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler1) [] HostName = host_mixed_2
2016-11-08 18:05:03,320 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler1) [] Command 'GetCapabilitiesVDSCommand(HostName = host_mixed_2, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='e8b8cc5c-d510-4f84-b6a1-ec2653d43b81', vds='Host[host_mixed_2,e8b8cc5c-d510-4f84-b6a1-ec2653d43b81]'})' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2016-11-08 18:05:03,320 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler1) [] Failure to refresh host 'host_mixed_2' runtime info: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2016-11-08 18:05:06,326 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.35.128.28
2016-11-08 18:05:07,412 WARN  [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Exception thrown during message processing
2016-11-08 18:05:09,075 WARN  [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Exception thrown during message processing
2016-11-08 18:05:18,320 WARN  [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Exception thrown during message processing
2016-11-08 18:05:20,322 WARN  [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Exception thrown during message processing
2016-11-08 18:05:41,350 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler8) [] Command 'GetAllVmStatsVDSCommand(HostName = host_mixed_1, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='f3dadec8-5b39-429d-8050-1e03bf37f7d2'})' execution failed: VDSGenericException: VDSNetworkException: Vds timeout occured
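
For triage, the excerpt above can be summarized per host. The following is only a minimal sketch, not part of oVirt: it assumes each engine.log record sits on a single line in the "<timestamp> <LEVEL> [<logger>] ..." layout seen above, and the names LINE_RE, HOST_RE and network_failures are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter

# Assumed record layout: "<date> <time>,<ms> <LEVEL> [<logger>] <rest of message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s+(?P<rest>.*)$"
)
# Host name as it appears in the VDS command parameters, e.g. "HostName = host_mixed_2"
HOST_RE = re.compile(r"HostName = (?P<host>[\w.-]+)")


def network_failures(lines):
    """Count ERROR records that mention VDSNetworkException, keyed by host name."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group("level") != "ERROR":
            continue  # skip INFO/WARN records and lines that do not parse
        rest = m.group("rest")
        if "VDSNetworkException" not in rest:
            continue
        host = HOST_RE.search(rest)
        # Records without a "HostName = ..." field are grouped under "unknown"
        counts[host.group("host") if host else "unknown"] += 1
    return counts
```

Passing an open log file (or a list of lines) to network_failures returns a Counter; ERROR records that carry the exception but no "HostName = ..." field, such as the HostMonitoring one above, are counted under "unknown".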

Comment 1 Red Hat Bugzilla Rules Engine 2016-11-08 16:21:20 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 2 Mor 2016-11-08 16:26:03 UTC
Created attachment 1218610 [details]
host vdsm log

Comment 3 Mor 2016-11-08 16:27:41 UTC
Created attachment 1218611 [details]
engine supervdsm vdsm logs

Comment 4 Oved Ourfali 2016-11-08 19:14:26 UTC
Piotr, can you check if it is with your recent change or not? I wasn't sure.

Comment 5 Mor 2016-11-14 17:11:35 UTC
Problem fixed: 
Version 4.1.0-0.0.master.20161112231308.git672bd31.el7.centos

Comment 6 Sandro Bonazzola 2016-12-12 14:00:33 UTC
The fix for this issue should be included in oVirt 4.1.0 beta 1 released on December 1st. If not included please move back to modified.

Comment 7 Mor 2016-12-13 12:59:57 UTC
Hi,
I verified it on: 
oVirt Engine Version: 4.1.0-0.2.master.20161212071212.git8a015dd.el7.centos