Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1109544

Summary: host fail to become "UP", from maintenance, following VMs migration.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Ilanit Stein <istein>
Component: ovirt-engine
Assignee: Nobody <nobody>
Status: CLOSED NOTABUG
QA Contact: Ilanit Stein <istein>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.4.0
CC: acathrow, gklein, iheim, lpeer, oourfali, Rhev-m-bugs, yeylon
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: virt
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-17 10:43:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
host1 event log
engine log
host2 vdsm log
host 2 libvirt log
host1 vdsm log
host1 libvirt log

Description Ilanit Stein 2014-06-15 07:02:16 UTC
Description of problem:
host1 (rhev-h) runs 5 VMs. Move host1 to maintenance; all VMs migrate to host2 (rhel6.5). Activating host1 then fails to bring it "UP":

Failed to access the storage domain (iscsi), cannot access the storage pool,
and a single event: "Available memory of host1 [0 MB] is under defined threshold [1024 MB]".
Eventually, after ~1 hour, the host was removed and reinstalled; after a network error and recovery from crash, it came up shortly after installation.
 

Version-Release number of selected component (if applicable):
engine: av9.4
host1: rhev-h 6.5 (20140603.2.el6ev)
host2: rhel6.5
vdsm, libvirt on both hosts:
vdsm-4.14.7-3.el6ev.x86_64
libvirt-0.10.2-29.el6_5.8.x86_64

How reproducible:
Did not try to reproduce

Expected results:
host1 should have become "UP". There should have been no error accessing the storage domain, and no warning that the host's available memory is 0.

Additional info:
Each VM has 1024 MB of memory and is installed with rhel6.5 + guest agent.

There is an automated test that passes: 2 rhel hosts with 5 VMs running. Put host1 into maintenance; after migration, activate host1 again and move host2 to maintenance.

Comment 1 Ilanit Stein 2014-06-15 07:06:22 UTC
Created attachment 908886 [details]
host1 event log

Comment 2 Ilanit Stein 2014-06-15 07:19:58 UTC
Created attachment 908887 [details]
engine log

Comment 3 Ilanit Stein 2014-06-15 07:39:59 UTC
Created attachment 908889 [details]
host2 vdsm log

Comment 4 Ilanit Stein 2014-06-15 07:40:33 UTC
Created attachment 908890 [details]
host 2 libvirt log

Comment 5 Ilanit Stein 2014-06-15 07:48:25 UTC
Created attachment 908891 [details]
host1 vdsm log

Comment 6 Ilanit Stein 2014-06-15 07:48:57 UTC
Created attachment 908892 [details]
host1 libvirt log

Comment 8 Ilanit Stein 2014-06-15 08:55:56 UTC
The problem did not reproduce:

Ran 5 VMs on host1 and
moved host1 to maintenance.
An error event appeared: "Failed to switch to maintenance",
but right after this error, events for all VM migrations to host2 completed,
and host1 entered maintenance.
Activating host1 afterwards worked fine.

Comment 9 Ilanit Stein 2014-06-17 10:43:50 UTC
Investigation by the QE storage team showed that the failure to activate the host occurred because the hosts contained many stale storage connections, which made connecting to the storage domain very slow: more than 3 minutes, which is the default timeout configured in /etc/multipath.conf.
The host's connection to storage eventually succeeded, but since the timeout had expired, the engine treated it as a failure.
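For context, timeouts of this kind are tuned in /etc/multipath.conf on the host. The comment does not name the exact parameter that carried the ~3-minute default, so the fragment below is only an illustrative sketch using standard multipath options with hypothetical values, not the configuration from this environment:

```
# /etc/multipath.conf -- illustrative fragment only.
# The bug report does not specify which setting held the ~3 min default;
# parameter names are standard multipath options, values are hypothetical.
defaults {
    # how often (seconds) paths are checked
    polling_interval    5
    # fail I/O immediately when all paths are down instead of queueing
    no_path_retry       fail
    # seconds before a failed path's device is removed from the system
    dev_loss_tmo        180
}
```

Stale iSCSI sessions left over from old storage connections would make each of these path checks slower, which is consistent with the long connect time described above.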

As this is not a bug, but a matter of configuration and a "slow" host, I am closing the bug.