Bug 1561522 - Activated host is marked as connecting because of unreachable storage domain.
Summary: Activated host is marked as connecting because of unreachable storage domain.
Keywords:
Status: CLOSED DUPLICATE of bug 1580243
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.3.0
Target Release: ---
Assignee: Nobody
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-03-28 13:46 UTC by Roman Hodain
Modified: 2020-08-03 15:29 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-19 00:47:20 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-



Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3208571 0 None None None 2018-08-13 08:57:19 UTC

Description Roman Hodain 2018-03-28 13:46:21 UTC
Description of problem:
When activating a hypervisor with more than one NFS storage domain configured for the data center while the connection to the storage is broken, the host flips between "Up" and "not responding". This causes the WebAdmin to show the host as connecting for a long time. This is a problem for automation tasks. For example, activating the host via Ansible roles fails, and it is almost impossible to handle the error messages properly because the host can neither be activated nor put into maintenance mode.

Version-Release number of selected component (if applicable):
rhvm-4.2.1.6-0.1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1) Deploy one host
2) Create two NFS SDs
3) Deactivate the host
4) Block the communication on the NFS share
    iptables -I INPUT -s ${Hypervisor_IP} -j REJECT
5) Activate the host

Actual results:
The host stays in the connecting state for a long time.

Expected results:
The host is marked as non-operational after a defined amount of time.

Additional info:
The engine repeatedly sends ConnectStorageServerVDSCommand, which returns VDSNetworkException because it takes a long time. The issue does not occur when only one storage domain is connected.

This also happens on previous versions.
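
On the automation side, one possible workaround for the flapping described above is to poll the host status with a hard deadline and to accept "up" only after it has been seen several times in a row. The following is a minimal sketch, not a tested fix: `wait_for_host`, its parameters, and the status strings are hypothetical, and `get_status` stands in for whatever call the automation uses to read the host status from the engine.

```python
import time
from typing import Callable


def wait_for_host(
    get_status: Callable[[], str],
    deadline_s: float = 300.0,
    poll_s: float = 5.0,
    stable_polls: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll a host status until it is stable or the deadline passes.

    Returns "up" once the status has been "up" for `stable_polls`
    consecutive polls, "non_operational" as soon as that status is
    seen, and "timeout" if neither happens within `deadline_s`.
    The status strings are illustrative assumptions.
    """
    start = clock()
    consecutive_up = 0
    while clock() - start < deadline_s:
        status = get_status()
        if status == "up":
            consecutive_up += 1
            if consecutive_up >= stable_polls:
                return "up"
        else:
            # Any other status (e.g. connecting, not responding)
            # resets the stability counter.
            consecutive_up = 0
            if status == "non_operational":
                return status
        sleep(poll_s)
    return "timeout"
```

With this approach a host that keeps flipping between states ends in a clear "timeout" result instead of a spurious success on a single "up" reading.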

Comment 1 Yaniv Lavi 2018-08-13 08:52:20 UTC
The regular timeout for NFS mount is 70 seconds.
Is this the time the host is stuck in connecting?

Comment 2 Yaniv Lavi 2018-08-13 08:57:20 UTC
*** Bug 1580243 has been marked as a duplicate of this bug. ***

Comment 3 Germano Veit Michel 2018-08-14 00:12:23 UTC
(In reply to Yaniv Lavi from comment #1)
> The regular timeout for NFS mount is 70 seconds.
> Is this the time the host is stuck in connecting?

Hi Yaniv,

I think the other bug, which you set as a duplicate, had a bit more info and some discussion already.

Anyway, this is not just NFS and is not caused only by the NFS mount timeout. ConnectStorageServerVDSCommand can take longer due to network (TCP), storage, or other delays. If this happens, the engine throws VDSNetworkException and tries again. And again, and again, in a loop. The host is stuck in a Connecting -> NotResponding -> Connecting dance.

The correct status would be Non-Operational, not the Connecting->NotResponding dance. Or as Nir suggested on the other bug, this could be async.
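
To illustrate the difference between the current loop and the expected behaviour, here is a minimal sketch of a bounded retry that ends in Non-Operational. It is not the engine's code: `HostStatus`, `StorageConnectTimeout`, `activate_host`, and `max_attempts` are hypothetical names, and `connect_storage` stands in for the ConnectStorageServerVDSCommand call.

```python
from enum import Enum
from typing import Callable, List, Tuple


class HostStatus(Enum):
    UP = "Up"
    CONNECTING = "Connecting"
    NOT_RESPONDING = "NotResponding"
    NON_OPERATIONAL = "NonOperational"


class StorageConnectTimeout(Exception):
    """Hypothetical stand-in for the VDSNetworkException seen by the engine."""


def activate_host(
    connect_storage: Callable[[], None],
    max_attempts: int = 3,
) -> Tuple[HostStatus, List[HostStatus]]:
    """Try to connect storage a bounded number of times.

    Returns the final status and the sequence of statuses the host
    went through. After `max_attempts` timeouts the host is marked
    Non-Operational instead of being retried forever.
    """
    history: List[HostStatus] = []
    for _ in range(max_attempts):
        history.append(HostStatus.CONNECTING)
        try:
            connect_storage()
        except StorageConnectTimeout:
            history.append(HostStatus.NOT_RESPONDING)
            continue
        history.append(HostStatus.UP)
        return HostStatus.UP, history
    # Retry budget exhausted: settle on a stable, reportable state.
    history.append(HostStatus.NON_OPERATIONAL)
    return HostStatus.NON_OPERATIONAL, history
```

The only design point is the retry budget: the Connecting -> NotResponding cycle still occurs, but it terminates in a state that the WebAdmin and automation can act on.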

Comment 5 Germano Veit Michel 2018-12-19 00:47:20 UTC

*** This bug has been marked as a duplicate of bug 1580243 ***
