Bug 1378027 - Engine disconnections from host
Summary: Engine disconnections from host
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.6.9
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Piotr Kliczewski
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-21 11:00 UTC by guy chen
Modified: 2016-11-14 13:56 UTC

Fixed In Version:
Doc Type:
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-18 09:40:30 UTC
oVirt Team: Infra
Target Upstream Version:



Description guy chen 2016-09-21 11:00:38 UTC
Description of problem:

The engine disconnects from the host every hour. The engine.log file shows a "No route to host" error during UploadStreamVDSCommand; the host becomes unavailable and is available again after a few seconds.

How reproducible:

High

Steps to Reproduce:

1. Installed RHV 3.6.9
2. Created 52 VMs on the host
3. Started the 52 VMs on the host

Actual results:

Disconnections from engine to host

Expected results:

No disconnections from engine to host at all

Additional info:

Additional investigations that were done:

Regarding the TCP timeout: it does not look like the root cause, based on the following investigation.
The TCP timeout is 2 hours and 11.25 minutes, the same as on another RHV 4.0 system I checked that does not have this issue, whereas the failed calls are 1 hour apart:
-=>>cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
(root@b02-h23-r620) - (20:20) - (~)
-=>>cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
(root@b02-h23-r620) - (20:20) - (~)
-=>>cat /proc/sys/net/ipv4/tcp_keepalive_probes
9
I tripled the timeout to 21600 to see whether the interval between the failed calls would change; it stayed the same, 1 hour.
The number of TCP connections does not show any drop during these times.
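
The "2 hours and 11.25 minutes" figure matches the usual Linux keepalive arithmetic: the idle time before the first probe plus the probe interval multiplied by the number of unanswered probes. A minimal sketch of that calculation with the three values shown above (the function name is hypothetical, chosen only for illustration):

```python
def keepalive_detection_seconds(time_s, intvl_s, probes):
    """Seconds until an idle, dead peer is declared gone.

    time_s  : tcp_keepalive_time   (idle seconds before the first probe)
    intvl_s : tcp_keepalive_intvl  (seconds between probes)
    probes  : tcp_keepalive_probes (unanswered probes before giving up)
    """
    return time_s + intvl_s * probes


# Values read from /proc/sys/net/ipv4/ in the report.
total = keepalive_detection_seconds(7200, 75, 9)
hours, remainder = divmod(total, 3600)
print(f"{total} s = {hours} h {remainder / 60} min")  # 7875 s = 2 h 11.25 min
```

Since this is more than two hours, it is hard to reconcile with failures that recur every hour, which is consistent with the conclusion that the keepalive timeout is not the cause.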

I have confirmed that ping from the engine to the host works fine at the times the UploadStreamVDSCommand is scheduled.
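
A successful ping only shows that ICMP echo gets through; it does not show that a TCP connection to the vdsm port is accepted. A rough sketch of a TCP-level probe that could be run from the engine machine at the same times is below. The function names and the result labels are hypothetical, and the host and port to probe are left to the caller.

```python
import errno
import socket


def classify(err):
    """Map a socket error to a short label (labels are illustrative)."""
    if isinstance(err, socket.timeout):
        return "timeout"
    code = getattr(err, "errno", None)
    if code == errno.ECONNREFUSED:
        return "refused"        # host reachable, nothing listening on the port
    if code in (errno.EHOSTUNREACH, errno.ENETUNREACH):
        return "unreachable"    # reported by the OS as "No route to host"
    return "error"


def probe_tcp(host, port, timeout=3.0):
    """Attempt one TCP connection and report what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except OSError as err:
        return classify(err)
```

For example, `probe_tcp(host, 54321)` returning "unreachable" while ping succeeds would point at something between the two machines rejecting that specific port rather than at general network loss.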


There were successful UploadStreamVDSCommand calls yesterday, but today they all fail. The change is that today I created and started 52 VMs, while yesterday there was 1; the host is the same.

Comment 4 Michal Skrivanek 2016-09-22 05:08:39 UTC
Sounds like a communication protocol issue, at least as an initial theory to start the investigation.

Comment 5 Piotr Kliczewski 2016-09-22 12:23:17 UTC
UploadStreamVDSCommand uses HTTP to upload a stream, and its attempt to connect to vdsm fails with:

java.net.NoRouteToHostException: No route to host

This command failure causes the storage domain to change its status to unknown.

Nir, can you please take a look and see whether you can find a reason why this command is failing?

I can see in the logs that around the same time mom is able to call vdsm using xmlrpc, which uses the same infra code as upload stream.

Comment 6 Nir Soffer 2016-09-22 14:34:01 UTC
(In reply to Piotr Kliczewski from comment #5)
> UploadStreamVDSCommand uses http to upload a stream and it attempts to
> connect vdsm which fails with:
> 
> java.net.NoRouteToHostException: No route to host
...
> Nir, can you please take a look whether you can find a reason why this
> command is failing.

Looks like the host is not reachable when this is running (No route to host).
This is not a storage issue, and does not look like a host issue.

Comment 7 Piotr Kliczewski 2016-09-22 14:44:32 UTC
Please provide a tcpdump capture or a Wireshark log from when the issue occurs.

Comment 10 Piotr Kliczewski 2016-10-05 15:20:49 UTC
I checked the tcp dump and I see that a SYN frame is sent and in response we get an ICMP message with "Code: 10 (Host administratively prohibited)".

When listing the iptables rules I see:
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:54322 ctstate NEW

but in the engine log I see:

Initialize vdsBroker '<hostname>:54321'

Was iptables reconfigured between the initial connection and the UploadStreamVDSCommand call?

Please make sure that iptables is configured properly and rerun the test.
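
The mismatch described above (a rule accepting port 54322 while the engine connects to 54321) can be checked mechanically. The following is only a sketch: the helper names are hypothetical, it handles only numeric `dpt:` entries in a saved rule listing, and it looks only at the text passed in, so other rules on the host that might accept the port are not considered.

```python
import re


def accepted_tcp_ports(iptables_listing):
    """Collect numeric destination ports from ACCEPT lines of a rule listing."""
    ports = set()
    for line in iptables_listing.splitlines():
        fields = line.split()
        if not fields or fields[0] != "ACCEPT":
            continue
        # Only single numeric ports are handled; service names and
        # "dpts:" ranges are ignored in this sketch.
        ports.update(int(p) for p in re.findall(r"\bdpt:(\d+)", line))
    return ports


def broker_port(log_line):
    """Extract the port from an "Initialize vdsBroker 'host:port'" log line."""
    match = re.search(r"Initialize vdsBroker '[^']*:(\d+)'", log_line)
    return int(match.group(1)) if match else None


rule = "ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:54322 ctstate NEW"
log = "Initialize vdsBroker '<hostname>:54321'"

allowed = accepted_tcp_ports(rule)
port = broker_port(log)
print(port, "accepted by the listed rule:", port in allowed)
```

With the two lines quoted in this comment, the engine port is not among the accepted ports, which would be consistent with the "Host administratively prohibited" reply seen in the capture.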

Comment 11 Piotr Kliczewski 2016-10-18 09:40:30 UTC
Due to lack of information I am closing this issue. Please reopen if you see any other issue after retesting.

Comment 12 guy chen 2016-11-14 13:56:33 UTC
Since it was not reproduced, it is currently referred to as a lab issue, not a real bug.

