Bug 1378027

Summary: Engine disconnections from host
Product: Red Hat Enterprise Virtualization Manager Reporter: guy chen <guchen>
Component: ovirt-engine Assignee: Piotr Kliczewski <pkliczew>
Status: CLOSED NOTABUG QA Contact: meital avital <mavital>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.6.9 CC: gklein, guchen, lsurette, michal.skrivanek, mperina, nsoffer, rbalakri, Rhev-m-bugs, srevivo, ykaul
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-10-18 09:40:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description guy chen 2016-09-21 11:00:38 UTC
Description of problem:

The engine disconnects from the host every hour. The engine.log file shows a "No route to host" error during UploadStreamVDSCommand; the host becomes unavailable and is available again after a few seconds.

How reproducible:

High

Steps to Reproduce:

1. Installed RHV 3.6.9
2. Created 52 VMs on the host
3. Started the 52 VMs on the host

Actual results:

disconnections from engine to host

Expected results:

no disconnections from engine to host at all

Additional info:

Additional investigations that were done:

Regarding the TCP timeout: it does not look like the root cause, based on the following investigation.
The TCP keepalive timeout is 2 hours and 11.25 minutes (7200 s idle plus 9 probes at 75 s intervals), the same as on another RHV 4.0 system I checked that does not have this issue, whereas the failed calls are 1 hour apart:
-=>>cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
(root@b02-h23-r620) - (20:20) - (~)
-=>>cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
(root@b02-h23-r620) - (20:20) - (~)
-=>>cat /proc/sys/net/ipv4/tcp_keepalive_probes
9
I tripled tcp_keepalive_time to 21600 to see whether the interval between the failed calls would change; it stayed the same, 1 hour.
The number of TCP connections does not show any drop during these times.
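
The keepalive arithmetic above can be checked with a short sketch. It assumes the usual Linux behaviour (idle time before the first probe, then each unanswered probe spaced by the interval); the function names are illustrative and not part of any RHV tooling.

```python
# Time for the kernel to declare an idle TCP connection dead,
# computed from the three sysctl values quoted in this report.
def keepalive_detection_seconds(time_s, intvl_s, probes):
    """Idle time before the first probe plus all unanswered probe intervals."""
    return time_s + intvl_s * probes


def as_hms(seconds):
    """Split a number of seconds into (hours, minutes, seconds)."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return hours, minutes, secs


default_total = keepalive_detection_seconds(7200, 75, 9)
tripled_total = keepalive_detection_seconds(21600, 75, 9)

print(default_total, as_hms(default_total))  # 7875 (2, 11, 15)
print(tripled_total, as_hms(tripled_total))  # 22275 (6, 11, 15)
```

Neither value is close to the 1 hour spacing of the failures, which is consistent with the keepalive timeout not being the cause.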

I have confirmed that ping from the engine to the host works fine at the times UploadStreamVDSCommand is scheduled.
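
Ping only exercises ICMP echo, so it does not show whether a TCP connection to the vdsm port would succeed. A minimal sketch of a TCP-level check follows; `probe_tcp` is a hypothetical helper, and the port to test (54321 appears in the engine log quoted later in this report) is an assumption about what the command connects to.

```python
import errno
import socket


def probe_tcp(host, port, timeout=5.0):
    """Attempt a TCP connect and return a short description of the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # EHOSTUNREACH is the errno whose message is "No route to host".
        if exc.errno == errno.EHOSTUNREACH:
            return "no route to host"
        if exc.errno == errno.ECONNREFUSED:
            return "connection refused"
        return "error: %s" % exc


# Example call (hostname is a placeholder, not taken from this report):
# print(probe_tcp("host.example.com", 54321))
```

Running this from the engine machine around the time of a failure would distinguish a filtered or rejected port from a host that is genuinely unreachable.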


There were successful UploadStreamVDSCommand calls yesterday, but today they all fail. The change is that today I created and started 52 VMs, while yesterday there was 1; the host is the same.

Comment 4 Michal Skrivanek 2016-09-22 05:08:39 UTC
Sounds like a communication protocol issue, at least as an initial theory to start the investigation.

Comment 5 Piotr Kliczewski 2016-09-22 12:23:17 UTC
UploadStreamVDSCommand uses HTTP to upload a stream, and its attempt to connect to vdsm fails with:

java.net.NoRouteToHostException: No route to host

This command failure causes the storage domain to change its status to unknown.

Nir, can you please take a look and see whether you can find a reason why this command is failing?

I can see in the logs that around the same time mom is able to call vdsm using xmlrpc, which uses the same infra code as upload stream.

Comment 6 Nir Soffer 2016-09-22 14:34:01 UTC
(In reply to Piotr Kliczewski from comment #5)
> UploadStreamVDSCommand uses http to upload a stream and it attempts to
> connect vdsm which fails with:
> 
> java.net.NoRouteToHostException: No route to host
...
> Nir, can you please take a look whether you can find a reason why this
> command is failing.

Looks like the host is not reachable when this is running (No route to host).
This is not a storage issue, and does not look like a host issue.

Comment 7 Piotr Kliczewski 2016-09-22 14:44:32 UTC
Please provide a tcpdump capture or a Wireshark log from when the issue occurs.

Comment 10 Piotr Kliczewski 2016-10-05 15:20:49 UTC
I checked the tcpdump capture and I see that a SYN frame is sent and in response we get an ICMP message with "Code: 10 (Host administratively prohibited)".

When listing the iptables rules I see:
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:54322 ctstate NEW

but in the engine log I see:

Initialize vdsBroker '<hostname>:54321'

Was iptables reconfigured between the initial connection and the call to UploadStreamVDSCommand?

Please make sure that iptables is configured properly and rerun the test.
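
The port mismatch described above can be checked mechanically. The sketch below is an illustration only: `accepted_tcp_ports` is a hypothetical helper, it only understands single numeric `dpt:` matches in `iptables -L`-style text (no port ranges, multiport rules, or service names), and it is fed just the one rule quoted in this comment, since the full rule set is not shown in this report.

```python
import re


def accepted_tcp_ports(rule_lines):
    """Collect destination ports from ACCEPT tcp rules in `iptables -L`-style text."""
    ports = set()
    for line in rule_lines:
        fields = line.split()
        # Only consider lines whose target is ACCEPT and protocol is tcp.
        if len(fields) < 2 or fields[0] != "ACCEPT" or fields[1] != "tcp":
            continue
        match = re.search(r"\bdpt:(\d+)\b", line)
        if match:
            ports.add(int(match.group(1)))
    return ports


rules = [
    "ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:54322 ctstate NEW",
]
open_ports = accepted_tcp_ports(rules)

print(54321 in open_ports)  # False
print(54322 in open_ports)  # True
```

If the complete rule list gives the same result for 54321, new connections to the port named in the engine log would fall through to whatever reject rule follows, which would match the ICMP response seen in the capture.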

Comment 11 Piotr Kliczewski 2016-10-18 09:40:30 UTC
Due to lack of information I am closing this issue. Please reopen if you see any other issue after retesting.

Comment 12 guy chen 2016-11-14 13:56:33 UTC
Since it was not reproduced, this is currently regarded as a lab issue, not a real bug.