Bug 854960 - [engine-core] takes 10 minutes to change state of host to Non-responsive when connection is blocked (time-out)
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Assigned To: Barak
Pavel Stehlik
infra
Depends On:
Blocks:
Reported: 2012-09-06 07:50 EDT by Gadi Ickowicz
Modified: 2016-02-10 14:44 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-12-01 05:08:38 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
engine logs (64.91 KB, application/x-gzip)
2012-09-06 07:50 EDT, Gadi Ickowicz
Description Gadi Ickowicz 2012-09-06 07:50:37 EDT
Created attachment 610281
engine logs

Description of problem:
When blocking the connection between engine and an HSM host using iptables (DROP), it takes about 10 minutes for the state of the host to change to non-responsive.

These 2 commands ran in quick succession:
[root@gadi-rhevm ~]# date
Thu Sep  6 13:56:39 IDT 2012
[root@gadi-rhevm ~]# iptables -A OUTPUT -d green-vdsa.qa.lab.tlv.redhat.com -j DROP

--------engine log lines:-------------------

2012-09-06 13:55:14,296 INFO  [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (QuartzScheduler_Worker-70) [37518ccb] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected :  ID: ed2d2eb2-f7fb-11e1-a776-001a4a169705 Type: VDS
2012-09-06 13:55:14,318 INFO  [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (QuartzScheduler_Worker-70) [59ba097e] Running command: HandleVdsVersionCommand internal: true. Entities affected :  ID: ed2d2eb2-f7fb-11e1-a776-001a4a169705 Type: VDS
2012-09-06 13:59:42,136 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-95) XML RPC error in command ListVDS ( Vds: green-vdsa.qa.lab.tlv.redhat.com ), the error was: java.util.concurrent.TimeoutException, TimeoutException:
2012-09-06 13:59:42,155 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-95) ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds = ed2d2eb2-f7fb-11e1-a776-001a4a169705 : green-vdsa.qa.lab.tlv.redhat.com, VDS Network Error, continuing.
VDSNetworkException:
2012-09-06 14:00:00,000 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-75) Autorecovering hosts is disabled, skipping
2012-09-06 14:00:00,000 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-75) Autorecovering storage domains is disabled, skipping
2012-09-06 14:02:44,160 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-13) XML RPC error in command GetCapabilitiesVDS ( Vds: green-vdsa.qa.lab.tlv.redhat.com ), the error was: java.util.concurrent.TimeoutException, TimeoutException:
2012-09-06 14:02:44,160 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-13) ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds = ed2d2eb2-f7fb-11e1-a776-001a4a169705 : green-vdsa.qa.lab.tlv.redhat.com, VDS Network Error, continuing.
VDSNetworkException:
2012-09-06 14:05:00,001 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-38) Autorecovering hosts is disabled, skipping
2012-09-06 14:05:00,001 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-38) Autorecovering storage domains is disabled, skipping
2012-09-06 14:05:46,165 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-81) XML RPC error in command GetCapabilitiesVDS ( Vds: green-vdsa.qa.lab.tlv.redhat.com ), the error was: java.util.concurrent.TimeoutException, TimeoutException:
2012-09-06 14:05:46,165 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-81) VDS::handleNetworkException Server failed to respond,  vds_id = ed2d2eb2-f7fb-11e1-a776-001a4a169705, vds_name = green-vdsa.qa.lab.tlv.redhat.com, error = VDSNetworkException:
2012-09-06 14:05:46,243 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (pool-4-thread-46) ResourceManager::vdsNotResponding entered for Host ed2d2eb2-f7fb-11e1-a776-001a4a169705, 10.35.102.10
2012-09-06 14:05:46,283 WARN  [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-4-thread-46) [3d51e8ba] CanDoAction of action VdsNotRespondingTreatment failed. Reasons:VDS_FENCING_DISABLED
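
For reference, the timeline in the log excerpt above can be checked with a short script. This is only a sketch: the timestamps are copied from the excerpt with milliseconds dropped, it assumes the `date` output and the engine log use the same local clock, and the helper name `gaps_seconds` is made up for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def gaps_seconds(start, events):
    """Return the seconds elapsed between consecutive timestamps,
    beginning with the gap from `start` to the first event."""
    points = [datetime.strptime(t, FMT) for t in [start] + events]
    return [int((b - a).total_seconds()) for a, b in zip(points, points[1:])]

# Time printed by `date` just before the iptables DROP rule was added.
blocked_at = "2012-09-06 13:56:39"

# The three TimeoutException entries for green-vdsa in the engine log;
# the last one is followed by VDS::handleNetworkException.
timeouts = [
    "2012-09-06 13:59:42",
    "2012-09-06 14:02:44",
    "2012-09-06 14:05:46",
]

gaps = gaps_seconds(blocked_at, timeouts)
total = sum(gaps)

print(gaps)   # [183, 182, 182]
print(total)  # 547 seconds, i.e. 9 min 7 s
```

The three failures are each about three minutes apart, and the host is only treated as not responding after the third one, which accounts for the "about 10 minutes" reported here.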



Version-Release number of selected component (if applicable):
rhevm-3.1.0-15.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Block connection from engine to host in iptables (DROP)
2. wait for engine to move host to non-responsive state
  
Actual results:
It takes about 10 minutes for the state change

Expected results:
Shorter timeout is expected when connection to host is blocked/not working.

Additional info:
Comment 1 Yaniv Kaul 2012-09-06 09:51:57 EDT
Note that you've not 'blocked' the connection in the classic sense: you are dropping packets, not rejecting them. If you change the iptables command to REJECT, does it still take 10 minutes? Not that 10 minutes is OK anyway, but with DROP it will take a while, because TCP connections keep retransmitting until they eventually give up.
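
The difference can be observed from the engine machine with a plain TCP connect. The following is a rough sketch, not anything the engine itself does: the function name `probe` and the 3-second timeout are arbitrary choices for illustration. A rejected connection typically fails right away, while a dropped one gives no answer at all, so the caller waits until its own timeout expires.

```python
import socket

def probe(host, port, timeout=3.0):
    """Try a TCP connect and classify the outcome.

    "open"    : the connection was established
    "refused" : the peer (or a REJECT rule) answered with an error at once
    "timeout" : nothing came back within `timeout` seconds, which is what
                a DROP rule looks like from the client side
    "error"   : any other socket-level failure
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

For example, `probe("green-vdsa.qa.lab.tlv.redhat.com", 54321)` could be run before and after adding the iptables rule; the port number here is an assumption and should be replaced with whatever port VDSM listens on in the setup being tested.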
Comment 2 Gadi Ickowicz 2012-09-09 02:31:47 EDT
(In reply to comment #1)
> Note that you've not 'blocked' the connection in the classic meaning - you
> are dropping packets, not rejecting. If you change the iptables command to
> REJECT, does it still take 10 minutes? Not that 10 minutes is OK anyway, but
> it will take a while, because drop means that TCP connections do re-transmit
> until they give up eventually.

With REJECT in iptables, it takes about 1 minute for the status to change:


[root@gadi-rhevm ~]# date
Sun Sep  9 09:20:34 IDT 2012
[root@gadi-rhevm ~]# iptables -A OUTPUT -d green-vdsa.qa.lab.tlv.redhat.com -j REJECT


2012-09-09 09:21:23,602 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-2) [1af80802] ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds = ed2d2eb2-f7fb-11e1-a776-001a4a169705 : green-vdsa.qa.lab.tlv.redhat.com, VDS Network Error, continuing.
VDSNetworkException:
2012-09-09 09:21:29,606 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-83) ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds = ed2d2eb2-f7fb-11e1-a776-001a4a169705 : green-vdsa.qa.lab.tlv.redhat.com, VDS Network Error, continuing.
VDSNetworkException:
2012-09-09 09:21:35,612 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-37) VDS::handleNetworkException Server failed to respond,  vds_id = ed2d2eb2-f7fb-11e1-a776-001a4a169705, vds_name = green-vdsa.qa.lab.tlv.redhat.com, error = VDSNetworkException:
2012-09-09 09:21:35,641 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (pool-4-thread-47) ResourceManager::vdsNotResponding entered for Host ed2d2eb2-f7fb-11e1-a776-001a4a169705, 10.35.102.10
2012-09-09 09:21:35,685 WARN  [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-4-thread-47) [7e71d8f9] CanDoAction of action VdsNotRespondingTreatment failed. Reasons:VDS_FENCING_DISABLED
Comment 4 Itamar Heim 2013-12-01 05:08:38 EST
Closing old bugs. If this issue is still relevant/important in the current version, please re-open the bug.

In 3.3 there were some changes to make timeouts more predictable - see bug 863211.