Bug 854960 - [engine-core] takes 10 minutes to change state of host to Non-responsive when connection is blocked (time-out)
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Barak
QA Contact: Pavel Stehlik
URL:
Whiteboard: infra
Depends On:
Blocks:
Reported: 2012-09-06 11:50 UTC by Gadi Ickowicz
Modified: 2016-02-10 19:44 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-12-01 10:08:38 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
engine logs (64.91 KB, application/x-gzip)
2012-09-06 11:50 UTC, Gadi Ickowicz

Description Gadi Ickowicz 2012-09-06 11:50:37 UTC
Created attachment 610281
engine logs

Description of problem:
When blocking the connection between engine and an HSM host using iptables (DROP), it takes about 10 minutes for the state of the host to change to non-responsive.

These 2 commands were run in quick succession:
[root@gadi-rhevm ~]# date
Thu Sep  6 13:56:39 IDT 2012
[root@gadi-rhevm ~]# iptables -A OUTPUT -d green-vdsa.qa.lab.tlv.redhat.com -j DROP

--------engine log lines:-------------------

2012-09-06 13:55:14,296 INFO  [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (QuartzScheduler_Worker-70) [37518ccb] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected :  ID: ed2d2eb2-f7fb-11e1-a776-001a4a169705 Type: VDS
2012-09-06 13:55:14,318 INFO  [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (QuartzScheduler_Worker-70) [59ba097e] Running command: HandleVdsVersionCommand internal: true. Entities affected :  ID: ed2d2eb2-f7fb-11e1-a776-001a4a169705 Type: VDS
2012-09-06 13:59:42,136 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-95) XML RPC error in command ListVDS ( Vds: green-vdsa.qa.lab.tlv.redhat.com ), the error was: java.util.concurrent.TimeoutException, TimeoutException:
2012-09-06 13:59:42,155 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-95) ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds = ed2d2eb2-f7fb-11e1-a776-001a4a169705 : green-vdsa.qa.lab.tlv.redhat.com, VDS Network Error, continuing.
VDSNetworkException:
2012-09-06 14:00:00,000 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-75) Autorecovering hosts is disabled, skipping
2012-09-06 14:00:00,000 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-75) Autorecovering storage domains is disabled, skipping
2012-09-06 14:02:44,160 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-13) XML RPC error in command GetCapabilitiesVDS ( Vds: green-vdsa.qa.lab.tlv.redhat.com ), the error was: java.util.concurrent.TimeoutException, TimeoutException:
2012-09-06 14:02:44,160 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-13) ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds = ed2d2eb2-f7fb-11e1-a776-001a4a169705 : green-vdsa.qa.lab.tlv.redhat.com, VDS Network Error, continuing.
VDSNetworkException:
2012-09-06 14:05:00,001 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-38) Autorecovering hosts is disabled, skipping
2012-09-06 14:05:00,001 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-38) Autorecovering storage domains is disabled, skipping
2012-09-06 14:05:46,165 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-81) XML RPC error in command GetCapabilitiesVDS ( Vds: green-vdsa.qa.lab.tlv.redhat.com ), the error was: java.util.concurrent.TimeoutException, TimeoutException:
2012-09-06 14:05:46,165 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-81) VDS::handleNetworkException Server failed to respond,  vds_id = ed2d2eb2-f7fb-11e1-a776-001a4a169705, vds_name = green-vdsa.qa.lab.tlv.redhat.com, error = VDSNetworkException:
2012-09-06 14:05:46,243 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (pool-4-thread-46) ResourceManager::vdsNotResponding entered for Host ed2d2eb2-f7fb-11e1-a776-001a4a169705, 10.35.102.10
2012-09-06 14:05:46,283 WARN  [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-4-thread-46) [3d51e8ba] CanDoAction of action VdsNotRespondingTreatment failed. Reasons:VDS_FENCING_DISABLED



Version-Release number of selected component (if applicable):
rhevm-3.1.0-15.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Block connection from engine to host in iptables (DROP)
2. Wait for engine to move host to non-responsive state
  
Actual results:
It takes about 10 minutes for the state change

Expected results:
Shorter timeout is expected when connection to host is blocked/not working.

Additional info:
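The timing can be read straight off the engine log lines above. The following is a minimal sketch in Python that computes the gaps between the `date` output and the three TimeoutException entries; the variable and function names are illustrative only and are not part of the engine.

```python
from datetime import datetime

# Timestamp layout used by the engine log lines quoted in this report.
FMT = "%Y-%m-%d %H:%M:%S,%f"

# Output of `date`, run just before the iptables DROP rule was added.
blocked_at = datetime(2012, 9, 6, 13, 56, 39)

# The three TimeoutException entries (ListVDS, then GetCapabilitiesVDS twice).
timeouts = [
    "2012-09-06 13:59:42,136",
    "2012-09-06 14:02:44,160",
    "2012-09-06 14:05:46,165",
]


def gaps_in_seconds(start, stamps):
    """Return the elapsed seconds between each consecutive pair of points."""
    points = [start] + [datetime.strptime(s, FMT) for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(points, points[1:])]


gaps = gaps_in_seconds(blocked_at, timeouts)
total = sum(gaps)

for gap in gaps:
    print(f"waited {gap:.0f} s before the next timeout")
print(f"total: {total:.0f} s (about {total / 60:.1f} minutes)")
```

Each attempt appears to wait roughly three minutes before timing out, and the host is handed to handleNetworkException only after the third failure, which accounts for most of the observed delay.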

Comment 1 Yaniv Kaul 2012-09-06 13:51:57 UTC
Note that you've not 'blocked' the connection in the classic sense: you are dropping packets, not rejecting them. If you change the iptables command to REJECT, does it still take 10 minutes? Not that 10 minutes is OK anyway, but it will take a while, because DROP means that TCP connections keep retransmitting until they eventually give up.
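
The difference between the two targets can be observed from the client side. The following is a rough sketch in Python, not engine code: `probe` is a hypothetical helper that classifies a single TCP connect attempt. With a REJECT rule the attempt typically fails almost immediately (on Linux usually as "refused", otherwise as another OSError), while with a DROP rule it hangs until the caller's own timeout expires.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify one TCP connect attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Active refusal: the typical result of a closed port or a REJECT rule.
        return "refused"
    except socket.timeout:
        # No answer at all within `timeout`: the typical result of a DROP rule.
        return "timeout"
    except OSError:
        # Anything else, e.g. host or network unreachable.
        return "error"
```

For example, `probe("green-vdsa.qa.lab.tlv.redhat.com", port)` could be run from the engine machine before and after adding each rule, where `port` is whatever port the host agent listens on in that setup.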

Comment 2 Gadi Ickowicz 2012-09-09 06:31:47 UTC
(In reply to comment #1)
> Note that you've not 'blocked' the connection in the classic meaning - you
> are dropping packets, not rejecting. If you change the iptables command to
> REJECT, does it still take 10 minutes? Not that 10 minutes is OK anyway, but
> it will take a while, because drop means that TCP connections do re-transmit
> until they give up eventually.

With REJECT in iptables, the status changes after about 1 minute:


[root@gadi-rhevm ~]# date
Sun Sep  9 09:20:34 IDT 2012
[root@gadi-rhevm ~]# iptables -A OUTPUT -d green-vdsa.qa.lab.tlv.redhat.com -j REJECT


2012-09-09 09:21:23,602 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-2) [1af80802] ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds = ed2d2eb2-f7fb-11e1-a776-001a4a169705 : green-vdsa.qa.lab.tlv.redhat.com, VDS Network Error, continuing.
VDSNetworkException:
2012-09-09 09:21:29,606 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-83) ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds = ed2d2eb2-f7fb-11e1-a776-001a4a169705 : green-vdsa.qa.lab.tlv.redhat.com, VDS Network Error, continuing.
VDSNetworkException:
2012-09-09 09:21:35,612 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-37) VDS::handleNetworkException Server failed to respond,  vds_id = ed2d2eb2-f7fb-11e1-a776-001a4a169705, vds_name = green-vdsa.qa.lab.tlv.redhat.com, error = VDSNetworkException:
2012-09-09 09:21:35,641 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (pool-4-thread-47) ResourceManager::vdsNotResponding entered for Host ed2d2eb2-f7fb-11e1-a776-001a4a169705, 10.35.102.10
2012-09-09 09:21:35,685 WARN  [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-4-thread-47) [7e71d8f9] CanDoAction of action VdsNotRespondingTreatment failed. Reasons:VDS_FENCING_DISABLED
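
For comparison, here is a small Python sketch that puts the two runs side by side, measuring from the `date` output to the handleNetworkException line in each log. The names are illustrative, and the figures come only from the timestamps quoted in this report.

```python
from datetime import datetime

# Timestamp layout used by the engine log lines quoted in this report.
FMT = "%Y-%m-%d %H:%M:%S,%f"


def detection_delay(rule_added, not_responding):
    """Seconds from adding the iptables rule to the handleNetworkException entry."""
    start = datetime.strptime(rule_added, FMT)
    end = datetime.strptime(not_responding, FMT)
    return (end - start).total_seconds()


# DROP run (description): `date` output vs. handleNetworkException timestamp.
drop_delay = detection_delay("2012-09-06 13:56:39,000", "2012-09-06 14:05:46,165")
# REJECT run (this comment): same two points.
reject_delay = detection_delay("2012-09-09 09:20:34,000", "2012-09-09 09:21:35,612")

print(f"DROP:   {drop_delay:.0f} s")
print(f"REJECT: {reject_delay:.0f} s")
print(f"DROP took about {drop_delay / reject_delay:.1f}x longer")
```

In the REJECT run the three failures are about 6 seconds apart, so most of that minute passes before the first failure is logged, whereas in the DROP run each failed attempt itself costs about three minutes.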

Comment 4 Itamar Heim 2013-12-01 10:08:38 UTC
Closing old bugs. If this issue is still relevant/important in current version, please re-open the bug.

In 3.3 there were some changes to make timeouts more predictable; see bug 863211.

