Bug 1571768 - Connections shouldn't be closed after the connection to the host was recovered
Summary: Connections shouldn't be closed after the connection to the host was recovered
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm-jsonrpc-java
Classification: oVirt
Component: Core
Version: 1.3.16
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.4
Target Release: 1.4.13
Assignee: Ravi Nori
QA Contact: Pavol Brilla
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-25 11:58 UTC by Alona Kaplan
Modified: 2018-06-26 08:38 UTC (History)

Fixed In Version: vdsm-jsonrpc-java-1.4.13
Clone Of:
Environment:
Last Closed: 2018-06-26 08:38:34 UTC
oVirt Team: Infra
Embargoed:
rule-engine: ovirt-4.2+
lsvaty: testing_ack+


Attachments
engine.log (14.92 MB, text/plain)
2018-04-25 11:58 UTC, Alona Kaplan
vdsm.log (1.79 MB, text/plain)
2018-04-25 12:00 UTC, Alona Kaplan
engine2.log (14.15 MB, text/plain)
2018-04-25 12:03 UTC, Alona Kaplan
vdsm2.log (4.41 MB, text/plain)
2018-04-25 12:03 UTC, Alona Kaplan


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 90646 | 0 | master | MERGED | Connections shouldn't be closed after the connection to the host was recovered | 2018-04-27 08:16:06 UTC

Description Alona Kaplan 2018-04-25 11:58:33 UTC
Created attachment 1426608
engine.log

Description of problem:

All open requests to the host are terminated once one of the requests times out, even if connectivity to the host has already been restored.

The scenario I debugged:

1. Stop the vdsm
2. While the vdsm is down GetAllVmStatsVDSCommand is sent (several times) and gets a ConnectException. The request is registered to the JsonRpcClient.tracker.
3. Start the vdsm.
4. Run several SetupNetworks commands.
Three minutes after GetAllVmStatsVDSCommand failed, ResponseTracker.loop starts handling the timeout.
As part of that handling, all open requests to the host are terminated.
If a SetupNetworks (SN) command is running at that moment, it is terminated as well.
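
The behavior in this scenario can be illustrated with a minimal sketch. It is a simplified model, not the vdsm-jsonrpc-java code and not the merged patch: the class TrackerSketch, its methods, and the connectionOpen flag are all hypothetical names chosen for illustration. It contrasts terminating every open request when one request expires with terminating only the expired requests once the connection is known to be open again.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a request tracker (not the real ResponseTracker).
final class TrackerSketch {
    // Pending request ids mapped to the time (in ms) at which they were registered.
    private final Map<String, Long> pending = new LinkedHashMap<>();
    private final long timeoutMs;

    TrackerSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void register(String requestId, long nowMs) {
        pending.put(requestId, nowMs);
    }

    // Behavior described in the report: one expired request terminates every open request.
    List<String> expireAll(long nowMs) {
        List<String> terminated = new ArrayList<>();
        boolean anyExpired = pending.values().stream()
                .anyMatch(registeredAt -> nowMs - registeredAt >= timeoutMs);
        if (anyExpired) {
            terminated.addAll(pending.keySet());
            pending.clear();
        }
        return terminated;
    }

    // Expected behavior (assumption): if the connection has recovered,
    // terminate only the requests that actually expired.
    List<String> expire(long nowMs, boolean connectionOpen) {
        if (!connectionOpen) {
            return expireAll(nowMs);
        }
        List<String> terminated = new ArrayList<>();
        pending.entrySet().removeIf(entry -> {
            boolean expired = nowMs - entry.getValue() >= timeoutMs;
            if (expired) {
                terminated.add(entry.getKey());
            }
            return expired;
        });
        return terminated;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

With a 180000 ms timeout (matching the 3 minutes observed above), a GetAllVmStats request registered at time 0 and a SetupNetworks request registered at 170000 ms, expireAll(180000) terminates both requests, while expire(180000, true) terminates only GetAllVmStats and leaves SetupNetworks pending.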

BTW, the issue was partially solved for async vds commands in patch https://gerrit.ovirt.org/#/c/90189
With that patch, in case of an immediate ConnectException the request is not registered to the JsonRpcClient.tracker.
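
The idea of not tracking a request that fails immediately can be sketched as follows. This is only an illustration under assumptions: the class RegisterSketch, the Transport interface, and the method names are hypothetical and do not reflect the actual JsonRpcClient API or the contents of the patch. The request is tracked before sending and removed again if the send fails with a ConnectException, so no delayed timeout handling is triggered for it later.

```java
import java.net.ConnectException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; names do not correspond to the real JsonRpcClient API.
final class RegisterSketch {
    // Stand-in for the transport layer; send may fail immediately.
    interface Transport {
        void send(String requestId) throws ConnectException;
    }

    private final Set<String> tracked = new HashSet<>();

    // Returns true if the request was sent and remains tracked.
    boolean call(String requestId, Transport transport) {
        tracked.add(requestId);
        try {
            transport.send(requestId);
            return true;
        } catch (ConnectException e) {
            // Immediate connection failure: drop the request from tracking
            // so that it cannot time out minutes later.
            tracked.remove(requestId);
            return false;
        }
    }

    boolean isTracked(String requestId) {
        return tracked.contains(requestId);
    }
}
```

In this model, a call whose transport throws ConnectException returns false and leaves nothing behind in the tracked set, while a successful send keeps the request tracked until a response or a timeout.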

Version-Release number of selected component (if applicable):


How reproducible:
20%

Steps to Reproduce:
1. Stop the vdsm
2. Wait for the host to become non-responsive
3. Start the vdsm
4. Wait for the host to become up.
5. Run several setup networks (or other vds commands) one after the other for 3 minutes.

Actual results:
~3 minutes after stopping the vdsm, all the vds commands are terminated although the host is up.

Expected results:
The vds commands should finish successfully.

Additional info:

Comment 1 Alona Kaplan 2018-04-25 12:00:54 UTC
Created attachment 1426609
vdsm.log

Comment 2 Alona Kaplan 2018-04-25 12:03:04 UTC
Created attachment 1426610
engine2.log

Comment 3 Alona Kaplan 2018-04-25 12:03:59 UTC
Created attachment 1426611
vdsm2.log

Comment 4 Martin Perina 2018-04-25 14:50:08 UTC
This is nothing new: the way connections are closed within vdsm-jsonrpc-java has existed from the beginning, and the issue it causes is not related to the nonblocking thread changes in oVirt 4.2. Also, this change is quite dangerous and we really need to verify all possible regressions, so moving to 4.2.4.

Comment 5 Pavol Brilla 2018-06-06 12:08:02 UTC
Could you please suggest verification steps?

Comment 6 Martin Perina 2018-06-06 12:23:59 UTC
I don't think we have anything other than what's mentioned in the Description, right, Ravi?

Comment 7 Ravi Nori 2018-06-06 13:05:57 UTC
There are no specific steps other than the ones in the Description. You should not see any errors in the logs regarding SetupNetworks.

Comment 8 Pavol Brilla 2018-06-25 19:48:57 UTC
3 of 3 attempts: no termination caught.

Comment 9 Sandro Bonazzola 2018-06-26 08:38:34 UTC
This bug is included in the oVirt 4.2.4 release, published on June 26th, 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.4 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

