Bug 1571768 - Connections shouldn't be closed after the connection to the host was recovered
Summary: Connections shouldn't be closed after the connection to the host was recovered
Alias: None
Product: vdsm-jsonrpc-java
Classification: oVirt
Component: Core
Version: 1.3.16
Hardware: Unspecified
OS: Unspecified
Target Milestone: ovirt-4.2.4
: 1.4.13
Assignee: Ravi Nori
QA Contact: Pavol Brilla
Depends On:
Reported: 2018-04-25 11:58 UTC by Alona Kaplan
Modified: 2018-06-26 08:38 UTC
4 users

Fixed In Version: vdsm-jsonrpc-java-1.4.13
Doc Type: No Doc Update
Doc Text:
Clone Of:
Last Closed: 2018-06-26 08:38:34 UTC
oVirt Team: Infra
rule-engine: ovirt-4.2+
lsvaty: testing_ack+

Attachments
engine.log (14.92 MB, text/plain)
2018-04-25 11:58 UTC, Alona Kaplan
vdsm.log (1.79 MB, text/plain)
2018-04-25 12:00 UTC, Alona Kaplan
engine2.log (14.15 MB, text/plain)
2018-04-25 12:03 UTC, Alona Kaplan
vdsm2.log (4.41 MB, text/plain)
2018-04-25 12:03 UTC, Alona Kaplan

System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 90646 | 0 | master | MERGED | Connections shouldn't be closed after the connection to the host was recovered | 2018-04-27 08:16:06 UTC

Description Alona Kaplan 2018-04-25 11:58:33 UTC
Created attachment 1426608

Description of problem:

All open requests to the host are terminated once one of the requests times out, even if connectivity to the host has already been restored.

The scenario I debugged:

1. Stop vdsm.
2. While vdsm is down, GetAllVmStatsVDSCommand is sent (several times) and fails with a ConnectException. The request is nevertheless registered in JsonRpcClient.tracker.
3. Start vdsm.
4. Run several SetupNetworks commands.
3 minutes after GetAllVmStatsVDSCommand failed, ResponseTracker.loop starts handling the timeout.
As part of this handling, all open requests to the host are terminated, so a SetupNetworks (SN) command that is running at that moment is terminated as well.
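The timeout behavior described above can be modeled with a small, self-contained sketch. It assumes a simplified tracker in which each pending request stores its target host and a deadline, and in which one expired request causes every pending request to the same host to be terminated. The names HostWideTimeoutDemo, Pending and checkTimeouts are hypothetical and are not taken from the real ResponseTracker; only the 3-minute timeout comes from this report.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model of the behavior described in the report.
// Names are illustrative; this is not the vdsm-jsonrpc-java ResponseTracker.
public class HostWideTimeoutDemo {

    // A tracked request: the host it targets and its deadline in ms.
    record Pending(String id, String host, long deadlineMillis) {}

    private final List<Pending> pending = new ArrayList<>();
    private final List<String> terminated = new ArrayList<>();

    void register(Pending p) {
        pending.add(p);
    }

    // One pass of the timeout loop: when any request has expired, every
    // pending request to the same host is terminated along with it.
    void checkTimeouts(long nowMillis) {
        for (Pending p : List.copyOf(pending)) {
            if (p.deadlineMillis() <= nowMillis && pending.contains(p)) {
                terminateAllFor(p.host());
            }
        }
    }

    private void terminateAllFor(String host) {
        Iterator<Pending> it = pending.iterator();
        while (it.hasNext()) {
            Pending p = it.next();
            if (p.host().equals(host)) {
                terminated.add(p.id());
                it.remove();
            }
        }
    }

    // Replays the reported timeline with a 3-minute timeout.
    public static List<String> simulate() {
        HostWideTimeoutDemo tracker = new HostWideTimeoutDemo();
        long timeoutMillis = 180_000;
        // t = 0 s: vdsm is down; GetAllVmStats fails with ConnectException
        // but is still registered in the tracker.
        tracker.register(new Pending("GetAllVmStats", "host1", timeoutMillis));
        // t = 170 s: vdsm is back and a SetupNetworks call is in flight.
        tracker.register(new Pending("SetupNetworks", "host1", 170_000 + timeoutMillis));
        // t = 180 s: only the stale request has expired...
        tracker.checkTimeouts(180_000);
        // ...but both requests have been terminated.
        return tracker.terminated;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // [GetAllVmStats, SetupNetworks]
    }
}
```

Running main prints both request names: the stale GetAllVmStats entry expires and takes the in-flight SetupNetworks call down with it, which matches the reported symptom.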

BTW, the issue was partially solved for async VDS commands in patch https://gerrit.ovirt.org/#/c/90189: in case of an immediate ConnectException, the request is not registered in JsonRpcClient.tracker.
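A minimal sketch of the idea behind that partial fix, assuming a hypothetical Transport interface and a plain list standing in for JsonRpcClient.tracker: if sending fails immediately with ConnectException, the request does not stay in the tracker. None of the names below come from vdsm-jsonrpc-java, and the actual patch may be structured differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a request that fails immediately with
// ConnectException is not left in the tracker. Names are illustrative.
public class UnregisterOnConnectFailureDemo {

    // Stand-in for the real transport; send() may fail immediately.
    interface Transport {
        void send(String requestId) throws IOException;
    }

    private final List<String> tracked = new ArrayList<>();

    // Returns true if the request was sent and is now awaiting a response.
    boolean call(Transport transport, String requestId) throws IOException {
        // Register before sending so that a very fast reply is never missed.
        tracked.add(requestId);
        try {
            transport.send(requestId);
            return true;
        } catch (ConnectException e) {
            // Immediate failure: drop the entry so this request can never
            // reach the timeout handling later.
            tracked.remove(requestId);
            return false;
        }
    }

    public static List<String> simulate() {
        UnregisterOnConnectFailureDemo client = new UnregisterOnConnectFailureDemo();
        Transport down = id -> { throw new ConnectException("Connection refused"); };
        Transport up = id -> { /* pretend the request was written */ };
        try {
            client.call(down, "GetAllVmStats-1");
            client.call(down, "GetAllVmStats-2");
            client.call(up, "SetupNetworks-1");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return client.tracked;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // [SetupNetworks-1]
    }
}
```

The sketch registers before sending and removes the entry on failure instead of registering only after a successful send; the net effect is the same, but registering first avoids missing a response that arrives before registration. Other IOExceptions are deliberately left tracked, since the partial fix covers only an immediate ConnectException.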

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Stop vdsm.
2. Wait for the host to become non-responsive.
3. Start vdsm.
4. Wait for the host to come up.
5. Run several SetupNetworks commands (or other VDS commands) one after another for 3 minutes.

Actual results:
About 3 minutes after vdsm is stopped, all running VDS commands are terminated even though the host is up.

Expected results:
The vds commands should finish successfully.
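One possible way to obtain the expected results, sketched under explicit assumptions: the client counts successful reconnects as a connection "generation", stores the generation with each request, and treats a timeout as a host failure only if the request was sent on the current connection. This is an illustration, not the change merged in gerrit 90646; GenerationAwareTimeoutDemo and all of its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: only a timeout on the current connection is treated
// as a host failure. Not the merged vdsm-jsonrpc-java fix; names are illustrative.
public class GenerationAwareTimeoutDemo {

    record Pending(String id, long deadlineMillis, int generation) {}

    public record Result(List<String> failed, List<String> stillPending, int connectionCloses) {}

    private int generation = 0; // incremented on every successful reconnect
    private final Map<String, Pending> pending = new LinkedHashMap<>();
    private final List<String> failed = new ArrayList<>();
    private int connectionCloses = 0;

    void reconnected() {
        generation++;
    }

    void register(String id, long deadlineMillis) {
        pending.put(id, new Pending(id, deadlineMillis, generation));
    }

    void checkTimeouts(long nowMillis) {
        for (Pending p : List.copyOf(pending.values())) {
            if (p.deadlineMillis() > nowMillis || !pending.containsKey(p.id())) {
                continue; // not expired yet, or already failed in this pass
            }
            if (p.generation() == generation) {
                // Timed out on the live connection: the host really looks
                // unresponsive, so close it and fail everything pending.
                connectionCloses++;
                failed.addAll(pending.keySet());
                pending.clear();
            } else {
                // Stale request from before the reconnect: fail only this one
                // and leave the recovered connection and its requests alone.
                pending.remove(p.id());
                failed.add(p.id());
            }
        }
    }

    public static Result simulate() {
        GenerationAwareTimeoutDemo client = new GenerationAwareTimeoutDemo();
        client.register("GetAllVmStats", 180_000); // sent while vdsm was down
        client.reconnected();                      // vdsm is back
        client.register("SetupNetworks", 170_000 + 180_000);
        client.checkTimeouts(180_000);
        return new Result(List.copyOf(client.failed),
                List.copyOf(client.pending.keySet()), client.connectionCloses);
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

With the same timeline as the Description, simulate() reports GetAllVmStats as failed, keeps SetupNetworks pending, and records no connection close.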

Additional info:

Comment 1 Alona Kaplan 2018-04-25 12:00:54 UTC
Created attachment 1426609

Comment 2 Alona Kaplan 2018-04-25 12:03:04 UTC
Created attachment 1426610

Comment 3 Alona Kaplan 2018-04-25 12:03:59 UTC
Created attachment 1426611

Comment 4 Martin Perina 2018-04-25 14:50:08 UTC
This is nothing new: the way connections are closed within vdsm-jsonrpc-java has existed from the beginning, and the issue it causes is not related to the non-blocking thread changes in oVirt 4.2. Also, this change is quite dangerous and we really need to verify all possible regressions, so moving to 4.2.4.

Comment 5 Pavol Brilla 2018-06-06 12:08:02 UTC
Could you please suggest verification steps?

Comment 6 Martin Perina 2018-06-06 12:23:59 UTC
I don't think we have anything other than what's mentioned in Description, right Ravi?

Comment 7 Ravi Nori 2018-06-06 13:05:57 UTC
There are no specific steps other than the ones in the Description. You should not see any errors regarding SetupNetworks in the logs.

Comment 8 Pavol Brilla 2018-06-25 19:48:57 UTC
3 of 3 runs: no termination caught.

Comment 9 Sandro Bonazzola 2018-06-26 08:38:34 UTC
This bugzilla is included in oVirt 4.2.4 release, published on June 26th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.4 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
