Bug 1571768

Summary: Connections shouldn't be closed after the connection to the host was recovered
Product: [oVirt] vdsm-jsonrpc-java
Reporter: Alona Kaplan <alkaplan>
Component: Core
Assignee: Ravi Nori <rnori>
Status: CLOSED CURRENTRELEASE
QA Contact: Pavol Brilla <pbrilla>
Severity: high
Priority: unspecified
Version: 1.3.16
CC: bugs, lsvaty, mperina, rnori
Target Milestone: ovirt-4.2.4
Flags: rule-engine: ovirt-4.2+, lsvaty: testing_ack+
Target Release: 1.4.13
Hardware: Unspecified
OS: Unspecified
Fixed In Version: vdsm-jsonrpc-java-1.4.13
Doc Type: No Doc Update
Last Closed: 2018-06-26 08:38:34 UTC
Type: Bug
oVirt Team: Infra
Attachments:
  engine.log (flags: none)
  vdsm.log (flags: none)
  engine2.log (flags: none)
  vdsm2.log (flags: none)

Description Alona Kaplan 2018-04-25 11:58:33 UTC
Created attachment 1426608
engine.log

Description of problem:

All open requests to the host are terminated as soon as one of them times out, even if connectivity to the host has already been restored.

The scenario I debugged:

1. Stop the vdsm
2. While vdsm is down, GetAllVmStatsVDSCommand is sent (several times) and fails with a ConnectException. The request is still registered with JsonRpcClient.tracker.
3. Start the vdsm.
4. Run several SetupNetworks commands.

About 3 minutes after GetAllVmStatsVDSCommand failed, ResponseTracker.loop starts its timeout handling. As part of that handling, all open requests to the host are terminated, so a SetupNetworks command that is running at that moment is terminated as well.
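The timeout behavior described above can be modeled with a small Java sketch. It is a hypothetical simplification: `TrackerTimeoutSketch`, `PendingRequest`, `checkTimeouts` and the fixed 180-second timeout are illustrative names and values, not the vdsm-jsonrpc-java API. A request that failed while vdsm was down stays registered, and when it times out every pending request to the same host is terminated, including a SetupNetworks call sent after the host recovered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the reported behavior; names are
// illustrative and are not the vdsm-jsonrpc-java API.
public class TrackerTimeoutSketch {

    // A request waiting for a response from a host.
    static final class PendingRequest {
        final String host;
        final String method;
        final long sentAtMillis;
        String outcome = "PENDING";

        PendingRequest(String host, String method, long sentAtMillis) {
            this.host = host;
            this.method = method;
            this.sentAtMillis = sentAtMillis;
        }
    }

    static final long TIMEOUT_MILLIS = 180_000; // ~3 minutes, as observed in the report

    final List<PendingRequest> tracked = new ArrayList<>();

    void register(PendingRequest r) {
        tracked.add(r);
    }

    // One pass of a timeout loop: if any request to a host has timed out,
    // every pending request to that host is failed, even newer healthy ones.
    void checkTimeouts(long nowMillis) {
        for (PendingRequest r : tracked) {
            if (r.outcome.equals("PENDING") && nowMillis - r.sentAtMillis >= TIMEOUT_MILLIS) {
                failAllFor(r.host);
            }
        }
    }

    void failAllFor(String host) {
        for (PendingRequest r : tracked) {
            if (r.host.equals(host) && r.outcome.equals("PENDING")) {
                r.outcome = "TERMINATED";
            }
        }
    }

    public static void main(String[] args) {
        TrackerTimeoutSketch tracker = new TrackerTimeoutSketch();
        // Sent while vdsm was down; failed with ConnectException but stayed registered.
        PendingRequest stale = new PendingRequest("host1", "GetAllVmStats", 0);
        // Sent after vdsm was restarted; the host is healthy.
        PendingRequest healthy = new PendingRequest("host1", "SetupNetworks", 170_000);
        tracker.register(stale);
        tracker.register(healthy);

        tracker.checkTimeouts(180_000);
        System.out.println(healthy.method + " -> " + healthy.outcome);
    }
}
```

Running `main` prints `SetupNetworks -> TERMINATED`: the healthy request fails only because an older, stale request to the same host timed out.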

BTW, the issue was partially solved for async VDS commands in patch https://gerrit.ovirt.org/#/c/90189: if a ConnectException is thrown immediately, the request is not registered with JsonRpcClient.tracker.
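The partial fix is described here only as "do not register the request on an immediate ConnectException". A minimal sketch of that idea, assuming a hypothetical `Transport` interface and a plain list standing in for the tracker (neither is the real vdsm-jsonrpc-java code):

```java
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a request is tracked only if sending it did not fail
// immediately. Names are illustrative, not the vdsm-jsonrpc-java API.
public class RegisterOnlyIfSentSketch {

    // Stand-in for a transport; throws ConnectException when the host is down.
    interface Transport {
        void send(String request) throws ConnectException;
    }

    // Stand-in for the response tracker: requests awaiting a response.
    final List<String> tracked = new ArrayList<>();

    // Returns true if the request was sent and is now tracked.
    boolean call(Transport transport, String request) {
        try {
            transport.send(request);
        } catch (ConnectException e) {
            // Immediate failure: report it to the caller but do not track it,
            // so it can never time out later and take newer requests down with it.
            return false;
        }
        tracked.add(request);
        return true;
    }

    public static void main(String[] args) {
        RegisterOnlyIfSentSketch client = new RegisterOnlyIfSentSketch();
        Transport down = r -> { throw new ConnectException("Connection refused"); };
        Transport up = r -> { };

        client.call(down, "GetAllVmStats");
        client.call(up, "SetupNetworks");
        System.out.println(client.tracked);
    }
}
```

Running `main` prints `[SetupNetworks]`; the request that failed while vdsm was down never enters the tracker.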

Version-Release number of selected component (if applicable):


How reproducible:
20%

Steps to Reproduce:
1. Stop the vdsm
2. Wait for the host to become non-responsive
3. Start the vdsm
4. Wait for the host to become up.
5. Run several SetupNetworks commands (or other VDS commands) one after the other for 3 minutes.

Actual results:
About 3 minutes after vdsm is stopped, all running VDS commands are terminated even though the host is up.

Expected results:
The vds commands should finish successfully.
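The expected result, in line with the bug summary (connections should not be closed once the connection to the host has been recovered), could be met in several ways; this report does not describe the change actually made in vdsm-jsonrpc-java-1.4.13. One possible approach, sketched here purely as an assumption, tags each request with a hypothetical connection "generation" counter: a timeout fails only requests sent on the same connection, and the connection is closed only if it is still the one the timed-out request used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual vdsm-jsonrpc-java fix: each request
// remembers the connection generation it was sent on.
public class ConnectionGenerationSketch {

    static final class PendingRequest {
        final String method;
        final int generation; // connection generation at send time
        String outcome = "PENDING";

        PendingRequest(String method, int generation) {
            this.method = method;
            this.generation = generation;
        }
    }

    int currentGeneration = 0; // bumped every time the connection is re-established
    boolean closed = false;    // whether the current connection was closed
    final List<PendingRequest> tracked = new ArrayList<>();

    PendingRequest send(String method) {
        PendingRequest r = new PendingRequest(method, currentGeneration);
        tracked.add(r);
        return r;
    }

    void reconnected() {
        currentGeneration++;
        closed = false;
    }

    // Called when `timedOut` has exceeded its timeout.
    void onTimeout(PendingRequest timedOut) {
        // Fail only requests sent on the same connection as the timed-out one.
        for (PendingRequest r : tracked) {
            if (r.outcome.equals("PENDING") && r.generation == timedOut.generation) {
                r.outcome = "TERMINATED";
            }
        }
        // A timeout on an older connection says nothing about the recovered one,
        // so close only if the timed-out request used the current connection.
        if (timedOut.generation == currentGeneration) {
            closed = true;
        }
    }

    public static void main(String[] args) {
        ConnectionGenerationSketch client = new ConnectionGenerationSketch();
        PendingRequest stale = client.send("GetAllVmStats"); // sent while vdsm was down
        client.reconnected();                                 // vdsm restarted
        PendingRequest healthy = client.send("SetupNetworks");
        client.onTimeout(stale);
        System.out.println(stale.outcome + " " + healthy.outcome + " closed=" + client.closed);
    }
}
```

Running `main` prints `TERMINATED PENDING closed=false`: the stale request is failed, while the SetupNetworks request and the recovered connection are left alone.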

Additional info:

Comment 1 Alona Kaplan 2018-04-25 12:00:54 UTC
Created attachment 1426609
vdsm.log

Comment 2 Alona Kaplan 2018-04-25 12:03:04 UTC
Created attachment 1426610
engine2.log

Comment 3 Alona Kaplan 2018-04-25 12:03:59 UTC
Created attachment 1426611
vdsm2.log

Comment 4 Martin Perina 2018-04-25 14:50:08 UTC
This is nothing new: the way connections are closed within vdsm-jsonrpc-java has existed from the beginning, and the issue it causes is not related to the non-blocking thread changes in oVirt 4.2. Also, this change is quite dangerous and we really need to verify all possible regressions, so moving to 4.2.4.

Comment 5 Pavol Brilla 2018-06-06 12:08:02 UTC
Could you please suggest verification steps?

Comment 6 Martin Perina 2018-06-06 12:23:59 UTC
I don't think we have anything other than what's mentioned in Description, right Ravi?

Comment 7 Ravi Nori 2018-06-06 13:05:57 UTC
There are no specific steps other than the ones in the Description. You should not see any errors regarding SetupNetworks in the logs.

Comment 8 Pavol Brilla 2018-06-25 19:48:57 UTC
3 of 3 runs: no termination caught.

Comment 9 Sandro Bonazzola 2018-06-26 08:38:34 UTC
This bugzilla is included in oVirt 4.2.4 release, published on June 26th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.4 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.