Description of problem:
When a client tries to connect and cannot, we busy-wait until the connection timeout:

    while (!this.channel.finishConnect()) {
        // the timeout is recomputed on every iteration of this loop
        final long timeout = getTimeout(policy.getRetryTimeOut(), policy.getTimeUnit());
        final FutureTask<SocketChannel> connectTask = scheduleTask(new Retryable<>(() -> {
            if (System.currentTimeMillis() >= timeout) {
                throw new ConnectException("Connection timeout");
            }
            return null;
        }, this.policy));
        connectTask.get();
    }

This occupies the Reactor thread and makes it continuously schedule new tasks to check the connection, without any backoff strategy. It also creates lots of objects that immediately become garbage.

The timeout handling is also broken: the timeout is recomputed, and therefore pushed forward, on every iteration of the while loop, so the loop never exits.

Version-Release number of selected component (if applicable):
1.1.8 and engine-3.6.3

How reproducible:
100%

Steps to Reproduce:
1. Have 3 hosts and 1 or 2 domains.
2. Start the engine and see that everything is up.
3. Disconnect the engine from the network. Shortly after the heartbeat is exceeded, the client code goes into a loop.

Expected results:
The timeout value is computed once and stays stable, and the connection attempt fails after exactly the timeout value. The timeout check should not busy-wait; a different strategy should be used.
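A minimal sketch of one possible "different strategy", assuming plain java.nio rather than the actual ReactorClient/Retryable classes: the deadline is computed once before any waiting starts, and instead of scheduling a new check task per iteration, the thread blocks in Selector.select() until the socket becomes connectable or the remaining time runs out. DeadlineConnect and connect(...) are hypothetical names, not part of vdsm-jsonrpc-java. In the real client this wait would still need to be integrated with the Reactor's own selector, since blocking here would occupy the Reactor thread (without burning CPU, but still unavailable for other work).

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of vdsm-jsonrpc-java): waits for a non-blocking
// connect to finish without busy-waiting, using one fixed deadline.
final class DeadlineConnect {

    private DeadlineConnect() {
    }

    static SocketChannel connect(SocketAddress address, long timeout, TimeUnit unit)
            throws IOException {
        // Compute the deadline exactly once, before the wait loop.
        final long deadline = System.nanoTime() + unit.toNanos(timeout);

        SocketChannel channel = SocketChannel.open();
        try {
            channel.configureBlocking(false);
            if (channel.connect(address)) {
                return channel; // connected immediately
            }
            try (Selector selector = Selector.open()) {
                channel.register(selector, SelectionKey.OP_CONNECT);
                while (true) {
                    long remainingNanos = deadline - System.nanoTime();
                    if (remainingNanos <= 0) {
                        throw new ConnectException("Connection timeout");
                    }
                    // Block until the socket is connectable or the remaining time elapses.
                    // select(0) would block forever, so wait at least 1 ms.
                    long remainingMillis = Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
                    selector.select(remainingMillis);
                    // Throws (e.g. ConnectException) if the connect attempt failed.
                    if (channel.finishConnect()) {
                        return channel;
                    }
                    selector.selectedKeys().clear();
                }
            }
        } catch (IOException e) {
            // Covers the timeout as well as refused/unreachable connections.
            channel.close();
            throw e;
        }
    }
}
```

Because the deadline never moves, a host that stays unreachable produces exactly one "Connection timeout" after the configured time, and no objects are allocated per check.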
Thanks to froland's keen eye for finding the timeout issue.
The busy wait should be slowed down by [1]. What timeout setting do you use?

[1] https://github.com/oVirt/vdsm-jsonrpc-java/blob/master/client/src/main/java/org/ovirt/vdsm/jsonrpc/client/utils/retry/Retryable.java#L38
That is true for tasks that throw an exception. The 'wait for finish' task [1] does not throw an exception, so it is not slowed down. My timeout settings are all default.

[1] 'wait for finish' task: https://github.com/oVirt/vdsm-jsonrpc-java/blob/master/client/src/main/java/org/ovirt/vdsm/jsonrpc/client/reactors/ReactorClient.java#L119
Usually, waiting for the connection to finish takes less than a second. The real issue here is that a new timeout is computed on every iteration; fixing that would stop the never-ending spinning wait. Without this bug, the current strategy seems legitimate. Thanks for finding it.
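To illustrate why recomputing the timeout never expires, here is a small deterministic simulation. It is only a sketch: DeadlineDemo and iterationsUntilTimeout are hypothetical names, and a simulated clock stands in for System.currentTimeMillis(). Each check costs a fixed step shorter than the timeout, so a deadline recomputed on every iteration keeps moving ahead of the clock, while a deadline computed once is reached after timeout / step iterations.

```java
// Illustrative simulation only; not code from vdsm-jsonrpc-java.
final class DeadlineDemo {

    /**
     * Simulates a connect that never finishes, where each check costs stepMillis.
     * Returns the iteration on which the timeout fired, or -1 if it never fired
     * within maxIterations.
     */
    static int iterationsUntilTimeout(boolean recomputeEachIteration, long timeoutMillis,
                                      long stepMillis, int maxIterations) {
        long now = 0;                        // simulated clock, in milliseconds
        long deadline = now + timeoutMillis; // fixed variant: computed once
        for (int i = 1; i <= maxIterations; i++) {
            if (recomputeEachIteration) {
                // Mirrors the buggy loop: the deadline moves forward every time.
                deadline = now + timeoutMillis;
            }
            now += stepMillis; // time spent until the next check runs
            if (now >= deadline) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(iterationsUntilTimeout(false, 1000, 100, 1_000_000)); // 10
        System.out.println(iterationsUntilTimeout(true, 1000, 100, 1_000_000));  // -1
    }
}
```

With the deadline fixed, the 1000 ms timeout fires on the 10th 100 ms check; with it recomputed, it never fires no matter how many checks run.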
Created attachment 1229054: high CPU utilization after blocking a host
Verified. vdsm-jsonrpc-java-1.2.10-1.el7ev.noarch (repository 4.0.6-7)