Bug 1141850

Summary: Unhandled SSLError in MultiProtocolAcceptor
Product: [Retired] oVirt
Reporter: Nir Soffer <nsoffer>
Component: vdsm
Assignee: Piotr Kliczewski <pkliczew>
Status: CLOSED CURRENTRELEASE
QA Contact: Gil Klein <gklein>
Severity: unspecified
Priority: unspecified
Version: 3.5
CC: bazulay, bugs, danken, ecohen, gklein, iheim, mgoldboi, nsoffer, pkliczew, rbalakri, yeylon
Target Milestone: ---
Target Release: 3.5.1
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2015-01-21 16:13:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: Infra
Cloudforms Team: ---
Attachments:
vdsm log (flags: none)
engine log (flags: none)

Description Nir Soffer 2014-09-15 15:20:39 UTC
Description of problem:

This happens sometimes:

Detector thread::ERROR::2014-09-15 18:11:03,634::protocoldetector::98::vds.MultiProtocolAcceptor::(serve_forever) Unhandled exception
Traceback (most recent call last):
  File "/usr/share/vdsm/protocoldetector.py", line 94, in serve_forever
    self._process_events()
  File "/usr/share/vdsm/protocoldetector.py", line 111, in _process_events
    self._accept_connection()
  File "/usr/share/vdsm/protocoldetector.py", line 174, in _accept_connection
    client_socket, _ = self._socket.accept()
  File "/usr/lib64/python2.7/site-packages/vdsm/sslutils.py", line 121, in accept
    raise SSL.SSLError("%s, client %s" % (e, address[0]))
SSLError: unexpected eof, client 10.35.0.141

If this happens while the engine is trying to connect to the host, the host becomes unassigned and does not recover for several minutes.

Version-Release number of selected component (if applicable):
master from Sep 15

How reproducible:
Random

Steps to Reproduce:
1. Add host using xmlrpc
2. Put host to maintenance
3. Install and configure new vdsm from master
4. Start vdsm
5. Activate the host in engine

Actual results:
Host becomes unassigned and stays in this state for minutes.

Expected results:
The SSLError is handled by closing the connection cleanly, which helps the engine detect the failure and reconnect.
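
A minimal sketch of the expected behavior, not the actual vdsm code. It assumes a plain listening socket and a hypothetical `handshake` callable that wraps the accepted socket in TLS. The standard library's `ssl.SSLError` stands in for the `SSL.SSLError` raised by vdsm's `sslutils` wrapper, and the function name and log message are illustrative only.

```python
import logging
import ssl

log = logging.getLogger("vds.MultiProtocolAcceptor")


def accept_connection(listen_socket, handshake):
    """Accept one client and run the TLS handshake on it.

    Returns the object returned by handshake(), or None if the handshake
    failed. `handshake` is a hypothetical callable taking the raw client
    socket; it is an assumption of this sketch, not a vdsm API.
    """
    client_socket, address = listen_socket.accept()
    try:
        return handshake(client_socket)
    except ssl.SSLError as e:
        # A client that disconnects in the middle of the handshake is an
        # expected condition: log a warning, not a traceback.
        log.warning("Handshake with client %s failed: %s", address[0], e)
        # Close the socket so the peer sees the failure and can reconnect.
        client_socket.close()
        return None
```

With the standard library, `handshake` could be something like `lambda s: context.wrap_socket(s, server_side=True)` for a configured `ssl.SSLContext`. The caller keeps serving other clients when `None` is returned, so one bad handshake does not surface as an unhandled exception in the accept loop.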

Additional info:
It would be nice to know why we get this SSLError in the first place; maybe there is a bug on the engine side.

Comment 1 Nir Soffer 2014-09-15 15:22:02 UTC
Created attachment 937651
vdsm log

Comment 2 Nir Soffer 2014-09-15 15:24:55 UTC
Created attachment 937653
engine log

Comment 3 Nir Soffer 2014-09-15 15:25:41 UTC
I assume that Piotr would like to check this.

Comment 4 Piotr Kliczewski 2014-09-16 09:46:14 UTC
Please tell us which version of vdsm-jsonrpc-java you are using.

It looks like this issue was already fixed for bug https://bugzilla.redhat.com/1136876

Comment 5 Nir Soffer 2014-09-16 11:16:15 UTC
(In reply to Piotr Kliczewski from comment #4)
> Please provide information about which version of vdsm-jsonrpc-java are you
> using.
vdsm-jsonrpc-java-1.0.7-0.0.master.20140910072540.git92be015.fc20.noarch

Comment 6 Nir Soffer 2014-09-16 11:20:31 UTC
But it does not matter whether you fixed an issue on the Java side: the SSLError raised when accepting a connection should be handled, and we should not log an exception. This is an expected error condition and should be logged only as a warning. A client connecting and disconnecting in an unclean way is not a vdsm error.

Comment 7 Piotr Kliczewski 2014-09-16 11:35:45 UTC
There are a good number of situations in which an SSLError can be raised. Based on the error information we get from M2Crypto, we cannot tell whether it was a certificate issue (vdsm) or a client disconnect. The only thing we can tell here is that the handshake failed; whether that is an error or a warning depends on the reason, which is unknown.

Comment 8 Sandro Bonazzola 2015-01-21 16:13:02 UTC
oVirt 3.5.1 has been released. Since this bug is targeted to 3.5.1 and is in modified state, it should be included in this release.
Please re-target and move back to modified if this assumption is not valid for this bug.

Comment 9 Piotr Kliczewski 2015-01-22 07:39:13 UTC
The assumption is valid.