Bug 1281815

Summary: Expired host's cert is not detected, instead there's flood of 'VDSM dell-r210ii-13 command failed: General SSLEngine problem...'
Product: [oVirt] ovirt-engine Reporter: Jiri Belka <jbelka>
Component: Backend.Core Assignee: Moti Asayag <masayag>
Status: CLOSED WONTFIX QA Contact: Aharon Canan <acanan>
Severity: low Docs Contact:
Priority: unspecified    
Version: 3.6.0.2 CC: bugs, oourfali
Target Milestone: --- Flags: rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-16 09:02:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
engine.log none

Description Jiri Belka 2015-11-13 13:26:49 UTC
Description of problem:

BZ1257876 corrects the reporting of certs that are about to expire or have already expired. But for an expired host certificate, there's no such message.

As traffic between the engine and the host goes over TLS, the communication is broken. But IMO we could report that the cert on the host is expired (one can detect this via `openssl s_client' ...

0050 - 2e 62 72 71 2e 72 65 64-68 61 74 2e 63 6f 6d 31   .brq.redhat.com1
0060 - 37 30 35 06 03 55 04 03-13 2e 31 30 2d 33 34 2d   705..U....10-34-
0070 - 36 30 2d 31 38 35 2e 72-68 65 76 2e 6c 61 62 2e   60-185.rhev.lab.
0080 - 65 6e 67 2e 62 72 71 2e-72 65 64 68 61 74 2e 63   eng.brq.redhat.c
0090 - 6f 6d 2e 34 31 35 37 33-30 1e 17 0d 31 35 31 31   om.415730...1511
00a0 - 31 31 31 38 30 30 31 30-5a 17 0d 31 35 31 31 32   11180010Z..15112
00b0 - 37 31 38 30 30 31 30 5a-30 5b 31 24 30 22 06 03   7180010Z0[1$0"..

..., (20)15-11-27 18:00:10... is the end date).
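
For reference, the two timestamps in the dump can be pulled out mechanically. The sketch below is not a real ASN.1 parser; it assumes each validity field shows up as a UTCTime element (tag 0x17, length 0x0d), as in the bytes at offsets 0090-00bf above. The names `DUMP` and `utc_times` are made up for illustration.

```python
import re
from datetime import datetime, timezone

# Bytes copied from offsets 0090-00bf of the dump in this report.
DUMP = bytes.fromhex(
    "6f 6d 2e 34 31 35 37 33 30 1e 17 0d 31 35 31 31 "
    "31 31 31 38 30 30 31 30 5a 17 0d 31 35 31 31 32 "
    "37 31 38 30 30 31 30 5a 30 5b 31 24 30 22 06 03"
)

# Assumption: a UTCTime value looks like tag 0x17, length 0x0d,
# then 12 ASCII digits (YYMMDDhhmmss) and a trailing 'Z'.
UTCTIME = re.compile(rb"\x17\x0d(\d{12})Z")


def utc_times(data):
    """Return every UTCTime-looking value in data as an aware datetime."""
    found = []
    for match in UTCTIME.finditer(data):
        text = match.group(1).decode("ascii")
        stamp = datetime.strptime(text, "%y%m%d%H%M%S")
        found.append(stamp.replace(tzinfo=timezone.utc))
    return found
```

Calling `utc_times(DUMP)` on these bytes yields two values, 2015-11-11 18:00:10 UTC and 2015-11-27 18:00:10 UTC; in a certificate's validity field the first is the start date and the second the end date.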

So IMO we should end up with a message about the expired cert and/or messages about the communication issue and the SSLEngine problem, but without flooding the events tab.

Dec 1, 2015 12:13:03 AM
VDSM dell-r210ii-13 command failed: General SSLEngine problem
	
Dec 1, 2015 12:12:45 AM
VDSM dell-r210ii-13 command failed: General SSLEngine problem
	
Dec 1, 2015 12:12:27 AM
VDSM dell-r210ii-13 command failed: General SSLEngine problem
	
Dec 1, 2015 12:12:09 AM
VDSM dell-r210ii-13 command failed: General SSLEngine problem
	
Dec 1, 2015 12:11:54 AM
VDSM dell-r210ii-13 command failed: Message timeout which can be caused by communication issues

(Not sure if a message every 18 seconds is caused by:

# engine-config -g CertificationValidityCheckTimeInHours
CertificationValidityCheckTimeInHours: 0.05 version: general

...)
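
A quick arithmetic check on that guess, as a sketch only: it assumes the option is interpreted in hours, as its name suggests, and it takes the event timestamps from the listing above. `EVENT_TIMES`, `gaps_in_seconds` and `check_interval_seconds` are illustrative names.

```python
from datetime import datetime

# Event timestamps as shown in the events tab, newest first.
EVENT_TIMES = [
    "Dec 1, 2015 12:13:03 AM",
    "Dec 1, 2015 12:12:45 AM",
    "Dec 1, 2015 12:12:27 AM",
    "Dec 1, 2015 12:12:09 AM",
    "Dec 1, 2015 12:11:54 AM",
]


def gaps_in_seconds(stamps):
    """Seconds between consecutive events (input is newest first)."""
    # %b and %p are locale dependent; this assumes the default C/English locale.
    parsed = [datetime.strptime(s, "%b %d, %Y %I:%M:%S %p") for s in stamps]
    return [int((newer - older).total_seconds())
            for newer, older in zip(parsed, parsed[1:])]


def check_interval_seconds(hours):
    """Convert a validity-check interval given in hours to whole seconds."""
    return round(hours * 3600)
```

Under these assumptions the gaps come out as 18, 18, 18 and 15 seconds, while 0.05 hours is 180 seconds, so the configured value would not by itself explain the 18-second cadence.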

Version-Release number of selected component (if applicable):
rhevm-backend-3.6.0.3-0.1.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. install 3.6 engine and 3.6 host
2. move time forward so that the host's cert is expired (engine too)
3. (on host: date ; openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem -enddate -noout)
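
The check in step 3 can also be scripted. This is only a sketch: it assumes `openssl x509 -enddate -noout` prints a single line of the form `notAfter=Nov 27 18:00:10 2015 GMT` (the sample value is derived from the end date in the hex dump, not copied from a real run), and `parse_enddate` / `is_expired` are hypothetical helper names.

```python
import ssl
from datetime import datetime, timezone

# Assumed sample of the openssl output, built from the end date in the dump.
SAMPLE_LINE = "notAfter=Nov 27 18:00:10 2015 GMT"


def parse_enddate(line):
    """Turn a 'notAfter=...' line into an aware UTC datetime."""
    key, _, value = line.strip().partition("=")
    if key != "notAfter":
        raise ValueError("expected a line starting with 'notAfter='")
    # ssl.cert_time_to_seconds parses the "Mon DD HH:MM:SS YYYY GMT" format.
    seconds = ssl.cert_time_to_seconds(value)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def is_expired(line, now):
    """True if the cert end date in line lies before the given time."""
    return now > parse_enddate(line)
```

With the host clock moved to Dec 1, 2015, `is_expired(SAMPLE_LINE, datetime(2015, 12, 1, tzinfo=timezone.utc))` returns True.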

Actual results:
- no info about the expired host cert
- it is not obvious why there's a communication issue
- flood of General SSLEngine problem event msgs

Expected results:
- user-understandable overview of why the ssl connection is broken
- event msg about expired host cert
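
One way to read the expected results is that repeated identical events get collapsed instead of each one being listed. A minimal sketch of that idea follows; it is not how the engine handles events. `collapse_repeats` and the 10-minute window used in the note below are assumptions for illustration, and the sample data is the event listing from the description.

```python
from datetime import datetime, timedelta

# Events from the description, oldest first (12:11:54 AM is 00:11:54).
EVENTS = [
    (datetime(2015, 12, 1, 0, 11, 54),
     "Message timeout which can be caused by communication issues"),
    (datetime(2015, 12, 1, 0, 12, 9), "General SSLEngine problem"),
    (datetime(2015, 12, 1, 0, 12, 27), "General SSLEngine problem"),
    (datetime(2015, 12, 1, 0, 12, 45), "General SSLEngine problem"),
    (datetime(2015, 12, 1, 0, 13, 3), "General SSLEngine problem"),
]


def collapse_repeats(events, window):
    """Drop an event if the same message was already kept within window.

    events must be (timestamp, message) pairs sorted oldest first.
    """
    last_kept = {}  # message -> timestamp of the last kept occurrence
    kept = []
    for stamp, message in events:
        previous = last_kept.get(message)
        if previous is None or stamp - previous >= window:
            kept.append((stamp, message))
            last_kept[message] = stamp
    return kept
```

With `collapse_repeats(EVENTS, timedelta(minutes=10))` only two entries remain: the timeout message and the first SSLEngine problem.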

Additional info:

Comment 1 Moti Asayag 2015-11-16 09:00:06 UTC
The certificate is examined only when the host is 'up' or 'non-operational' and the engine is able to communicate with the host and query its certificates.

The messages that appear result from the engine's attempts to connect to the host (as part of host monitoring), which fail due to "General SSLEngine problem". At that point the host is supposed to be in the 'Non Responsive' state.

It is not reasonable for a host's certs to suddenly expire. If the host is active in the system, at some point the certs will be examined and reported, but playing tricks with dates is not a reasonable case to support.

Comment 2 Oved Ourfali 2015-11-16 09:02:57 UTC
Jiri - thanks for bringing this issue to our attention.
However, I agree with Moti here that it isn't a reasonable use-case.
Closing as WONTFIX.