Bug 1787195
Summary: [4.4.0-13] 1 out of 3 hosts non-responsive after ansible install finished - ovn-controller[20628]: ovs|00040|stream_ssl|ERR|Private key does not match certificate public key: error:140A80BE:SSL routines:SSL_CTX_check_private_key:no private key assigned

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Avihai <aefrat> |
| Component: | ovirt-host-deploy-ansible | Assignee: | Dana <delfassy> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Lukas Svaty <lsvaty> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 4.4.0 | CC: | bpelled, bugs, delfassy, dfodor, lleistne, michal.skrivanek, mperina |
| Target Milestone: | ovirt-4.4.0 | Flags: | pm-rhel: ovirt-4.4+, mperina: blocker? |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ansible-runner-service-1.0.1-3 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-05-20 20:03:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Avihai
2020-01-01 09:45:57 UTC
Created attachment 1649029
2nd ENV with the same issue engine and host journalctl logs
Added 2nd reproduction of the same issue engine and host journalctl logs
Avihai, why do you think that vdsm is not responsive due to ovn? I would guess that both are failing due to incomplete key setup. I would guess this is a deployment bug, not network-specific.

I do not see vdsm.log in your attached logs; it should show failed connections from Engine. Does vdsm-cli work locally?

On another matter: this bug should not be assigned to me. Was it set so by default?

For some reason you are assigned as the default. The network team didn't see this issue, but I did see it on Avihai's env.

(In reply to Dan Kenigsberg from comment #2)
> Avihai, why do you think that vdsm is not responsive due to ovn? I would
> guess that both are failing due to incomplete key setup. I would guess this
> is a deployment bug, not network-specific.

Indeed, I'm not sure, but this is the only issue/error I see: the journalctl log on the host shows that ovn-controller has this ssh key issue, and I see the same issue in the service logs, so I assume it's the root cause here.

Journal log excerpt:
Dec 22 11:32:00 storage-ge8-vdsm2.scl.lab.tlv.redhat.com ovn-controller[24448]: ovs|00075|stream_ssl|ERR|Private key does not match certificate public key: error:140A80BE:SSL routines:SSL_CTX_check_private_key:no private key assigned

Also, when I restarted ovn-controller.service and reinstalled the host, the issue was resolved.
But sure, I'm not the expert here, you or Dominic are :)

> I do not see vdsm.log in your attached log, it should show failed
> connections from Engine.

I'll add it as well. This is what you see there:
2019-12-23 11:46:32,357-0500 ERROR (Reactor thread) [vds.dispatcher] uncaptured python exception, closing channel <yajsonrpc.betterAsyncore.Dispatcher connected ('::ffff:10.35.162.6', 36176, 0, 0) at 0x7effe86139e8> (<class 'ssl.SSLError'>:[X509: KEY_VALUES_MISMATCH] key values mismatch (_ssl.c:3563) [/usr/lib64/python3.6/asyncore.py|readwrite|110] [/usr/lib64/python3.6/asyncore.py|handle_write_event|442] [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|handle_write|74] [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|_delegate_call|168] [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|handle_write|190] [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|_handle_io|194] [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|_set_up_socket|156]) (betterAsyncore:179)

> Does vdsm-cli work locally?

Do you mean 'vdsm-client'? If so, then yes.

> On another matter: this bug should not be assigned to me. Was it set so by
> default?

Answered by Michael B.

Created attachment 1649076
vdsm log with the issue
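The `KEY_VALUES_MISMATCH` error in the vdsm log and the `stream_ssl` error from ovn-controller both say that the private key on the host does not pair with the certificate. A minimal sketch of a local check follows, assuming the key and certificate paths that appear in the ovn-controller command line in this report. The function name `key_pair_loads` is made up for illustration and is not part of vdsm or oVirt. It relies on Python's `ssl.SSLContext.load_cert_chain`, which raises `ssl.SSLError` when the key does not match the certificate.

```python
import ssl


def key_pair_loads(cert_path, key_path):
    """Return True if the certificate and private key load together.

    load_cert_chain() raises ssl.SSLError when the private key does not
    match the certificate, or when either file is not valid PEM.
    A missing file raises a plain OSError, which is left to propagate.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except ssl.SSLError:
        return False
    return True


# Example call on a host (paths as seen in the ovn-controller command line):
#   key_pair_loads("/etc/pki/vdsm/certs/vdsmcert.pem",
#                  "/etc/pki/vdsm/keys/vdsmkey.pem")
```

A `False` result here would be consistent with the errors quoted in this report, but it does not distinguish a mismatched pair from a corrupt PEM file, so treat it as a first check only.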
Other inputs that may be useful about the workaround/bug:

1) I also tried to reinstall the host with the correct user/password (not ssh key), just to see if the ssh key was the issue, but the same issue still occurred. This was done WITHOUT enroll certificate + restart ovn-controller.service.

2) Enroll certificate plays a key part in working around this issue: before I manually run webadmin -> host -> installation -> enroll certificate, I see the SSL errors in the ovn-controller service [1]. After enroll certificate + restart ovn-controller.service, the errors no longer appear.

Maybe this "enroll certificate" step somehow gets missed on one out of 3 hosts when we add multiple hosts via ansible?

[1] BEFORE:
[root@rose11 ~]# systemctl status ovn-controller.service
● ovn-controller.service - OVN controller daemon
   Loaded: loaded (/usr/lib/systemd/system/ovn-controller.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2020-01-02 04:29:41 EST; 1min 46s ago
  Process: 23900 ExecStop=/usr/share/openvswitch/scripts/ovn-ctl stop_controller (code=exited, status=0/SUCCESS)
  Process: 23917 ExecStart=/usr/share/openvswitch/scripts/ovn-ctl --no-monitor start_controller $OVN_CONTROLLER_OPTS (code=exited, status=0/SUCCESS)
 Main PID: 23951 (ovn-controller)
    Tasks: 4 (limit: 26213)
   Memory: 5.9M
   CGroup: /system.slice/ovn-controller.service
           └─23951 ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --private-key=/etc/pki/vdsm/keys/vdsmkey.pem --certificate=/etc/pki/vdsm/certs/vdsmcert.pem --ca-cert>

Jan 02 04:30:12 rose11.scl.lab.tlv.redhat.com ovn-controller[23951]: ovs|00023|stream_ssl|ERR|Private key does not match certificate public key: error:140A80BE:SSL routines:SSL_CTX_check_private_key:no private>
Jan 02 04:30:20 rose11.scl.lab.tlv.redhat.com ovn-controller[23951]: ovs|00024|stream_ssl|ERR|Private key does not match certificate public key: error:140A80BE:SSL routines:SSL_CTX_check_private_key:no private>
Jan 02 04:30:28 rose11.scl.lab.tlv.redhat.com ovn-controller[23951]: ovs|00025|stream_ssl|ERR|Private key does not match certificate public key: error:140A80BE:SSL routines:SSL_CTX_check_private_key:no private>

[2] AFTER:
Jan 02 04:31:30 rose11.scl.lab.tlv.redhat.com systemd[1]: Started OVN controller daemon.
[root@rose11 ~]# systemctl status ovn-controller.service
● ovn-controller.service - OVN controller daemon
   Loaded: loaded (/usr/lib/systemd/system/ovn-controller.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2020-01-02 04:31:30 EST; 3min 58s ago
  Process: 26914 ExecStop=/usr/share/openvswitch/scripts/ovn-ctl stop_controller (code=exited, status=0/SUCCESS)
  Process: 26931 ExecStart=/usr/share/openvswitch/scripts/ovn-ctl --no-monitor start_controller $OVN_CONTROLLER_OPTS (code=exited, status=0/SUCCESS)
 Main PID: 26963 (ovn-controller)
    Tasks: 4 (limit: 26213)
   Memory: 6.4M
   CGroup: /system.slice/ovn-controller.service
           └─26963 ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --private-key=/etc/pki/vdsm/keys/vdsmkey.pem --certificate=/etc/pki/vdsm/certs/vdsmcert.pem --ca-cert>

Jan 02 04:31:30 rose11.scl.lab.tlv.redhat.com systemd[1]: Starting OVN controller daemon...
Jan 02 04:31:30 rose11.scl.lab.tlv.redhat.com ovn-ctl[26931]: Starting ovn-controller [ OK ]
Jan 02 04:31:30 rose11.scl.lab.tlv.redhat.com systemd[1]: Started OVN controller daemon.

Also, I noticed that the workaround mentioned usually worked (in 4 ENVs), but for some ENVs only removing the host and re-adding it after the workaround procedure was finished did the trick.

Moving to infra. I was able to reproduce the certificate issue when 2 hosts are deployed at the same time, but it's some race, because it happened only during the 2nd attempt; the 1st attempt was successful.

However, this seems to be a concurrency issue. It seems to work ok when you install one by one.
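Because the race does not hit every host or every attempt, it can help to check each host's journal for the ovn-controller error quoted in this report after a deployment. The following is a rough sketch only: the function name `count_key_mismatches` is hypothetical, and the input is assumed to be plain journal text in the format shown in this report (for example, saved output of `journalctl -u ovn-controller`).

```python
import re

# Matches the ovn-controller error quoted in this report, for example:
# "... ovn-controller[23951]: ovs|00023|stream_ssl|ERR|Private key does not match ..."
KEY_MISMATCH = re.compile(
    r"ovn-controller\[(?P<pid>\d+)\]: ovs\|\d+\|stream_ssl\|ERR\|"
    r"Private key does not match certificate public key"
)


def count_key_mismatches(journal_text):
    """Count key-mismatch errors per ovn-controller PID in journal text."""
    counts = {}
    for line in journal_text.splitlines():
        match = KEY_MISMATCH.search(line)
        if match:
            pid = int(match.group("pid"))
            counts[pid] = counts.get(pid, 0) + 1
    return counts
```

An empty result after "enroll certificate" and a restart of ovn-controller.service would match the AFTER state described in this report; a non-empty result for the current PID suggests the host is still affected.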
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:
[Tag 'ovirt-engine-4.4.0' doesn't contain patch 'https://gerrit.ovirt.org/106357']
gitweb: https://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=shortlog;h=refs/tags/ovirt-engine-4.4.0
For more info please contact: infra

We don't see it in GE build anymore. Checked run with ovirt-engine-4.4.0-0.25.master.el8ev.noarch

This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.