Description of problem: [ INFO ] Waiting for VDSM hardware info [ INFO ] Waiting for VDSM hardware info [ INFO ] Waiting for VDSM hardware info [ INFO ] Waiting for VDSM hardware info [ INFO ] Waiting for VDSM hardware info [ INFO ] Waiting for VDSM hardware info [ ERROR ] Failed to execute stage 'Environment setup': VDSM did not start within 120 seconds [ INFO ] Stage: Clean up [ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20160720154434.conf' [ INFO ] Stage: Pre-termination [ INFO ] Stage: Termination [ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue, fix and redeploy Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160720154231-k5lk2g.log Something went wrong setting up hosted engine, or the setup process was cancelled. Version-Release number of selected component (if applicable): ovirt-vmconsole-1.0.2-2.el7ev.noarch ovirt-node-selinux-3.6.1-13.0.el7ev.noarch mom-0.5.4-1.el7ev.noarch qemu-kvm-rhev-2.3.0-31.el7_2.16.x86_64 vdsm-4.17.31-0.el7ev.noarch ovirt-node-plugin-rhn-3.6.1-13.0.el7ev.noarch ovirt-host-deploy-1.4.1-1.el7ev.noarch ovirt-setup-lib-1.0.1-1.el7ev.noarch ovirt-vmconsole-host-1.0.2-2.el7ev.noarch libvirt-client-1.2.17-13.el7_2.4.x86_64 ovirt-node-lib-legacy-3.6.1-13.0.el7ev.noarch ovirt-node-lib-3.6.1-13.0.el7ev.noarch ovirt-node-3.6.1-13.0.el7ev.noarch ovirt-node-plugin-cim-logic-3.6.1-13.0.el7ev.noarch ovirt-hosted-engine-ha-1.3.5.7-1.el7ev.noarch ovirt-node-plugin-hosted-engine-0.3.0-7.el7ev.noarch ovirt-node-plugin-vdsm-0.6.1-10.el7ev.noarch ovirt-node-plugin-snmp-3.6.1-13.0.el7ev.noarch ovirt-node-branding-rhev-3.6.1-13.0.el7ev.noarch ovirt-node-lib-config-3.6.1-13.0.el7ev.noarch ovirt-node-plugin-snmp-logic-3.6.1-13.0.el7ev.noarch ovirt-hosted-engine-setup-1.3.7.2-1.el7ev.noarch ovirt-host-deploy-offline-1.4.0-1.el7ev.x86_64 ovirt-node-plugin-cim-3.6.1-13.0.el7ev.noarch rhevm-sdk-python-3.6.7.0-1.el7ev.noarch sanlock-3.2.4-1.el7.x86_64 Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016 Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux Red Hat Enterprise Virtualization Hypervisor release 7.2 (20160627.3.el7ev) How reproducible: Steps to Reproduce: 1.Install RHEVH 20160627.3.el7ev on host, using virtual media from rhev-hypervisor7-7.2-20160627.3.iso. 2.Start deployment of hosted-engine using appliance on iSCSI storage. 3. Actual results: Deployment failed. Expected results: Deployment should succeed. Additional info:
Created attachment 1182170 [details] sosreport from alma04
I see these in vdsm.log: Reactor thread::ERROR::2016-07-20 15:44:24,204::m2cutils::337::ProtocolDetector.SSLHandshakeDispatcher::(handle_read) Error during handshake: sslv 3 alert bad certificate Reactor thread::INFO::2016-07-20 15:44:25,207::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept) Accepting connection from 127. 0.0.1:43532 Reactor thread::ERROR::2016-07-20 15:44:25,208::m2cutils::337::ProtocolDetector.SSLHandshakeDispatcher::(handle_read) Error during handshake: sslv 3 alert bad certificate Reactor thread::INFO::2016-07-20 15:44:26,211::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept) Accepting connection from 127. 0.0.1:43534 Reactor thread::ERROR::2016-07-20 15:44:26,213::m2cutils::337::ProtocolDetector.SSLHandshakeDispatcher::(handle_read) Error during handshake: sslv 3 alert bad certificate Reactor thread::INFO::2016-07-20 15:44:27,217::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept) Accepting connection from 127. 0.0.1:43536 Reactor thread::ERROR::2016-07-20 15:44:27,218::m2cutils::337::ProtocolDetector.SSLHandshakeDispatcher::(handle_read) Error during handshake: sslv 3 alert bad certificate Reactor thread::INFO::2016-07-20 15:44:28,221::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept) Accepting connection from 127. 0.0.1:43538 Reactor thread::ERROR::2016-07-20 15:44:28,222::m2cutils::337::ProtocolDetector.SSLHandshakeDispatcher::(handle_read) Error during handshake: sslv 3 alert bad certificate Reactor thread::INFO::2016-07-20 15:44:28,289::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept) Accepting connection from 127. 0.0.1:43540
(In reply to Nikolai Sednev from comment #3) > I see these in vdsm.log: > Reactor thread::ERROR::2016-07-20 > 15:44:24,204::m2cutils::337::ProtocolDetector.SSLHandshakeDispatcher:: > (handle_read) Error during handshake: sslv > 3 alert bad certificate And what do you in the certificate?
(In reply to Yaniv Kaul from comment #4) > (In reply to Nikolai Sednev from comment #3) > > I see these in vdsm.log: > > Reactor thread::ERROR::2016-07-20 > > 15:44:24,204::m2cutils::337::ProtocolDetector.SSLHandshakeDispatcher:: > > (handle_read) Error during handshake: sslv > > 3 alert bad certificate > > And what do you in the certificate? Nothing. Taken fresh and clean host after it was installed with RHEVH and tried to deploy hosted-engine as usual.
Right after logging in to host over ssh I saw on background, right before TUI got started these: Catching exception: Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt/node/setup/vdsm/engine_page.py", line 87, in sync_mgmt networks = cli.getVdsCapabilities()['info']['networks'] File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__ return self.__send(self.__name, args) File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request verbose=self.__verbose File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request return self.single_request(host, handler, request_body, verbose) File "/usr/lib64/python2.7/xmlrpclib.py", line 1301, in single_request self.send_content(h, request_body) File "/usr/lib64/python2.7/xmlrpclib.py", line 1448, in send_content connection.endheaders(request_body) File "/usr/lib64/python2.7/httplib.py", line 975, in endheaders self._send_output(message_body) File "/usr/lib64/python2.7/httplib.py", line 835, in _send_output self.send(msg) File "/usr/lib64/python2.7/httplib.py", line 797, in send self.connect() File "/usr/lib/python2.7/site-packages/vdsm/m2cutils.py", line 206, in connect cert_reqs=self.cert_reqs) File "/usr/lib64/python2.7/ssl.py", line 913, in wrap_socket ciphers=ciphers) File "/usr/lib64/python2.7/ssl.py", line 588, in __init__ self.do_handshake() File "/usr/lib64/python2.7/ssl.py", line 810, in do_handshake self._sslobj.do_handshake() SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:765)
(In reply to Nikolai Sednev from comment #6) > SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed > (_ssl.c:765) So there are two possible issues here. Either: - a certificate is invalid. or: - A certificate cannot be verified - for example, since the whole chain cannot. One possible reason - the CA certificate is missing. Or it may be invalid, etc. Lastly, can we ensure this host is not somehow registered / used by more than one Engine?
(In reply to Yaniv Kaul from comment #8) > (In reply to Nikolai Sednev from comment #6) > > > SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed > > (_ssl.c:765) > > So there are two possible issues here. > Either: > - a certificate is invalid. > or: > - A certificate cannot be verified - for example, since the whole chain > cannot. One possible reason - the CA certificate is missing. Or it may be > invalid, etc. > > Lastly, can we ensure this host is not somehow registered / used by more > than one Engine? It is a clean vanilla host, not have been registered to any engine before. 1)Install it from ISO using virtual media. 2)Try to deploy hosted-engine. 3)Get failure.
Seems the bug should dup with bug 1332796. I assume the previous OS is RHVH, am I right? RHEV-H QE can't reproduce this issue on a clean env, deploy hosted engine can successful.
Nikolai, could you please confirm if this deploy was used in a non clean deploy? I cannot see it so far.
(In reply to Douglas Schilling Landgraf from comment #16) > Nikolai, could you please confirm if this deploy was used in a non clean > deploy? I cannot see it so far. Deployment was made on cleanest environment as it possibly could be, please se also https://bugzilla.redhat.com/show_bug.cgi?id=1358418#c5. Clean means that I've previously reprovisioned with clean RHEL7.2 using PXE, after that I've used RHEVH ISO image and installed it using virtual media. Installation of OS was successful. I have this issue reproduced on both my hosts.
moving to ovirt-node-plugin-hosted-engine, since from the traceback, it looks like the affected component. File "/usr/lib/python2.7/site-packages/ovirt/node/setup/vdsm/engine_page.py", line 87, in sync_mgmt networks = cli.getVdsCapabilities()['info']['networks'] Douglas, maybe a race condition? Looks like vdsm started before certificates were available. Not sure about the flow of legacy RHEV-H. Moving to node, integration can assist if needed.
Dropping blocker, because it just affects new installations.
This does not look like a Node bug - The tb which includes node in comment 6 is probably just another symptom of the root cause which seems to be in vdsm (see comment 3).
According to Nikolai this could be related to time being out of sync. We have seen this bug in the past.
It might be similar to BZ #1291161 ? (In reply to Sandro Bonazzola from comment #18) > moving to ovirt-node-plugin-hosted-engine, since from the traceback, it > looks like the affected component. > > File > "/usr/lib/python2.7/site-packages/ovirt/node/setup/vdsm/engine_page.py", > line 87, in sync_mgmt > networks = cli.getVdsCapabilities()['info']['networks'] > > Douglas, maybe a race condition? Looks like vdsm started before certificates > were available. Not sure about the flow of legacy RHEV-H. > Moving to node, integration can assist if needed. It smells just like BZ #1291161 which is WONTFIX. Can we reconsider fixing it, if we keep seeing this issue once in a while ?
Yes, that's why we kept the bug open. We need to fully understand why this time problem exists, then we can look for a solution.
(In reply to Fabian Deutsch from comment #24) > According to Nikolai this could be related to time being out of sync. > > We have seen this bug in the past. Nikolai, are you able to reproduce this bug even with hosts which has time properly synced?
(In reply to Martin Perina from comment #27) > (In reply to Fabian Deutsch from comment #24) > > According to Nikolai this could be related to time being out of sync. > > > > We have seen this bug in the past. > > Nikolai, are you able to reproduce this bug even with hosts which has time > properly synced? Not related to time, and yes. I've been able to reproduce.
Hi Nikolai, I couldn't reproduce your report yet and the vdsm certs in the sosreport seems ok. Are you able to reproduce this report with different versions of rhevm-appliance? For example, could you please try against rhevm-appliance-20160721.0-2? From the logs, I saw you used: rhevm-appliance-20160620.0-1.x86_64.rhevm.ova
15:44:24,204::m2cutils::337::ProtocolDetector.SSLHandshakeDispatcher::(handle_read) Error during handshake: sslv3 alert bad certificate Another question, could be protocol wrong here? Should it be tlsv1 for all 3.6? # grep ssl_protocol /usr/lib/python2.7/site-packages/vdsm/config.py ('ssl_protocol', 'tlsv1',
I meant, it should be tlsv1 for all 3.6, not sure why this sslv3 in the logs.
We use tlsv1 in 3.6. It could be generic message that was reused. You can check this by enabling ssl debug for the engine or vdsm.
Nikolai, can you reproduce the bug with ssl debug enabled?
(In reply to Fabian Deutsch from comment #33) > Nikolai, can you reproduce the bug with ssl debug enabled? A bit problematic now, have no more resources for reproduction.
Hi Nikolai, I will keep the needinfo, when you get the resource back please provide the debug information and if it works out of box in recent RHEV-M appliance builds. Thanks!
Moving this bug to douglas who's handling it.
Moving to medium, waiting the feedback.
Closing this bug for now, please provide the necessary data and reopen the bug.
Worked for me after depoyed fresh rhevh and waited over night, then deployed hosted-engine over iSCSI on it and everything worked. I used rhev-hypervisor7-7.2-20160711.0 and rhevm-appliance-20160615.0-2.x86_64.rhevm.ova. Got installed Red Hat Enterprise Virtualization Manager Version: 3.6.7.4-0.1.el6. On host these packages were used: ovirt-vmconsole-1.0.2-2.el7ev.noarch ovirt-node-selinux-3.6.1-13.0.el7ev.noarch mom-0.5.5-1.el7ev.noarch qemu-kvm-rhev-2.3.0-31.el7_2.16.x86_64 vdsm-4.17.33-1.el7ev.noarch ovirt-node-plugin-rhn-3.6.1-13.0.el7ev.noarch ovirt-host-deploy-1.4.1-1.el7ev.noarch ovirt-setup-lib-1.0.1-1.el7ev.noarch ovirt-vmconsole-host-1.0.2-2.el7ev.noarch libvirt-client-1.2.17-13.el7_2.5.x86_64 ovirt-node-lib-legacy-3.6.1-13.0.el7ev.noarch ovirt-node-lib-3.6.1-13.0.el7ev.noarch ovirt-node-3.6.1-13.0.el7ev.noarch ovirt-node-plugin-cim-logic-3.6.1-13.0.el7ev.noarch ovirt-hosted-engine-ha-1.3.5.7-1.el7ev.noarch ovirt-node-plugin-hosted-engine-0.3.0-7.el7ev.noarch ovirt-node-plugin-vdsm-0.6.1-10.el7ev.noarch ovirt-node-plugin-snmp-3.6.1-13.0.el7ev.noarch ovirt-node-branding-rhev-3.6.1-13.0.el7ev.noarch ovirt-node-lib-config-3.6.1-13.0.el7ev.noarch ovirt-node-plugin-snmp-logic-3.6.1-13.0.el7ev.noarch ovirt-hosted-engine-setup-1.3.7.2-1.el7ev.noarch ovirt-host-deploy-offline-1.4.0-1.el7ev.x86_64 ovirt-node-plugin-cim-3.6.1-13.0.el7ev.noarch rhevm-sdk-python-3.6.8.0-1.el7ev.noarch sanlock-3.2.4-1.el7.x86_64 Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016 Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux Red Hat Enterprise Virtualization Hypervisor release 7.2 (20160711.0.el7ev)