| Summary: | [RHEVM] 3.3 host installation into 3.0 RHEVM fails | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Martin Pavlik <mpavlik> |
| Component: | vdsm | Assignee: | Yaniv Bronhaim <ybronhei> |
| Status: | CLOSED NOTABUG | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 3.3.0 | CC: | aberezin, acathrow, bazulay, danken, dougsland, gklein, hateya, herrold, iheim, lpeer, mpavlik, Rhev-m-bugs, tdosek, yeylon |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | 3.3.1 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | infra | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-01-01 16:33:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |

However, if you go to the host and run wget http://mp-rhevm30.rhev.lab.eng.brq.redhat.com:8080/rhevm.ssh.key.txt, it works.
Created attachment 834843 [details]
var logs from engine
Created attachment 834844 [details]
tmp logs from node
/etc/vdsm/vdsm.conf no longer contains:
[vars]
trust_store_path = ...
but it should.
---
Tue, 10 Dec 2013 16:54:09 DEBUG <BSTRAP component='VerifyServices' status='OK' message='Needed services set'/>
Tue, 10 Dec 2013 16:54:09 ERROR Traceback (most recent call last):
File "/tmp/vds_bootstrap_0e45d167-8eb1-4142-a79d-679af7ef7364.py", line 897, in main
orgName, systime, usevdcrepo, firewallRulesFile)
File "/tmp/vds_bootstrap_0e45d167-8eb1-4142-a79d-679af7ef7364.py", line 851, in VdsValidation
oDeploy.setCertificates(subject, random_num, orgName)
File "/tmp/vds_bootstrap_0e45d167-8eb1-4142-a79d-679af7ef7364.py", line 784, in setCertificates
deployUtil.createCSR(orgName, subject, random_num, tsDir, vdsmKey, dhKey)
File "/tmp/deployUtil.py", line 1158, in createCSR
os.mkdir(tsDir + "/keys")
OSError: [Errno 2] No such file or directory: '/var/vdsm/ts/keys'
Dan, I'm trying to find in the log history where the default setting of trust_store_path disappeared. Maybe you can help me: was there a reason for that?

Hey Martin,
As far as I can see in the code, nothing has changed since RHEV-3.2 in the area of creating vdsm.conf or in how vds_bootstrap.py uses it. Can you check whether the bug also occurs with RHEV-3.2 builds? It will help to know when the regression started and how relevant this issue is.
Thanks.

Hi Yaniv,
RHEVM 3.2.4-0.44.el6ev (sf22.1) - works
RHEVM 3.1.0-55.el6ev - works
RHEVM 3.0.8_0001-1.el6_3 - does not work

Martin, I meant using an older vdsm: use vdsm-3.2 and check whether deploy works with RHEVM3.0.

Works with rhel 6.5, which has vdsm 4.10.28.0.
Logs from node and engine attached.
Created attachment 836266 [details]
32_host_tmp_logs
Created attachment 836267 [details]
32_host_var_logs
Created attachment 836268 [details]
32_engine_logs
(In reply to Martin Pavlik from comment #10)
> works with rhel 6.5 which has vdsm 4.10.28.0
>
> logs from node and engine attached

v4.10.2-28.0 is the latest rhev-3.2 build of vdsm, i.e., this is a recent regression in rhev-3.3 and should be addressed.

Installing VDSM v.10.2-28 still doesn't provide vdsm.conf at all, and more specifically trust_store_path is not set to any location. Additionally, bootstrap still reads trust_store_path for tsDir as it always did, and that code has not changed for a long time, so how can the behavior be different?
So iiuc, and I think I do, we never set trust_store_path in vdsm.conf; instead we used '/etc/pki/vdsm' by default on rhel >= 6.0, which works. The only way this bug can occur is when '/var/vdsm/ts' is used, as seems to have happened in this bug, and that can happen only if the rhel version is older than 6.0, which is not the case here, as you used ic159 3.0.8_0001-1.el6_3.
Now, I saw that the first tars you attached also include the vds_bootstrap.py code, which is different at line 774:
    try:
        tsDir = config.get('vars', 'trust_store_path')
    except:
        tsDir = '/var/vdsm/ts'
and should be (since vdsm-v.10.2-28 till master):
    try:
        tsDir = config.get('vars', 'trust_store_path')
    except:
        if rhel6based:
            tsDir = '/etc/pki/vdsm'
        else:
            tsDir = '/var/vdsm/ts'
This change was first introduced in http://gerrit.ovirt.org/#/c/767/, which has been included since v4.9.2. So maybe you were wrong about the vdsm version that you installed? As far as I understand, you used vdsm-4.9.
Can you verify whether I'm wrong, asap?
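
For illustration, the two fallback behaviors quoted above can be compared in a small standalone sketch. It is written for Python 3 with `configparser` (the original bootstrap ran on Python 2.6), and the function and constant names are hypothetical; they are not taken from vds_bootstrap.py.

```python
import configparser

# Paths as quoted in the comment above.
RHEL6_TRUST_STORE = "/etc/pki/vdsm"
LEGACY_TRUST_STORE = "/var/vdsm/ts"


def old_trust_store_dir(config):
    """Older variant: always fall back to the legacy path (hypothetical name)."""
    try:
        return config.get("vars", "trust_store_path")
    except (configparser.NoSectionError, configparser.NoOptionError):
        return LEGACY_TRUST_STORE


def new_trust_store_dir(config, rhel6based):
    """Newer variant: the fallback depends on whether the host is RHEL 6 based."""
    try:
        return config.get("vars", "trust_store_path")
    except (configparser.NoSectionError, configparser.NoOptionError):
        return RHEL6_TRUST_STORE if rhel6based else LEGACY_TRUST_STORE


if __name__ == "__main__":
    # An empty parser stands in for a host without trust_store_path in vdsm.conf.
    empty = configparser.ConfigParser()
    print("old:", old_trust_store_dir(empty))
    print("new:", new_trust_store_dir(empty, rhel6based=True))
```

With no `trust_store_path` configured, the older variant resolves to `/var/vdsm/ts`, the directory that the `os.mkdir(tsDir + "/keys")` call in the traceback could not find, while the newer variant resolves to `/etc/pki/vdsm` on a RHEL 6 based host.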
Looking a bit deeper, I understand that bootstrap.py is provided by rhevm during installation, on the rhevm side, and is not related at all to the vdsm version installed on the host. This means that rhevm3.0 provides a bootstrap.py version that requires vdsm.conf when it runs on a host with rhel >= 6.0, so the bug was always there. This is not a recent regression; it has existed for all vdsm versions, as we never provided a vdsm.conf with trust_store_path set to /etc/pki/vdsm.
We have two options to handle this issue:
1. Take the vdsm-bootstrap package from the recent vdsm version on the host, which is a change in RHEVM-3.0 installation.
2. Install a vdsm.conf file for vdsm on rhel, which we didn't provide before; this will solve the issue only for ovirt-3.3 and above.
As it is not a regression, please consider the options and let me know which is preferred.

If adding a fresh rhev-3.2 node to a rhev-3.0 setup is indeed already broken, and no one complained, we can keep the situation as it is. A release note saying that a user should replace the engine-side vds_bootstrap.py with a fresh version would be enough imo.

After Martin reran it, we saw the exact error in the getRemoteFile stage of bootstrap.
On vdsm-3.2 we get:
Mon, 16 Dec 2013 09:51:22 DEBUG getRemoteFile start. IP = mp-rhevm30.rhev.lab.eng.brq.redhat.com port = 8080 fileName = "/rhevm.ssh.key.txt"
Mon, 16 Dec 2013 09:51:22 DEBUG /rhevm.ssh.key.txt failed in HTTPS. Retrying using HTTP.
Traceback (most recent call last):
File "/tmp/deployUtil.py", line 1288, in getRemoteFile
conn.sock = getSSLSocket(sock, certPath)
File "/tmp/deployUtil.py", line 1132, in getSSLSocket
cert_reqs=ssl.CERT_REQUIRED)
File "/usr/lib64/python2.6/ssl.py", line 342, in wrap_socket
suppress_ragged_eofs=suppress_ragged_eofs)
File "/usr/lib64/python2.6/ssl.py", line 118, in __init__
cert_reqs, ssl_version, ca_certs)
SSLError: [Errno 185090050] _ssl.c:330: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib
Mon, 16 Dec 2013 09:51:22 DEBUG getRemoteFile end.
Mon, 16 Dec 2013 09:51:22 DEBUG handleSSHKey start
Mon, 16 Dec 2013 09:51:22 DEBUG handleSSHKey: creating .ssh dir.
...
The first attempt fails with HTTPS, but the fallback to HTTP works and bootstrap continues to the end.
With vdsm-3.3, however, we see:
Mon, 16 Dec 2013 10:28:22 DEBUG getRemoteFile start. IP = mp-rhevm30.rhev.lab.eng.brq.redhat.com port = 8080 fileName = "/rhevm.ssh.key.txt"
Mon, 16 Dec 2013 10:28:22 DEBUG /rhevm.ssh.key.txt failed in HTTPS. Retrying using HTTP.
Traceback (most recent call last):
File "/tmp/deployUtil.py", line 1286, in getRemoteFile
sock.connect((IP, nPort))
File "<string>", line 1, in connect
gaierror: [Errno -3] Temporary failure in name resolution
Mon, 16 Dec 2013 10:28:22 ERROR Failed to fetch /rhevm.ssh.key.txt using http.
Traceback (most recent call last):
File "/tmp/deployUtil.py", line 1302, in getRemoteFile
conn.request("GET", fileName)
File "/usr/lib64/python2.6/httplib.py", line 914, in request
self._send_request(method, url, body, headers)
File "/usr/lib64/python2.6/httplib.py", line 951, in _send_request
self.endheaders()
File "/usr/lib64/python2.6/httplib.py", line 908, in endheaders
self._send_output()
File "/usr/lib64/python2.6/httplib.py", line 780, in _send_output
self.send(msg)
File "/usr/lib64/python2.6/httplib.py", line 739, in send
self.connect()
File "/usr/lib64/python2.6/httplib.py", line 720, in connect
self.timeout)
File "/usr/lib64/python2.6/socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno -2] Name or service not known
Mon, 16 Dec 2013 10:28:22 ERROR Failed to fetch /rhevm.ssh.key.txt status
Mon, 16 Dec 2013 10:28:22 DEBUG getRemoteFile end.
Mon, 16 Dec 2013 10:28:22 DEBUG <BSTRAP component='SetSSHAccess' status='FAIL' message='Failed to retrieve server SSH key.'/>
Mon, 16 Dec 2013 10:28:22 ERROR setSSHAccess test failed
Mon, 16 Dec 2013 10:28:22 DEBUG <BSTRAP component='RHEV_INSTALL' status='FAIL'/>
Mon, 16 Dec 2013 10:28:22 DEBUG **** End VDS Validation ****
The bootstrap could not continue: both attempts failed with a hostname resolution error. Were there any recent changes in this area?
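
Both logs above follow the same pattern: HTTPS is tried first, and HTTP is used as a fallback. A minimal sketch of that pattern is below. The `fetch_with_fallback` helper and the injected fetcher callables are hypothetical stand-ins; the real getRemoteFile in deployUtil.py is not shown in this report and may differ.

```python
import socket
import ssl


def fetch_with_fallback(fetchers):
    """Try each (label, fetch) pair in order.

    Returns (label, data) from the first fetcher that succeeds.
    If every attempt fails, the last error is re-raised.
    """
    last_error = None
    for label, fetch in fetchers:
        try:
            return label, fetch()
        except OSError as err:
            # ssl.SSLError and socket.gaierror are both OSError subclasses in Python 3.
            last_error = err
    if last_error is None:
        raise ValueError("no fetchers given")
    raise last_error


def failing_https():
    # Simulates the HTTPS failure seen in the vdsm-3.2 log.
    raise ssl.SSLError("simulated SSL failure")


def failing_dns():
    # Simulates the name-resolution failure seen in the vdsm-3.3 log.
    raise socket.gaierror(-2, "Name or service not known")


if __name__ == "__main__":
    # vdsm-3.2 case: HTTPS fails, HTTP succeeds.
    print(fetch_with_fallback([("https", failing_https), ("http", lambda: b"key")]))
    # vdsm-3.3 case: both attempts fail on name resolution.
    try:
        fetch_with_fallback([("https", failing_dns), ("http", failing_dns)])
    except socket.gaierror as err:
        print("both attempts failed:", err)
```

The sketch makes the difference between the two runs explicit: a fallback only helps when the second transport can still reach the server, which is not the case when the hostname cannot be resolved at all.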
Created attachment 837622 [details]
Running same bootstrap twice
The attachment contains the logs of two identical runs of the bootstrap code: the first run fails with gaierror, the second works properly.
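
Since an identical second run succeeds, the gaierror looks transient. One way to probe that from the host is to retry name resolution with a short delay, as in the sketch below. The helper name, the number of attempts, and the delay are assumptions chosen for illustration; nothing like this is confirmed to exist in the bootstrap code.

```python
import socket
import time


def resolve_with_retry(host, port, attempts=3, delay=2.0,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve host:port, retrying on socket.gaierror.

    `resolver` and `sleep` are injectable so the logic can be exercised
    without real DNS or real waiting. The last gaierror is re-raised
    once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return resolver(host, port, 0, socket.SOCK_STREAM)
        except socket.gaierror:
            if attempt == attempts:
                raise
            sleep(delay)


if __name__ == "__main__":
    # Hostname and port taken from the logs in this report.
    try:
        results = resolve_with_retry("mp-rhevm30.rhev.lab.eng.brq.redhat.com", 8080)
        print("resolved to", results[0][4])
    except socket.gaierror as err:
        print("resolution still failing after retries:", err)
```

If resolution succeeds only on a later attempt, that would point to a timing issue on the host rather than a permanently broken DNS setup.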
The issue seems to be tied to my local DNS server: when the 3.3 host had the rhevm IP/hostname in /etc/hosts before being added to the 3.0 RHEVM, it got installed. Changing DNS to another server also helped. However, a host with the same configuration can be added to 3.3, 3.2 and 3.1 RHEVM. I will do some digging on the local DNS to see if I can find an issue.

If the host does not obtain its settings from DHCP, it works with my local DNS as well.

After several more tries, it seems that the only combination that does not work is RHEVM 3.0 with a 3.3 host that gets its IP from DHCP, including obtaining the DNS server from DHCP. I am looking for a different environment to confirm that the issue is a problem with my local env.

We still suspect Martin's environment, although we are trying to reproduce it on more RHEVM-3.0 envs as well. As a workaround, host-3.3 can be added to rhevm-3.0 if we configure its network settings statically. We still haven't found differences related to that area between host-3.2 and host-3.3.

Reinstalling after the first failed installation also sets the host to UP, which is much easier to maintain than configuring static addresses. Other than that, it seems that after creating the rhevm bridge interface we cannot establish a network connection to the rhevm itself for a few seconds. This means that with the right delay it does work properly. It still doesn't explain why it works at the same rate with host-3.2.

You're right, Tomas, thanks. I verified it: you are not allowed to add a vdsm newer than 4.9, which means ovirt-3.1. I really don't understand how Martin did it. After verifying: a host without the vdsm-4.9* repos cannot be added to RHEVM-3.0. Closing as NOTABUG.
Created attachment 834842 [details]
var_log

Description of problem:
3.3 host installation into 3.0 RHEVM fails because rhevm.ssh.key.txt cannot be downloaded

Version-Release number of selected component (if applicable):
ic159 - Red Hat Enterprise Virtualization Manager Version: 3.0.8_0001-1.el6_3

How reproducible:
100%

Steps to Reproduce:
1. install host with 3.3 vdsm (used version vdsm-4.13.2-0.1.rc.el6ev.x86_64)
2. have 3.0 rhevm (ic159)
3. try to add the host to rhevm

Actual results:
install failed because rhevm.ssh.key.txt cannot be downloaded

Expected results:
successful install

Additional info:
Tue, 10 Dec 2013 16:54:12 DEBUG /rhevm.ssh.key.txt failed in HTTPS. Retrying using HTTP.
Traceback (most recent call last):
File "/tmp/deployUtil.py", line 1288, in getRemoteFile
conn.sock = getSSLSocket(sock, certPath)
File "/tmp/deployUtil.py", line 1132, in getSSLSocket
cert_reqs=ssl.CERT_REQUIRED)
File "/usr/lib64/python2.6/ssl.py", line 342, in wrap_socket
suppress_ragged_eofs=suppress_ragged_eofs)
File "/usr/lib64/python2.6/ssl.py", line 120, in __init__
self.do_handshake()
File "/usr/lib64/python2.6/ssl.py", line 279, in do_handshake
self._sslobj.do_handshake()
SSLError: [Errno 8] _ssl.c:492: EOF occurred in violation of protocol