Bug 856167
Summary: 3.1 - [RHEV-H 6.3]Auto install RHEV-H with "management_server=$RHEV-M_IP" parameter, it failed to approve rhevh on rhevm side.
Product: Red Hat Enterprise Linux 6
Reporter: haiyang,dong <hadong>
Component: vdsm
Assignee: Juan Hernández <juan.hernandez>
Status: CLOSED ERRATA
QA Contact: Pavel Stehlik <pstehlik>
Severity: urgent
Priority: urgent
Version: 6.3
CC: abaron, acathrow, alonbl, bazulay, bsarathy, chchen, chetan, cpelland, cshao, dyasny, gouyang, hadong, iheim, ilvovsky, istein, jboggs, leiwang, lpeer, mburns, ovirt-maint, Rhev-m-bugs, ycui, yeylon, ykaul, yzaslavs, zdover
Target Milestone: rc
Keywords: Regression, ZStream
Target Release: 6.3
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Fixed In Version: vdsm-4.9.6-40.0
Doc Type: Bug Fix
Doc Text:
Previously, performing an automated installation of a hypervisor using the "management_server" parameter without specifying a port number and without the "management_server_fingerprint" option succeeded, but the hypervisor could not be approved from the Manager administration portal. Now, port 443 is used by default if no alternate port is provided, and "management_server_fingerprint" is optional. You can automatically install and approve a hypervisor without specifying a port number or a "management_server_fingerprint".
Clones: 861399
Last Closed: 2012-12-04 19:11:37 UTC
Type: Bug
Bug Blocks: 861399, 863292
Description
haiyang,dong
2012-09-11 11:16:08 UTC
Created attachment 611737
attached vdsm-reg.log and engine.log
FWIW, this appears to happen every time you autoinstall with a management server. Tried:

management_server=<hostname>:80
management_server=<hostname>:443
management_server=<hostname>:443 management_server_fingerprint=<fingerprint>

Workaround: after installation, log in as admin, navigate to the RHEVM configuration screen, choose Apply, then approve the host in the RHEVM Web UI.

Created attachment 611745
authorized_keys

The problem stems from the authorized_keys file for root. I don't know exactly how it's pulled down yet, but it's pulling HTML code instead of an SSH key.
Please see attached.
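An HTML error page landing in authorized_keys can be caught by checking the downloaded text before writing it. Below is a minimal Python 3 sketch, not the vdsm or deployUtil.py code: the accepted key types in `ALLOWED_TYPES` are an assumption, and `is_valid_ssh_public_key` and `save_authorized_key` are hypothetical helpers. It relies only on the OpenSSH public key line layout (`<type> <base64 blob> [comment]`, where the decoded blob begins with a length-prefixed copy of the type name).

```python
import base64
import binascii
import struct

# Key types accepted by this sketch (an assumption; extend as needed).
ALLOWED_TYPES = {"ssh-rsa", "ssh-dss", "ecdsa-sha2-nistp256",
                 "ecdsa-sha2-nistp384", "ecdsa-sha2-nistp521", "ssh-ed25519"}


def is_valid_ssh_public_key(text):
    """Return True only if text is a single OpenSSH public key line."""
    lines = text.strip().splitlines()
    if len(lines) != 1:
        # An HTML error page usually spans several lines; a key is one line.
        return False
    fields = lines[0].split()
    if len(fields) < 2 or fields[0] not in ALLOWED_TYPES:
        return False
    try:
        blob = base64.b64decode(fields[1], validate=True)
    except (binascii.Error, ValueError):
        return False
    if len(blob) < 4:
        return False
    # The blob starts with a big-endian uint32 length and the key type name,
    # which must match the type given in the first field.
    (name_len,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + name_len] == fields[0].encode("ascii")


def save_authorized_key(text, path):
    """Hypothetical helper: write the key to path only if it validates."""
    if not is_valid_ssh_public_key(text):
        raise ValueError("downloaded content is not an SSH public key")
    with open(path, "w") as f:
        f.write(text.strip() + "\n")
```

With this check in place, an Apache error page fails validation before anything is written, so authorized_keys is never overwritten with HTML.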
I'm guessing there's no redirect in the servlet for this file. Juan?

My hypothesis is that Apache is smarter than JBoss in this particular case: it returns a nice HTTP error message when the host tries to connect to an HTTPS port using HTTP. When connecting directly to JBoss (without Apache as a proxy), JBoss just starts the SSL handshake, the operation fails on the host side, and the host then retries with HTTPS. I need to verify this hypothesis and find a solution. Can you provide me with the exact kernel options used for the autoinstallation? A complete pxelinux.cfg/default entry would be great.

I don't have a PXE profile handy at the moment, but this will work just the same: boot from CD/USB, highlight the install (or reinstall) option on the boot menu, press <TAB>, and add the following:

storage_init=/dev/sda BOOTIF=eth0 adminpw=XXXXXXX management_server=hostname:port

For adminpw, generate the password using openssl passwd.

I don't have a solution for the problem, but this is what I found out:

1. The "management_server" parameter has to contain the IP address and the port number; both are mandatory.
2. The "management_server_fingerprint" parameter is also mandatory.
3. When these two parameters are provided correctly, the CA certificate is downloaded from the engine and saved to /etc/pki/vdsm/certs/cacert.pem. The original CA certificate is saved and persisted to a backup file in the same directory:

   cp cacert.pem bkp-date-cacert.pem

   Note that this is the CA certificate, not the VDSM certificate; the VDSM certificate should be pushed by the manager later, after the reboot.
4. The hypervisor reboots automatically.
5. During boot the vdsmd service is started, and as part of its start script it validates the VDSM certificate against the CA certificate. This validation fails because the CA certificate is new (downloaded from the engine) but the VDSM certificate is still the original one. When the validation fails, the vdsmd start script tries to replace the CA certificate with one from the /etc/pki/vdsm/certs directory. It finds the backup made earlier and uses it, so VDSM is now running with its original CA and certificate.
6. The vdsm-reg service starts after that, and it uses the /etc/pki/vdsm/certs/cacert.pem CA certificate for SSL communication with the engine. This fails because that CA certificate is not the one downloaded from the engine.
7. The vdsm-reg service now tries to register using HTTP instead of HTTPS, and it succeeds, so the host is registered in the engine but not approved.
8. The vdsm-reg service tries to download the SSH public key using SSL and the wrong CA certificate. This fails, and vdsm-reg detects the failure correctly.
9. The vdsm-reg service tries to download the SSH public key using HTTP, but on the HTTPS port. It gets the error page from Apache and saves it to the authorized_keys file, because the content is not validated.

I think there are several things that could potentially need to be fixed here:

A. Change the node so that it correctly interprets the "management_server" parameter even when the port number is not provided.
B. Rethink the VDSM logic that selects one certificate from the /etc/pki/vdsm/certs directory. It is not as easy as removing it, as otherwise VDSM and libvirt (both use the same certificates) will fail to start after the reboot, and then vdsm-reg won't be able to create the bridge (it uses libvirt for that).
C. Rethink why vdsm-reg doesn't download the CA certificate; it used to do so in older versions.
D. Fix deployUtil.py so that when it downloads certificates or SSH keys it validates them before writing them to files.

I am not 100% sure this analysis is correct. Suggestions are welcome.
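Fix A amounts to reading "management_server" from the kernel command line and falling back to a default port when none is given. A minimal Python 3 sketch, not the vdsm-config code, assuming: the default port is 443 (the value this bug's Doc Text describes after the fix), options are whitespace-separated without kernel-style quoting, and the server is a hostname or IPv4 address (bracketed IPv6 is not handled). `parse_cmdline` and `parse_management_server` are hypothetical names.

```python
DEFAULT_PORT = 443  # default described in the Doc Text once the fix is in


def parse_cmdline(cmdline):
    """Split a kernel command line into a dict of options.

    Assumes whitespace-separated tokens without quoting; bare flags
    such as "quiet" map to None.
    """
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else None
    return options


def parse_management_server(value, default_port=DEFAULT_PORT):
    """Return (host, port) from "host" or "host:port".

    Assumes a hostname or IPv4 address; bracketed IPv6 is not handled.
    """
    host, sep, port = value.rpartition(":")
    if not sep:
        # No colon at all: the whole value is the host.
        return value, default_port
    if not port:
        # "host:" with an empty port also falls back to the default.
        return host, default_port
    port_number = int(port)  # raises ValueError for a non-numeric port
    if not 0 < port_number < 65536:
        raise ValueError("port out of range: %d" % port_number)
    return host, port_number
```

For example, with the options quoted above and a hypothetical hostname, `parse_management_server(parse_cmdline("storage_init=/dev/sda BOOTIF=eth0 management_server=rhevm.example.com")["management_server"])` yields the host with port 443, while `management_server=hostname:port` keeps the explicit port.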
Whatever the problem is, and whatever solution we implement, I think it is a good idea to validate the SSH public key before saving it to authorized_keys:

http://gerrit.ovirt.org/8018

I will prepare another patch to verify the CA certificate as well.

The following change adds verification of the downloaded CA certificate:

http://gerrit.ovirt.org/8021

In point A of comment 10 I said that the node needs to correctly parse the "management_server" kernel parameter, but I see now that this is done in "vdsm-config", which is part of VDSM. The following patch tries to fix that:

http://gerrit.ovirt.org/8022

The solution I found for this problem (in addition to the previous three patches) is to download the engine CA certificate to a different file, /etc/pki/vdsm/certs/enginecacert.pem. This file isn't touched by the VDSM start script, so it is preserved after the reboot and can be used by vdsm-reg to download the SSH key using HTTPS. The proposed change is here:

http://gerrit.ovirt.org/8038

Test version:
rhev-hypervisor6-6.3-20121101.0.el6_3.noarch.rpm
vdsm-4.9.6-40.0.el6_3.x86_64
rhevm-3.1.0-23.el6ev

Expected results:
1. Auto-installing RHEV-H with the "management_server=$RHEV-M_IP:[portid]" parameter should succeed on the RHEV-H side.
2. RHEV-H should appear as registered in the RHEVM Web UI, and approving it there should bring it up successfully.

Test result:
Tried:
management_server=$RHEV-M_IP: Pass
management_server=$RHEV-M_IP:443: Pass
management_server=$RHEV-M_IP:443 management_server_fingerprint=<fingerprint>: Pass
management_server=$RHEV-M_IP management_server_fingerprint=<fingerprint>: Pass

So this bug has been fixed.

In order to verify this bug, could you help change the status to "ON_QA"?

According to the test result of comment 44, changing the status to "VERIFIED".

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1508.html