Bug 856167 - 3.1 - [RHEV-H 6.3]Auto install RHEV-H with "management_server=$RHEV-M_IP" parameter, it failed to approve rhevh on rhevm side.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 6.3
Assigned To: Juan Hernández
QA Contact: Pavel Stehlik
Whiteboard: infra
Keywords: Regression, ZStream
Blocks: 861399 vdsm_reg_ca_cert
Reported: 2012-09-11 07:16 EDT by haiyang,dong
Modified: 2016-04-26 10:24 EDT (History)

Fixed In Version: vdsm-4.9.6-40.0
Doc Type: Bug Fix
Doc Text:
Previously, performing an automated installation of a hypervisor using the "management_server" parameter without specifying a port number and without the "management_server_fingerprint" option succeeded, but the hypervisor could not be approved from the Manager administration portal. Now, port 443 is used by default if an alternate port is not provided, and management_server_fingerprint is optional. You can automatically install and approve a Hypervisor without specifying a port number or a management_server_fingerprint.
Story Points: ---
Clones: 861399
Last Closed: 2012-12-04 14:11:37 EST
Type: Bug


Attachments
attached vdsm-reg.log and engine.log (16.52 KB, application/x-compressed-tar)
2012-09-11 07:43 EDT, haiyang,dong
authorized_keys (1.06 KB, text/plain)
2012-09-11 08:24 EDT, Mike Burns

Description haiyang,dong 2012-09-11 07:16:08 EDT
Description of problem:
After auto-installing RHEV-H with the "management_server=$RHEV-M_IP" parameter, the host registers automatically with RHEV-M 3.1, but approving it so that it comes up fails.

Version-Release number of selected component (if applicable):
rhevm-3.1.0-14.el6ev.noarch  
rhev-hypervisor6-6.3-20120910.0.rhev31.el6_3

How reproducible:
100%
 
Steps to Reproduce:
1. Auto-install RHEV-H with the "management_server=$RHEV-M_IP" parameter.
2. Reboot RHEV-H and check the RHEV-M side to confirm that it registered automatically with RHEV-M 3.1.

Actual results:
1.vdsm-config: starting                                                          
Generating RHEV agent configuration files                                      
RHEV agent configuration files already exist.                                  
                                                                                
Configuring the RHEV Manager connection.                                        
Traceback (most recent call last):                                              
  File "/usr/share/vdsm-reg/deployUtil.py", line 1555, in <module>              
  File "/usr/share/vdsm-reg/deployUtil.py", line 1522, in main                  
  File "/usr/share/vdsm-reg/deployUtil.py", line 1453, in nodeCleanup          
  File "/usr/share/vdsm-reg/deployUtil.py", line 1444, in _nodeBackupCerts      
  File "/usr/lib64/python2.6/shutil.py", line 95, in copy2                      
  File "/usr/lib64/python2.6/shutil.py", line 50, in copyfile                  
IOError: [Errno 2] No such file or directory: '/etc/pki/vdsm/certs/cacert.pem'  
No management_server_fingerprint found.                                        
File already persisted: /etc/vdsm-reg/vdsm-reg.conf                            

vdsm-config: ended.
Finalizing Install and Rebooting (this may take a minute)  

2. After step 2, RHEV-H registers automatically with RHEV-M 3.1, but approving it so that it comes up fails.

Expected results:
After step 2, RHEV-H registers automatically with RHEV-M 3.1 and can be approved so that it comes up.

Additional info: 
------
Comment 2 haiyang,dong 2012-09-11 07:43:29 EDT
Created attachment 611737 [details]
attached vdsm-reg.log and engine.log
Comment 3 Mike Burns 2012-09-11 08:22:09 EDT
FWIW, this appears to happen every time you autoinstall with management server.

Tried:  

management_server=<hostname>:80
management_server=<hostname>:443
management_server=<hostname>:443 management_server_fingerprint=<fingerprint>

Workaround: After installation, log in as admin, navigate to the RHEV-M configuration screen, choose Apply, then approve in the RHEV-M web UI.
Comment 4 Mike Burns 2012-09-11 08:24:48 EDT
Created attachment 611745 [details]
authorized_keys

The problem stems from the authorized_keys file for root. I don't know exactly how it's pulled down yet, but it's pulling HTML code instead of an SSH key.

Please see attached
Comment 5 Itamar Heim 2012-09-11 09:52:22 EDT
I'm guessing there is no redirect in the servlet for this file.
juan?
Comment 6 Juan Hernández 2012-09-13 06:04:06 EDT
My hypothesis is that Apache is smarter than JBoss in this particular case: it returns a nice HTTP error message when the host tries to connect to an HTTPS port using HTTP. When connecting directly to JBoss (without Apache as proxy), JBoss will just start the SSL handshake, the operation will fail on the host side, and then it will retry with HTTPS.

I need to verify this hypothesis and find a solution.
Comment 7 Juan Hernández 2012-09-13 07:40:06 EDT
Can you provide me with the exact kernel options used for the autoinstallation? A complete pxelinux.cfg/default entry would be great.
Comment 8 Mike Burns 2012-09-13 07:55:50 EDT
I don't have a PXE profile handy at the moment, but this will work just the same.

Boot from CD/USB. On the boot menu, highlight the install (or reinstall) option and press <TAB>. Add the following:

storage_init=/dev/sda BOOTIF=eth0 adminpw=XXXXXXX management_server=hostname:port


For adminpw, generate the password hash using:

openssl passwd
Comment 10 Juan Hernández 2012-09-15 05:25:04 EDT
I don't have a solution for the problem, but this is what I found out:

1. The "management_server" parameter has to contain the IP address and the port number; both are mandatory.

2. The "management_server_fingerprint" is also mandatory.

3. When these two parameters are provided correctly, the CA certificate is downloaded from the engine and saved to /etc/pki/vdsm/certs/cacert.pem. The original CA certificate is backed up and persisted to a file in the same directory:

cp cacert.pem bkp-date-cacert.pem

Note that this is the CA certificate, not the VDSM certificate; that one should be pushed by the manager later, after the reboot.

4. The hypervisor reboots automatically.

5. During boot the vdsmd service is started, and as part of its start script it validates the VDSM certificate against the CA certificate. This validation fails because the CA certificate is new (downloaded from the engine) but the VDSM certificate is still the original one. When the validation fails, the vdsmd start script tries to replace the CA certificate with one from the /etc/pki/vdsm/certs directory. It finds the backup made earlier and uses it, so VDSM is now running with its original CA and certificate.

6. The vdsm-reg service starts after that, and it uses the /etc/pki/vdsm/certs/cacert.pem CA certificate for the SSL communication with the engine. This fails because that CA certificate is not the one downloaded from the engine.

7. The vdsm-reg service tries now to register using HTTP instead of HTTPS and it succeeds, so the host is registered in the engine, but not approved.

8. The vdsm-reg service tries to download the SSH public key using SSL and the wrong CA certificate. This fails and vdsm-reg detects it correctly.

9. The vdsm-reg service tries to download the SSH public key using HTTP but using the HTTPS port. It gets the error page from Apache and saves it to the authorized_keys file because it is not validated.

I think there are several things that could potentially need to be fixed here:

A. Change the node so that it interprets the "management_server" parameter correctly even when the port number is not provided.

B. Rethink this VDSM logic that selects one certificate from the /etc/pki/vdsm/certs directory. It is not as easy as removing it, as otherwise VDSM and libvirt (both use the same certificates) will fail to start after the reboot and then vdsm-reg won't be able to create the bridge (it uses libvirt for that).

C. Rethink why vdsm-reg doesn't download the CA certificate; it used to do so in older versions.

D. Fix deployUtil.py so that when it downloads certificates or SSH keys it validates them before writing to files.

I am not 100% sure this analysis is correct. Suggestions are welcome.
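Point A above amounts to parsing the "management_server" value with an optional port. A minimal sketch of such a parse (this is an illustrative helper, not the actual vdsm-config code; the eventual fix defaults to port 443 when none is given):

```python
def parse_management_server(value, default_port=443):
    """Split a management_server kernel argument into (host, port).

    Falls back to port 443 when no port is given. Illustrative
    sketch only, not the code shipped in vdsm-config.
    """
    host, sep, port = value.rpartition(':')
    if sep:
        return host, int(port)
    return value, default_port
```

With this, both "management_server=$RHEV-M_IP" and "management_server=$RHEV-M_IP:443" resolve to the same endpoint.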
Comment 12 Juan Hernández 2012-09-17 05:36:48 EDT
Whatever the problem is and whatever solution we implement, I think it is worthwhile to validate the SSH public key before saving it to authorized_keys:

http://gerrit.ovirt.org/8018

I will prepare another patch to verify the CA certificate as well.
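The kind of validation described here can be sketched as a simple sanity check that rejects anything that is not shaped like an OpenSSH public key line, such as an HTML error page (this is a hypothetical helper, not the code in the gerrit change):

```python
import base64


def looks_like_ssh_public_key(data):
    """Return True if data resembles an OpenSSH public key line
    ("<type> <base64-blob> [comment]"), False otherwise.
    Illustrative sketch, not the actual patch.
    """
    parts = data.strip().split()
    if len(parts) < 2 or not parts[0].startswith('ssh-'):
        return False
    try:
        base64.b64decode(parts[1], validate=True)
    except (ValueError, TypeError):
        return False
    return True
```

An Apache error page fails the "ssh-" prefix check immediately, so it would never be written to authorized_keys.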
Comment 13 Juan Hernández 2012-09-17 06:22:54 EDT
The following change adds the verification of the downloaded CA certificate:

http://gerrit.ovirt.org/8021
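A downloaded CA certificate can be sanity-checked in a similar spirit by verifying that it parses as a PEM block before writing it to disk (an illustrative sketch using the standard library, not the code in the gerrit change):

```python
import ssl


def looks_like_pem_certificate(data):
    """Return True if data parses as a single PEM certificate
    block, False for anything else (e.g. an HTML error page).
    Illustrative sketch, not the actual gerrit change.
    """
    try:
        ssl.PEM_cert_to_DER_cert(data.strip())
    except ValueError:
        return False
    return True
```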
Comment 14 Juan Hernández 2012-09-17 07:17:54 EDT
In point A of comment 10 I said that the node needs to parse the "management_server" kernel parameter correctly, but I see now that this is done in "vdsm-config", which is part of VDSM. The following patch tries to fix that:

http://gerrit.ovirt.org/8022
Comment 18 Juan Hernández 2012-09-18 13:49:22 EDT
The solution I found for this problem (in addition to the previous three patches) is to download the engine CA certificate to a different file, /etc/pki/vdsm/certs/enginecacert.pem. This file isn't touched by the VDSM start script, so it is preserved across the reboot and can be used by vdsm-reg to download the SSH key using HTTPS. The proposed change is here:

http://gerrit.ovirt.org/8038
Comment 44 haiyang,dong 2012-11-02 04:40:20 EDT
Test version:
rhev-hypervisor6-6.3-20121101.0.el6_3.noarch.rpm
vdsm-4.9.6-40.0.el6_3.x86_64
rhevm-3.1.0-23.el6ev
                                           
According to the expected results:
1. Auto-installing RHEV-H with the "management_server=$RHEV-M_IP[:port]" parameter should succeed on the RHEV-H side.
2. RHEV-H should register on the RHEV-M web UI and be approvable there so that it comes up.
Test result:
Tried:   
management_server=$RHEV-M_IP   Pass
management_server=$RHEV-M_IP:443   Pass
management_server=$RHEV-M_IP:443 management_server_fingerprint=<fingerprint> Pass
management_server=$RHEV-M_IP management_server_fingerprint=<fingerprint> Pass

So this bug has been fixed.
Comment 45 haiyang,dong 2012-11-02 05:05:51 EDT
In order to verify this bug, could you help change the status to "ON_QA"?
Comment 46 haiyang,dong 2012-11-02 05:27:11 EDT
According to the test result of comment 44, changing the status to "VERIFIED".
Comment 48 errata-xmlrpc 2012-12-04 14:11:37 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1508.html
