Bug 1065368 - [RHEVH] if installed from PXE with management_server parameter, host does not show as pending for approval in RHEVM GUI
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 3.4.1
Assignee: Douglas Schilling Landgraf
QA Contact: Martin Pavlik
URL:
Whiteboard: infra
Depends On:
Blocks:
Reported: 2014-02-14 13:43 UTC by Martin Pavlik
Modified: 2016-02-10 19:11 UTC (History)

Fixed In Version: vdsm-4.14.7-4.el6ev
Doc Type: Bug Fix
Doc Text:
Previously, in some instances where a hypervisor was installed from PXE with the "management_server" parameter, that host did not show as pending approval in the Red Hat Enterprise Virtualization Manager Administration Portal. This happened because a delay in delivering the engine IP address caused the certificate download in the vdsm-config stage to fail. Now, a patch moves the certificate download from vdsm-config to vdsm-reg-setup, and the certificate is downloaded and verified successfully.
Clone Of:
Environment:
Last Closed: 2014-07-29 14:19:57 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
rhevh logs (87.66 KB, application/x-compressed-tar)
2014-02-14 13:43 UTC, Martin Pavlik


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2014:0970 0 normal SHIPPED_LIVE vdsm 3.4.1 bug fix and enhancement update 2014-07-29 18:18:39 UTC
oVirt gerrit 26718 0 None None None Never
oVirt gerrit 28164 0 None None None Never

Description Martin Pavlik 2014-02-14 13:43:57 UTC
Created attachment 863277 [details]
rhevh logs

Description of problem:
host does not show as pending for approval in RHEVM GUI if installed from PXE with management_server parameter

It seems there is a problem establishing the SSL connection; see Additional info.


Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Manager Version: 3.3.0-0.46.el6ev 
rhev-hypervisor6-6.5-20140213.0.el6ev

How reproducible:
100%

Steps to Reproduce:
1. Install RHEV-H with the following PXE config, modifying it to suit your environment, especially management_server:

MENU LABEL rhev-hypervisor6-6.5-20140213.0.el6ev
TEXT HELP
Added on 2014-02-14
ENDTEXT
    KERNEL images/RHEVH/rhev-hypervisor6-6.5-20140213.0.el6ev/vmlinuz0
    APPEND rootflags=loop initrd=images/RHEVH/rhev-hypervisor6-6.5-20140213.0.el6ev/initrd0.img root=live:/rhevh-latest-6.iso rootfstype=auto ro liveimg nomodeset check rootflags=ro crashkernel=512M-2G:64M,2G-:128M elevator=deadline processor.max_cstate=1 rd_NO_LVM rd_NO_LUKS rd_NO_MD rd_NO_DM console=tty0 console=ttyS1,115200n81 firstboot storage_init=/dev/sda storage_vol=::::: ssh_pwauth=1 adminpw=LMi16hIGAvm0A ntp=10.34.32.125 edd=off rhevm_admin_password=gPA37ATxRODnA management_server=mp-rhevm33.rhev.lab.eng.brq.redhat.com
    IPAPPEND 2

Actual results:
host does not show as pending for approval in RHEVM GUI if installed from PXE with management_server parameter

Expected results:
host shows as pending for approval in RHEVM GUI

Additional info:

MainThread::DEBUG::2014-02-14 13:30:53,178::deployUtil::1552::root::getRemoteFile start. IP = 10.34.63.69 port = 443 fileName = "/engine.ssh.key.txt"
MainThread::DEBUG::2014-02-14 13:30:53,179::deployUtil::1572::root::/engine.ssh.key.txt failed in HTTPS. Retrying using HTTP.
Traceback (most recent call last):
  File "/usr/share/vdsm-reg/deployUtil.py", line 1567, in getRemoteFile
  File "/usr/share/vdsm-reg/deployUtil.py", line 1387, in getSSLSocket
  File "/usr/lib64/python2.6/ssl.py", line 342, in wrap_socket
  File "/usr/lib64/python2.6/ssl.py", line 118, in __init__
SSLError: [Errno 185090050] _ssl.c:330: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib


When the certificate from RHEVM is retrieved manually via the host TUI and Save & Register is pressed, the host appears as pending for approval in the RHEVM GUI.

Comment 1 Fabian Deutsch 2014-02-14 17:02:26 UTC
Moving to vdsm (or is it o-host-deploy?) because of:

Traceback (most recent call last):
  File "/usr/share/vdsm-reg/deployUtil.py", line 1567, in getRemoteFile
  File "/usr/share/vdsm-reg/deployUtil.py", line 1387, in getSSLSocket
  File "/usr/lib64/python2.6/ssl.py", line 342, in wrap_socket
  File "/usr/lib64/python2.6/ssl.py", line 118, in __init__
SSLError: [Errno 185090050] _ssl.c:330: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib

Comment 2 Alon Bar-Lev 2014-02-19 15:41:15 UTC
When pasting, try to find the entire sequence :)

MainThread::DEBUG::2014-02-14 13:30:53,178::deployUtil::1552::root::getRemoteFile start. IP = 10.34.63.69 port = 443 fileName = "/engine.ssh.key.txt"
MainThread::DEBUG::2014-02-14 13:30:53,179::deployUtil::1572::root::/engine.ssh.key.txt failed in HTTPS. Retrying using HTTP.
Traceback (most recent call last):
  File "/usr/share/vdsm-reg/deployUtil.py", line 1567, in getRemoteFile
  File "/usr/share/vdsm-reg/deployUtil.py", line 1387, in getSSLSocket
  File "/usr/lib64/python2.6/ssl.py", line 342, in wrap_socket
  File "/usr/lib64/python2.6/ssl.py", line 118, in __init__
SSLError: [Errno 185090050] _ssl.c:330: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib
MainThread::DEBUG::2014-02-14 13:30:53,180::deployUtil::1610::root::getRemoteFile end.
MainThread::DEBUG::2014-02-14 13:30:53,181::deployUtil::743::root::validateSSHKey: the string "<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
Reason: You're speaking plain HTTP to an SSL-enabled server port.<br />
Instead use the HTTPS scheme to access this URL, please.<br />
<blockquote>Hint: <a href="https://mp-rhevm33.rhev.lab.eng.brq.redhat.com/"><b>https://mp-rhevm33.rhev.lab.eng.brq.redhat.com/</b></a></blockquote></p>
<hr>
<address>Apache/2.2.22 (Red Hat Enterprise Web Server) Server at mp-rhevm33.rhev.lab.eng.brq.redhat.com Port 443</address>
</body></html>
" is not a valid SSH key
MainThread::DEBUG::2014-02-14 13:30:53,181::deployUtil::1552::root::getRemoteFile start. IP = 10.34.63.69 port = 443 fileName = "/rhevm.ssh.key.txt"
MainThread::DEBUG::2014-02-14 13:30:53,181::deployUtil::1572::root::/rhevm.ssh.key.txt failed in HTTPS. Retrying using HTTP.
Traceback (most recent call last):
  File "/usr/share/vdsm-reg/deployUtil.py", line 1567, in getRemoteFile
  File "/usr/share/vdsm-reg/deployUtil.py", line 1387, in getSSLSocket
  File "/usr/lib64/python2.6/ssl.py", line 342, in wrap_socket
  File "/usr/lib64/python2.6/ssl.py", line 118, in __init__
SSLError: [Errno 185090050] _ssl.c:330: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib
MainThread::DEBUG::2014-02-14 13:30:53,183::deployUtil::1610::root::getRemoteFile end.
MainThread::DEBUG::2014-02-14 13:30:53,183::deployUtil::743::root::validateSSHKey: the string "<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
Reason: You're speaking plain HTTP to an SSL-enabled server port.<br />
Instead use the HTTPS scheme to access this URL, please.<br />
<blockquote>Hint: <a href="https://mp-rhevm33.rhev.lab.eng.brq.redhat.com/"><b>https://mp-rhevm33.rhev.lab.eng.brq.redhat.com/</b></a></blockquote></p>
<hr>
<address>Apache/2.2.22 (Red Hat Enterprise Web Server) Server at mp-rhevm33.rhev.lab.eng.brq.redhat.com Port 443</address>
</body></html>
" is not a valid SSH key

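Regarding the log in comment 2: the HTTP fallback hands the body of the 400 response to validateSSHKey, which rejects it. A minimal sketch of that kind of check follows, written for Python 3 rather than the Python 2.6 on the node. The function name looks_like_ssh_public_key is hypothetical and this is not the deployUtil code; it only assumes the usual "type base64 [comment]" public key line format, where the decoded blob begins with a length-prefixed copy of the key type.

```python
import base64
import binascii
import struct


def looks_like_ssh_public_key(text):
    """Return True if text looks like a 'type base64 [comment]' key line."""
    parts = text.strip().split()
    if len(parts) < 2:
        return False
    try:
        key_type = parts[0].encode("ascii")
        blob = base64.b64decode(parts[1].encode("ascii"), validate=True)
    except (UnicodeEncodeError, binascii.Error):
        # Non-ASCII text or invalid base64 cannot be a public key line.
        return False
    if len(blob) < 4:
        return False
    # The blob starts with a 4-byte big-endian length followed by the key
    # type name, which must match the first field of the line.
    (name_len,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + name_len] == key_type
```

An HTML error page such as the one logged above fails this check, because its second whitespace-separated field does not decode to a blob carrying a matching type name.
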
Comment 3 Alon Bar-Lev 2014-02-19 15:46:25 UTC
Hi,

Are you sure that you removed the CA certificate before switching between different hosts?

My guess is that the CA certificate is not downloaded because it already exists on the machine.

Thanks,

Comment 4 Martin Pavlik 2014-02-19 16:16:41 UTC
(In reply to Alon Bar-Lev from comment #3)
> Hi,
> 
> Are you sure that you removed the CA certificate before switching between
> different hosts?
> 
> My guess is that the CA certificate is not downloaded because it already
> exists on the machine.
> 
> Thanks,

That should not be the case. The host is a clean install, so how could it already have the certificate?

Comment 5 Alon Bar-Lev 2014-02-19 18:51:53 UTC
Well, this is amazing. Has this ever worked?!

vdsm-reg does not even try to download the web CA certificate!

Douglas?

vdsm_reg/vdsm-reg-setup.in
---
    def execute(self):
        fOK = True
        fOKNow = True
        logging.debug("execute start.")
        self.registered = False

        if deployUtil.preventDuplicate():
            logging.debug("execute: found existing management bridge. Skipping rename.")
        else:
            fOK = self.renameBridge()
            logging.debug("execute: after renameBridge: %s", fOK)

        if fOK:
            strKey = deployUtil.getAuthKeysFile(self.vdcURL, self.vdcPORT)
            if strKey is not None:
                fOKNow = deployUtil.handleSSHKey(strKey)
            else:
                fOKNow = False
            fOK = fOK and fOKNow
            logging.debug("execute: after getAuthKeysFile: %s", fOK)

        if fOK:
            fOKNow = self.registerVDS()
            fOK = fOK and fOKNow
            logging.debug("execute: after registerVDS: %s", fOK)

        if fOK:
            self.registered = True
---

Comment 6 Douglas Schilling Landgraf 2014-02-20 20:01:18 UTC
Hi,

(In reply to Alon Bar-Lev from comment #5)
> Well, this is amazing. Has this ever worked?!
> 
> vdsm-reg does not even try to download the web CA certificate!

It tried but based on the logs it got:
Traceback (most recent call last):
File "/usr/share/vdsm-reg/deployUtil.py", line 1565, in getRemoteFile
File "<string>", line 1, in connect
error: [Errno 101] Network is unreachable

Once the network is available, we can fetch anything from the engine, as shown below from Martin's machine.

# python
Python 2.6.6 (r266:84292, Nov 21 2013, 10:50:32) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import deployUtil
>>> 
>>> deployUtil.getRemoteFile("10.XX.XX.69", "443", "/engine.ssh.key.txt")
'ssh-rsa AAAAB3MrbC1yc2EAAAADAQABAAABAQDOnLeyKZqDk6sPERDhMM44SggXSIgdz+S5cv4mSddMtc3FQeAcTen8JiUM2vnGLvVVDCxg6WN2Ry1ch2CMOM8PowZVV88gXsWvetMTwVRt6muKUhxW+aTGSUT7gDxxNJKIAR1qjcUt8RojMO+LH7juiEk8fFf76e4cyJY2ftPxm1RGvmEDURGNLhlDn/9BLVj0rG3HL9/ZNE29TV6QNak74J9ZnCIATolp4EPZrpDAYziEsj2l2oEOlQG3xK3Zf7WCFWc42jbruPy2cx7HR0M3QPqv3W7P5xXRMuKEZoP0P2c4CckXJyQxPZMeF85NvjcFYRiiU8A6TAKFDOFyh3M7 ovirt-engine\n'

Also, I talked with Martin over IRC; he changed from DHCP to a static IP address over PXE and it worked out of the box.

Anyway, I see blockingdhcp=true in the vdsm-reg logs, so we do wait for the DHCP server too:
MainThread::DEBUG::2014-02-14 13:16:29,619::deployUtil::136::root::['/usr/share/vdsm/addNetwork', 'rhevm', '', '', 'eth0', 'BOOTPROTO=dhcp', 'ONBOOT=yes', 'PEERNTP=yes', 'blockingdhcp=true']

In my test environment it just works. Here are my steps on a clean environment based on libvirt:

# yum install tftp-server 
# virsh net-destroy default
# virsh net-edit default
Added  <bootp file='pxelinux.0' />

# virsh net-start default
# service libvirtd restart

# livecd-iso-to-pxeboot ./rhevh-6.5-20140213.0.el6ev.iso 
Copied tftpboot/ subdirectory to /var/lib/tftpboot

- Set up iptables to allow DHCP:
iptables -I INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -I INPUT 1 -m mac --mac-source 52:54:00:d4:74:1e -j ACCEPT

Changed /var/lib/tftpboot/pxelinux.cfg/default to:
DEFAULT pxeboot
TIMEOUT 20
PROMPT 0
LABEL pxeboot
        KERNEL vmlinuz0
        APPEND rootflags=loop initrd=initrd0.img root=live:/rhevh-6.5-20140213.0.el6ev.iso rootfstype=auto ro liveimg check RD_NO_LVM rd_NO_MULTIPATH rootflags=ro crashkernel=128M elevator=deadline install quiet max_loop=256 rhgb rd_NO_LUKS rd_NO_MD rd_NO_DM firstboot storage_init=/dev/sda storage_vol=::::: ssh_pwauth=1 adminpw=LMi16hIGAvm0A management_server=192.168.0.5
IPAPPEND 2

This seems to be a very specific network case, since we have plenty of users and this is the only report I am aware of. Dan, any comment?

Comment 7 Alon Bar-Lev 2014-02-20 20:03:15 UTC
douglas, please read comment#5.

it has nothing to do with dhcp as vdsm-reg retries.

the problem is that vdsm-reg never tries to download the ca certificate, see the code.

Comment 8 Dan Kenigsberg 2014-02-20 22:56:36 UTC
How come we do not collect /var/log/vdsm-config ? It must be added to the log collector or moved under /var/log/vdsm-reg. Martin, could you find and attach your own?

Alon, I'm not 100% sure I understand what you're saying, but it could be that you managed to block some of vdsm-reg insanity: vdsm-config is placed under /etc/ovirt-config-boot.d and should have called getRhevmCert() on ovirt-node boot time.

Douglas, could you find out if it was run, and whether it succeeded to fetch /etc/pki/vdsm/certs/engine_web_ca.pem ?

Comment 9 Alon Bar-Lev 2014-02-20 23:02:50 UTC
(In reply to Dan Kenigsberg from comment #8)
> Alon, I'm not 100% sure I understand what you're saying, but it could be that you
> managed to block some of vdsm-reg insanity: vdsm-config is placed under
> /etc/ovirt-config-boot.d and should have called getRhevmCert() on ovirt-node
> boot time.

OWOWOWOWOWO! I was not aware of this script!!!

So the answer is quite clear... this has no retry... right? So the network is unavailable at this early stage; we can see from vdsm-reg that the network is established only later.

Why doesn't vdsm-reg download the certificate?
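
The point about the missing retry can be sketched as a small wrapper. This is only an illustration in Python 3, not the vdsm-config or vdsm-reg code: fetch_with_retry and its attempts, delay and sleep parameters are hypothetical, and fetch stands for any zero-argument callable that raises OSError (for example "Network is unreachable") while the network is still down.

```python
import time


def fetch_with_retry(fetch, attempts=5, delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    The last OSError is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError as err:
            last_error = err
            # Wait before the next try, but not after the final failure.
            if attempt < attempts:
                sleep(delay)
    raise last_error
```

The sleep parameter is injectable only so the wrapper can be exercised without real waiting; a boot-time caller would leave it at the default.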

Comment 10 Douglas Schilling Landgraf 2014-02-21 04:19:28 UTC
(In reply to Dan Kenigsberg from comment #8)

> 
> Douglas, could you find out if it was run, and whether it succeeded to fetch
> /etc/pki/vdsm/certs/engine_web_ca.pem ?

I verified, there is no certificate at all. I managed to download it during my tests inside the ovirt-node.

Comment 11 Martin Pavlik 2014-02-21 07:52:48 UTC
this is when I use static IP

[root@dell-r210ii-06 ~]# cat /var/log/vdsm-config 
vdsm-config: starting
RHEV agent configuration files already exist.
nDisable: 0
checkpoint 1
checkpoint 2 ::management_server: mp-rhevm33.rhev.lab.eng.brq.redhat.com
checkpoint 3::management_server: mp-rhevm33.rhev.lab.eng.brq.redhat.com, management_port: 443
Node clean-up successful
RHEV-M certificate downloaded and verified successfully.
rhevm_admin_password: gPA37ATxRODnA
vdsm-config: ended.


this is with DHCP
[root@dell-r210ii-06 ~]# cat /var/log/vdsm-config 
vdsm-config: starting
RHEV agent configuration files already exist.
nDisable: 0
checkpoint 1
checkpoint 2 ::management_server: mp-rhevm33.rhev.lab.eng.brq.redhat.com
checkpoint 3::management_server: mp-rhevm33.rhev.lab.eng.brq.redhat.com, management_port: 443
Node clean-up successful
Failed downloading the RHEV-M certificate file
Rebooting ... 
rhevm_admin_password: gPA37ATxRODnA
vdsm-config: ended.

Comment 12 Dan Kenigsberg 2014-02-21 09:46:30 UTC
Alon, in case your question was not rhetorical, the only reason vdsm-config does the downloading is a bad implementation.

vdsm-config should only make the fingerprint available to vdsm-reg (only because I am not sure whether first-boot kernel params are available to vdsm-reg directly). Downloading should be done by vdsm-reg itself (unless the web cert was already authorized in the TUI).
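
As a rough illustration of what checking a downloaded web certificate against a known fingerprint could look like, here is a minimal Python 3 sketch. The function names pem_fingerprint and fingerprint_matches are hypothetical, the choice of SHA-256 is an assumption (the thread does not say which digest is used), and this is not the vdsm-reg implementation.

```python
import hashlib
import ssl


def pem_fingerprint(pem_text, algorithm="sha256"):
    """Return the colon-separated hex digest of a PEM certificate's DER bytes."""
    der = ssl.PEM_cert_to_DER_cert(pem_text)
    digest = hashlib.new(algorithm, der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def _normalize(fingerprint):
    # Ignore colons, surrounding whitespace and letter case when comparing.
    return fingerprint.replace(":", "").strip().lower()


def fingerprint_matches(pem_text, expected, algorithm="sha256"):
    """Compare the certificate's fingerprint with an expected value."""
    return _normalize(pem_fingerprint(pem_text, algorithm)) == _normalize(expected)
```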

Comment 13 Alon Bar-Lev 2014-02-21 09:48:54 UTC
vdsm-config can put these parameters into vdsm-reg.conf... then when vdsm-reg starts, it can download the certificate if it does not exist and the parameters are available.
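
A minimal sketch of that flow, in Python 3 and under stated assumptions: the [vars] section and the vdc_host_name / vdc_host_port keys are assumed names for the engine parameters in vdsm-reg.conf, maybe_download_ca is a hypothetical helper, and download is any callable taking (host, port) and returning the PEM text. It is not the merged patch.

```python
import configparser
import os


def maybe_download_ca(conf_path, cert_path, download):
    """Download the CA certificate only if it is missing and the engine is set.

    Returns True if a download was performed, False otherwise.
    """
    if os.path.exists(cert_path):
        return False  # already present, e.g. approved earlier in the TUI
    parser = configparser.ConfigParser()
    parser.read(conf_path)
    # Section and key names are assumptions for illustration.
    host = parser.get("vars", "vdc_host_name", fallback=None)
    port = parser.get("vars", "vdc_host_port", fallback=None)
    if not host or host == "None" or not port:
        return False  # parameters not available yet
    pem = download(host, int(port))
    with open(cert_path, "w") as cert_file:
        cert_file.write(pem)
    return True
```

Because this would run from vdsm-reg, which starts after the network is established, the download no longer depends on the network being up at early boot.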

My next task will be to remove vdsm-reg... finish with priority queue.

Comment 15 Einav Cohen 2014-05-12 18:40:38 UTC
IIUC, this is a bug, no user experience advice is needed here -> removing the UserExperience Keyword.

Comment 16 Douglas Schilling Landgraf 2014-06-02 13:13:42 UTC
Moving to MODIFIED since the patch is merged in master and the 3.4 branch.
It should appear downstream with the next rebase.

Comment 17 Martin Pavlik 2014-06-26 12:44:09 UTC
why is this on_QA?

bug is fixed in vdsm-4.14.7-4.el6ev 

RHEV-H provided with av10 does not have it

see
http://bob.eng.lab.tlv.redhat.com/builds/latest_av/av10_rhevh_version_info.html

vdsm	4.14.7-3.el6ev.x86_64	4.14.7-3.el6ev.x86_64

Comment 18 Martin Pavlik 2014-06-30 12:50:29 UTC
rhev-integ , why was this moved back to on_QA?

Comment 20 Martin Pavlik 2014-07-10 09:16:49 UTC
verified with RHEV Hypervisor - 6.5 - 20140707.0.el6ev

Comment 22 errata-xmlrpc 2014-07-29 14:19:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0970.html

