Created attachment 863277 [details]
rhevh logs

Description of problem:
The host does not show as pending for approval in the RHEVM GUI if installed from PXE with the management_server parameter.
There seems to be a problem with establishing the SSL connection; see additional info.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Manager Version: 3.3.0-0.46.el6ev
rhev-hypervisor6-6.5-20140213.0.el6ev

How reproducible:
100%

Steps to Reproduce:
1. Install rhevh with the following PXE config; modify it to suit your environment, especially management_server

MENU LABEL rhev-hypervisor6-6.5-20140213.0.el6ev
TEXT HELP
Added on 2014-02-14
ENDTEXT
KERNEL images/RHEVH/rhev-hypervisor6-6.5-20140213.0.el6ev/vmlinuz0
APPEND rootflags=loop initrd=images/RHEVH/rhev-hypervisor6-6.5-20140213.0.el6ev/initrd0.img root=live:/rhevh-latest-6.iso rootfstype=auto ro liveimg nomodeset check rootflags=ro crashkernel=512M-2G:64M,2G-:128M elevator=deadline processor.max_cstate=1 rd_NO_LVM rd_NO_LUKS rd_NO_MD rd_NO_DM console=tty0 console=ttyS1,115200n81 firstboot storage_init=/dev/sda storage_vol=::::: ssh_pwauth=1 adminpw=LMi16hIGAvm0A ntp=10.34.32.125 edd=off rhevm_admin_password=gPA37ATxRODnA management_server=mp-rhevm33.rhev.lab.eng.brq.redhat.com
IPAPPEND 2

Actual results:
The host does not show as pending for approval in the RHEVM GUI if installed from PXE with the management_server parameter.

Expected results:
The host shows as pending for approval in the RHEVM GUI.

Additional info:
MainThread::DEBUG::2014-02-14 13:30:53,178::deployUtil::1552::root::getRemoteFile start. IP = 10.34.63.69 port = 443 fileName = "/engine.ssh.key.txt"
MainThread::DEBUG::2014-02-14 13:30:53,179::deployUtil::1572::root::/engine.ssh.key.txt failed in HTTPS. Retrying using HTTP.
Traceback (most recent call last):
  File "/usr/share/vdsm-reg/deployUtil.py", line 1567, in getRemoteFile
  File "/usr/share/vdsm-reg/deployUtil.py", line 1387, in getSSLSocket
  File "/usr/lib64/python2.6/ssl.py", line 342, in wrap_socket
  File "/usr/lib64/python2.6/ssl.py", line 118, in __init__
SSLError: [Errno 185090050] _ssl.c:330: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib

When the certificate from RHEVM is retrieved manually via the host TUI and Save & Register is pressed, the host appears as pending for approval in the RHEVM GUI.
Moving to vdsm (or is it o-host-deploy?) because of:

Traceback (most recent call last):
  File "/usr/share/vdsm-reg/deployUtil.py", line 1567, in getRemoteFile
  File "/usr/share/vdsm-reg/deployUtil.py", line 1387, in getSSLSocket
  File "/usr/lib64/python2.6/ssl.py", line 342, in wrap_socket
  File "/usr/lib64/python2.6/ssl.py", line 118, in __init__
SSLError: [Errno 185090050] _ssl.c:330: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib
When pasting, try to find the entire sequence :)

MainThread::DEBUG::2014-02-14 13:30:53,178::deployUtil::1552::root::getRemoteFile start. IP = 10.34.63.69 port = 443 fileName = "/engine.ssh.key.txt"
MainThread::DEBUG::2014-02-14 13:30:53,179::deployUtil::1572::root::/engine.ssh.key.txt failed in HTTPS. Retrying using HTTP.
Traceback (most recent call last):
  File "/usr/share/vdsm-reg/deployUtil.py", line 1567, in getRemoteFile
  File "/usr/share/vdsm-reg/deployUtil.py", line 1387, in getSSLSocket
  File "/usr/lib64/python2.6/ssl.py", line 342, in wrap_socket
  File "/usr/lib64/python2.6/ssl.py", line 118, in __init__
SSLError: [Errno 185090050] _ssl.c:330: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib
MainThread::DEBUG::2014-02-14 13:30:53,180::deployUtil::1610::root::getRemoteFile end.
MainThread::DEBUG::2014-02-14 13:30:53,181::deployUtil::743::root::validateSSHKey: the string "<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
Reason: You're speaking plain HTTP to an SSL-enabled server port.<br />
Instead use the HTTPS scheme to access this URL, please.<br />
<blockquote>Hint: <a href="https://mp-rhevm33.rhev.lab.eng.brq.redhat.com/"><b>https://mp-rhevm33.rhev.lab.eng.brq.redhat.com/</b></a></blockquote></p>
<hr>
<address>Apache/2.2.22 (Red Hat Enterprise Web Server) Server at mp-rhevm33.rhev.lab.eng.brq.redhat.com Port 443</address>
</body></html>
" is not a valid SSH key
MainThread::DEBUG::2014-02-14 13:30:53,181::deployUtil::1552::root::getRemoteFile start. IP = 10.34.63.69 port = 443 fileName = "/rhevm.ssh.key.txt"
MainThread::DEBUG::2014-02-14 13:30:53,181::deployUtil::1572::root::/rhevm.ssh.key.txt failed in HTTPS. Retrying using HTTP.
Traceback (most recent call last):
  File "/usr/share/vdsm-reg/deployUtil.py", line 1567, in getRemoteFile
  File "/usr/share/vdsm-reg/deployUtil.py", line 1387, in getSSLSocket
  File "/usr/lib64/python2.6/ssl.py", line 342, in wrap_socket
  File "/usr/lib64/python2.6/ssl.py", line 118, in __init__
SSLError: [Errno 185090050] _ssl.c:330: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib
MainThread::DEBUG::2014-02-14 13:30:53,183::deployUtil::1610::root::getRemoteFile end.
MainThread::DEBUG::2014-02-14 13:30:53,183::deployUtil::743::root::validateSSHKey: the string "<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
Reason: You're speaking plain HTTP to an SSL-enabled server port.<br />
Instead use the HTTPS scheme to access this URL, please.<br />
<blockquote>Hint: <a href="https://mp-rhevm33.rhev.lab.eng.brq.redhat.com/"><b>https://mp-rhevm33.rhev.lab.eng.brq.redhat.com/</b></a></blockquote></p>
<hr>
<address>Apache/2.2.22 (Red Hat Enterprise Web Server) Server at mp-rhevm33.rhev.lab.eng.brq.redhat.com Port 443</address>
</body></html>
" is not a valid SSH key
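The log above shows the Apache "400 Bad Request" page being handed to validateSSHKey after the plain-HTTP retry against port 443. As a rough sketch only (this is not the vdsm-reg implementation; the helper name `looks_like_ssh_public_key` and the list of key types are assumptions, and it is written for Python 3 rather than the node's Python 2.6), a structural check along these lines would reject such a page before it is treated as a key:

```python
import base64
import binascii
import struct

# Assumed set of acceptable key types; adjust to what the engine really serves.
KNOWN_KEY_TYPES = ("ssh-rsa", "ssh-dss", "ssh-ed25519")


def looks_like_ssh_public_key(text):
    """Return True if text has the shape '<type> <base64-blob> [comment]'."""
    fields = text.strip().split()
    if len(fields) < 2 or fields[0] not in KNOWN_KEY_TYPES:
        return False
    try:
        blob = base64.b64decode(fields[1], validate=True)
    except (binascii.Error, ValueError):
        return False
    if len(blob) < 4:
        return False
    # The blob starts with a 4-byte big-endian length followed by the key type.
    (name_len,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + name_len] == fields[0].encode("ascii")
```

A check like this only confirms the shape of the data. It says nothing about whether the key actually belongs to the engine.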
Hi,

Are you sure that you removed the CA certificate before switching to different hosts?

My guess is that the CA certificate is not downloaded because it already exists on the machine.

Thanks,
(In reply to Alon Bar-Lev from comment #3)
> Hi,
>
> Are you sure that you removed the CA certificate before switching to
> different hosts?
>
> My guess is that the CA certificate is not downloaded because it already
> exists on the machine.
>
> Thanks,

It should not be the case. The host is cleanly installed, so how could it already have the certificate?
Well, this is amazing, had it ever worked?!?!

vdsm-reg does not even try to download the web CA certificate!

Douglas?

vdsm_reg/vdsm-reg-setup.in
---
def execute(self):
    fOK = True
    fOKNow = True
    logging.debug("execute start.")
    self.registered = False

    if deployUtil.preventDuplicate():
        logging.debug("execute: found existing management bridge. Skipping rename.")
    else:
        fOK = self.renameBridge()
        logging.debug("execute: after renameBridge: %s", fOK)

    if fOK:
        strKey = deployUtil.getAuthKeysFile(self.vdcURL, self.vdcPORT)
        if strKey is not None:
            fOKNow = deployUtil.handleSSHKey(strKey)
        else:
            fOKNow = False
        fOK = fOK and fOKNow
        logging.debug("execute: after getAuthKeysFile: %s", fOK)

    if fOK:
        fOKNow = self.registerVDS()
        fOK = fOK and fOKNow
        logging.debug("execute: after registerVDS: %s", fOK)

    if fOK:
        self.registered = True
---
Hi,

(In reply to Alon Bar-Lev from comment #5)
> Well, this is amazing, had it ever worked?!?!
>
> vdsm-reg does not even try to download the web CA certificate!

It tried, but based on the logs it got:

Traceback (most recent call last):
  File "/usr/share/vdsm-reg/deployUtil.py", line 1565, in getRemoteFile
  File "<string>", line 1, in connect
error: [Errno 101] Network is unreachable

After the network is available, we can collect anything from the engine, as shown below from Martin's machine.

# python
Python 2.6.6 (r266:84292, Nov 21 2013, 10:50:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import deployUtil
>>>
>>> deployUtil.getRemoteFile("10.XX.XX.69", "443", "/engine.ssh.key.txt")
'ssh-rsa AAAAB3MrbC1yc2EAAAADAQABAAABAQDOnLeyKZqDk6sPERDhMM44SggXSIgdz+S5cv4mSddMtc3FQeAcTen8JiUM2vnGLvVVDCxg6WN2Ry1ch2CMOM8PowZVV88gXsWvetMTwVRt6muKUhxW+aTGSUT7gDxxNJKIAR1qjcUt8RojMO+LH7juiEk8fFf76e4cyJY2ftPxm1RGvmEDURGNLhlDn/9BLVj0rG3HL9/ZNE29TV6QNak74J9ZnCIATolp4EPZrpDAYziEsj2l2oEOlQG3xK3Zf7WCFWc42jbruPy2cx7HR0M3QPqv3W7P5xXRMuKEZoP0P2c4CckXJyQxPZMeF85NvjcFYRiiU8A6TAKFDOFyh3M7 ovirt-engine\n'

Also, I talked with Martin over IRC; he changed from DHCP to a static IP address over PXE and it worked out of the box.

Anyway, I see blockingdhcp set to true in the vdsm-reg logs, so we wait for the DHCP server too:

MainThread::DEBUG::2014-02-14 13:16:29,619::deployUtil::136::root::['/usr/share/vdsm/addNetwork', 'rhevm', '', '', 'eth0', 'BOOTPROTO=dhcp', 'ONBOOT=yes', 'PEERNTP=yes', 'blockingdhcp=true']

On my test environment it just works. Here are my steps on a clean environment based on libvirt:

# yum install tftp-server
# virsh net-destroy default
# virsh net-edit default
Added <bootp file='pxelinux.0' />
# virsh net-start default
# service libvirtd restart
# livecd-iso-to-pxeboot ./rhevh-6.5-20140213.0.el6ev.iso
Copied tftpboot/ subdirectory to /var/lib/tftpboot

- Setup iptables to allow dhcp
iptables -I INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -I INPUT 1 -m mac --mac-source 52:54:00:d4:74:1e -j ACCEPT

Changed /var/lib/tftp/pxelinux.cfg/default to:

DEFAULT pxeboot
TIMEOUT 20
PROMPT 0
LABEL pxeboot
KERNEL vmlinuz0
APPEND rootflags=loop initrd=initrd0.img root=live:/rhevh-6.5-20140213.0.el6ev.iso rootfstype=auto ro liveimg check RD_NO_LVM rd_NO_MULTIPATH rootflags=ro crashkernel=128M elevator=deadline install quiet max_loop=256 rhgb rd_NO_LUKS rd_NO_MD rd_NO_DM firstboot storage_init=/dev/sda storage_vol=::::: ssh_pwauth=1 adminpw=LMi16hIGAvm0A management_server=192.168.0.5
IPAPPEND 2

This seems to be a very specific network case, since we have plenty of users and this is the only report that I am aware of.

Dan, any comment?
Douglas, please read comment #5. It has nothing to do with DHCP, as vdsm-reg retries. The problem is that vdsm-reg never tries to download the CA certificate; see the code.
How come we do not collect /var/log/vdsm-config ? It must be added to the log collector or moved under /var/log/vdsm-reg. Martin, could you find and attach your own?

Alon, I'm not 100% sure I understand what you're saying, but it could be that you managed to block some of vdsm-reg insanity: vdsm-config is placed under /etc/ovirt-config-boot.d and should have called getRhevmCert() on ovirt-node boot time.

Douglas, could you find out if it was run, and whether it succeeded in fetching /etc/pki/vdsm/certs/engine_web_ca.pem ?
(In reply to Dan Kenigsberg from comment #8)
> Alon, I'm not 100% sure I understand what you're saying, but it could be that
> you managed to block some of vdsm-reg insanity: vdsm-config is placed under
> /etc/ovirt-config-boot.d and should have called getRhevmCert() on ovirt-node
> boot time.

OWOWOWOWOWO!

I was not aware of this script!!!

So the answer is quite clear... this has no retry... right? So the network is unavailable at this early stage; we can see from vdsm-reg that the network is established only later.

Why doesn't vdsm-reg download the certificate?
(In reply to Dan Kenigsberg from comment #8)
>
> Douglas, could you find out if it was run, and whether it succeeded in
> fetching /etc/pki/vdsm/certs/engine_web_ca.pem ?

I verified: there is no certificate at all. I managed to download it during my tests inside the ovirt-node.
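A missing CA file would be consistent with the X509_load_cert_crl_file error in the earlier traceback, since OpenSSL reports that error when it cannot read the file it was given as the CA bundle. The following is only an illustrative sketch in Python 3, not vdsm code: `make_verified_context` is a hypothetical helper that checks for the file first, so the failure is reported as "certificate missing" rather than surfacing as an opaque SSLError.

```python
import os
import ssl


def make_verified_context(ca_path):
    """Build a TLS client context that trusts only the CA stored at ca_path."""
    if not os.path.isfile(ca_path):
        # Fail early with a message that names the real problem.
        raise RuntimeError(
            "CA certificate %s is missing; download it before connecting" % ca_path
        )
    # Raises ssl.SSLError if the file exists but is not a usable PEM bundle.
    return ssl.create_default_context(cafile=ca_path)
```

With a check like this, the caller can decide to fetch the certificate and try again instead of silently falling back to plain HTTP.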
this is when I use static IP

[root@dell-r210ii-06 ~]# cat /var/log/vdsm-config
vdsm-config: starting
RHEV agent configuration files already exist.
nDisable: 0
checkpoint 1
checkpoint 2 ::management_server: mp-rhevm33.rhev.lab.eng.brq.redhat.com
checkpoint 3::management_server: mp-rhevm33.rhev.lab.eng.brq.redhat.com, management_port: 443
Node clean-up successful
RHEV-M certificate downloaded and verified successfully.
rhevm_admin_password: gPA37ATxRODnA
vdsm-config: ended.

this is with DHCP

[root@dell-r210ii-06 ~]# cat /var/log/vdsm-config
vdsm-config: starting
RHEV agent configuration files already exist.
nDisable: 0
checkpoint 1
checkpoint 2 ::management_server: mp-rhevm33.rhev.lab.eng.brq.redhat.com
checkpoint 3::management_server: mp-rhevm33.rhev.lab.eng.brq.redhat.com, management_port: 443
Node clean-up successful
Failed downloading the RHEV-M certificate file
Rebooting ...
rhevm_admin_password: gPA37ATxRODnA
vdsm-config: ended.
Alon, in case your question was not rhetorical: the only reason that vdsm-config does the downloading is a bad implementation. vdsm-config should only make the fingerprint available to vdsm-reg (only because I am not sure whether first-boot kernel params are available to vdsm-reg directly). Downloading should be done by vdsm-reg itself (unless the web cert was already authorized in the TUI).
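On the question of whether the first-boot kernel params can be read directly: as a rough sketch (not existing vdsm code; `parse_kernel_args` is a hypothetical helper, and reading the string from /proc/cmdline is an assumption about where the parameters would still be visible), a kernel command line can be split into a dictionary like this:

```python
import shlex


def parse_kernel_args(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    args = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        # Repeated names (e.g. rootflags, console) keep the last value seen.
        args[key] = value if sep else None
    return args
```

For example, `parse_kernel_args(open("/proc/cmdline").read()).get("management_server")` would return the engine address if the parameter is present, and None otherwise.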
vdsm-config can put these parameters into vdsm-reg.conf... then, when vdsm-reg starts, it can download the certificate if it does not exist and the parameters are available.

My next task will be to remove vdsm-reg... finish with priority queue.
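The idea above (download the certificate at vdsm-reg start if it is not there yet, and survive a network that comes up late) could look roughly like the sketch below. It is not the merged patch: `ensure_ca_certificate`, its parameters, and the injected `fetch` callable are all hypothetical, and fingerprint verification is left out for brevity.

```python
import os
import tempfile
import time


def ensure_ca_certificate(ca_path, fetch, retries=5, delay=2.0, sleep=time.sleep):
    """Make sure ca_path exists, calling fetch() with retries if it does not.

    fetch is a caller-supplied function returning the PEM data as bytes and
    raising OSError (e.g. "Network is unreachable") while the network is down.
    """
    if os.path.exists(ca_path):
        return True  # already downloaded, or authorized earlier in the TUI
    for attempt in range(1, retries + 1):
        try:
            pem = fetch()
        except OSError:
            if attempt < retries:
                sleep(delay)  # give DHCP time to bring the network up
            continue
        # Write to a temporary file first so a partial download is never used.
        directory = os.path.dirname(ca_path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(pem)
        os.replace(tmp_path, ca_path)
        return True
    return False
```

Passing `fetch` in as a callable keeps the retry logic independent of how the file is actually transferred, which also makes it easy to exercise without a network.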
IIUC, this is a bug, no user experience advice is needed here -> removing the UserExperience Keyword.
Moving to modified since we have the patch merged in master and 3.4 branch. Next rebase should appear in downstream.
why is this on_QA?
bug is fixed in vdsm-4.14.7-4.el6ev
RHEV-H provided with av10 does not have it
see http://bob.eng.lab.tlv.redhat.com/builds/latest_av/av10_rhevh_version_info.html

vdsm 4.14.7-3.el6ev.x86_64 4.14.7-3.el6ev.x86_64
rhev-integ , why was this moved back to on_QA?
verified with RHEV Hypervisor - 6.5 - 20140707.0.el6ev
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0970.html