Created attachment 1061022 [details]
Displayed error

Description of problem:
I am trying to install a hosted engine host using ovirt-hosted-engine-setup-1.3.0-0.0.master.20150729070044 (git 26149d7) and ovirt-engine-appliance-20150802.0-1 on a RHEL 7.1 host. The setup fails after I fill in all the details, complaining about localhost.localdomain even though I entered the real FQDN, as you can see in the attached screenshot.

How reproducible:
Always

Steps to Reproduce:
1. Start the installation using ovirt-hosted-engine-setup
2. Enter the usual values (storage, disk boot, OVA image)
3. Enter the FQDN (he-vm04.rhev.lab.eng.brq.redhat.com in my case)
4. Answer all the other questions.

Actual results:
localhost.localdomain related error

Expected results:
The setup continues and installs the hosted engine

Additional info:
Created attachment 1061024 [details]
Log file
The error is probably not that clear, but the issue is related to the host's hostname, which is localhost.localdomain and therefore will not be uniquely resolvable by the engine VM. Please see: https://bugzilla.redhat.com/show_bug.cgi?id=1178535#c10

As Martin pointed out, the user can now also add additional hosts from the web UI using just an IP address, so we have to review that decision.
Thanks for the clarification. The host's hostname is indeed localhost.localdomain. I see a couple of things that we should do here:

1) Improve the error reporting - I had no idea the setup tries to resolve the host's name before Simone told me (we did not do that in the past and it is not obvious from the log file).
2) Use socket.getfqdn() in the code that does the resolving, as the Python documentation states: "Note: gethostname() doesn't always return the fully qualified domain name; use getfqdn()."
3) Give the user a chance to review and change the hostname (/etc/hostname does not have to be the name of the host in the DNS system).
4) Allow using the IP directly (we still support that in ovirt-engine).
5) Warn the user that localhost will cause trouble with migrations, as libvirt specifically checks for that name (it should not, since vdsm provides all the extra information needed, but it does and we have to live with it at the moment). Libvirt does not require that the hostname be resolvable; it just has to be different from localhost.
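The difference in point 2 is easy to see with a short Python snippet (a minimal sketch; the actual setup code may differ):

```python
import socket

# gethostname() returns the kernel hostname as-is; on a freshly
# installed machine this can be "localhost.localdomain".
name = socket.gethostname()

# getfqdn() additionally consults the resolver (hosts file, DNS) and
# returns a fully qualified name when one is available.
fqdn = socket.getfqdn()

print("gethostname():", name)
print("getfqdn():    ", fqdn)
```

On a host whose /etc/hostname is localhost.localdomain, the two calls can disagree, which is exactly why the resolver-backed variant is the safer one to validate against.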
Can you please open a separate RFE for each of the above request and block this one as a tracker?
Can you please open a bug on the engine about the way we determine the host name?
Yaniv: This is all related to a single code block in the hosted engine setup. We do not even have a UI for that part (3rd point). It is up to Simone whether he wants to track the ideas separately, but I guess it will all end up in the same patchset anyway.
This is an automated message. oVirt 3.6.0 RC1 has been released. This bug has no target release and still has its target milestone set to 3.6.0-rc. Please review this bug and set the target milestone and release to one of the next releases.
Postponing since it's not a 3.6.0 blocker.
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
(In reply to Yaniv Dary from comment #5)
> Can you please open a bug on engine and the way we determine the host name?

Yaniv, the needinfo on me was cleared by Martin in comment #6. I'm also not sure I understand what I should ask in that bug beyond what is already mentioned in this bug itself.
Wouldn't this bug be solved by fixing bug 1188675?
*** Bug 1188675 has been marked as a duplicate of this bug. ***
Probably it's better to focus on the requirements before taking any action: if we allow the user to enter custom values, we have to validate them somehow.

hosted-engine-setup calls host.add on the REST API to add the host it is running on to the engine. The host address is one of the parameters of that call, so the whole point is how to validate it to exclude further issues.

In 3.4 we were adding the host using, as the host address, the IP address of the interface where hosted-engine-setup created the management bridge.

We changed that for two reasons:
- showing just the IP in hosted-engine --status is probably less usable than showing the hostname
- we found that we had an issue with live migrations because hosted-engine-setup was temporarily generating vdsm certs with the hostname and then host-deploy was overwriting them with the address we passed to host.add: https://bugzilla.redhat.com/show_bug.cgi?id=1178535#c10 - so if we allow the user to customize it, we also have to fix that to avoid the issue coming back.

So:
- localhost.localdomain is of course not valid, and the address should be well-formed
- is an IP address acceptable? Should it match the management bridge IP? What if the host has a different network for migration?
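For the "is an IP address acceptable?" question, at least telling a literal IP apart from a candidate FQDN is straightforward with the stdlib. A sketch (classify_address is a hypothetical helper, not the real setup code):

```python
import ipaddress

def classify_address(value):
    """Return "ip" for a literal IPv4/IPv6 address, "fqdn-candidate" otherwise."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        # Not parseable as an address literal, so treat it as a host name.
        return "fqdn-candidate"
    return "ip"

print(classify_address("10.35.117.24"))                         # ip
print(classify_address("he-vm04.rhev.lab.eng.brq.redhat.com"))  # fqdn-candidate
```

Whether the "ip" branch should then be accepted, or required to match the management bridge address, is exactly the open policy question above.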
(In reply to Simone Tiraboschi from comment #13)
> Probably it's better to focus on the requirement before taking any action:
> if we allow the user to enter custom values we have to somehow validate.
>
> hosted-engine setup is calling host.add on the REST API to add the host
> where is running on to the engine.
> The host address is one of the parameters of that call so all the point is
> how to validate it to exclude further issues.
>
> In 3.4 we were adding the host using the IP address of the interface were
> hosted-engine-setup created the management bridge on as the host address.
>
> We changed it for two reasons:
> - showing just that in hosted-engine --status is probably less usable than
> showing the hostname
> - we found that we add an issue with live migrations cause
> hosted-engine-setup was temporary generating generating vdsm certs with the
> hostname and than host-deploy was overwriting with the address we passed to
> host.add: https://bugzilla.redhat.com/show_bug.cgi?id=1178535#c10 so if we
> allow the user to customize it we have also to fix there to avoid it again.
>
> so:
> - localhost.localdomain is of-course not valid and the address should be
> well-formed
> - an IP address is acceptable? should it match with the management bridge
> IP? what if the host has a different network for migration?

We should require DNS resolvable FQDN. IP should not be supported.
libvirt seems quite sensitive to TLS CN verification:

[root@c71het20151028 ~]# virsh -c qemu+tls://c71het20151028/system
2015-10-28 16:36:02.248+0000: 13049: info : libvirt version: 1.2.8, package: 16.el7_1.4 (CentOS BuildSystem <http://bugs.centos.org>, 2015-09-15-14:00:05, worker1.bsys.centos.org)
2015-10-28 16:36:02.248+0000: 13049: warning : virNetTLSContextCheckCertificate:1145 : Certificate check failed Certificate [session] owner does not match the hostname c71het20151028
error: failed to connect to the hypervisor
error: authentication failed: Failed to verify peer's certificate

[root@c71het20151028 ~]# virsh -c qemu+tls://c71het20151028.localdomain/system
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # quit

We need to understand https://bugzilla.redhat.com/show_bug.cgi?id=1178535 a bit better, otherwise I feel this will happen again if we allow the user to use custom values.
> We should require DNS resolvable FQDN. IP should not be supported.

Why? It is still supported by the engine.
(In reply to Martin Sivák from comment #16)
> > We should require DNS resolvable FQDN. IP should not be supported.
>
> Why? It is still supported by the engine.

The fact that it might happen to work doesn't mean it is the design. A DNS-resolvable FQDN is what we make sure works.
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions
This RFE is also required when a host has multiple FQDNs and we need to specify which FQDN to use during additional host deployment - otherwise the bridge ends up being created on the wrong interface. See bug 1326709.
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.
Re-targeting to 3.6.7 for Gluster's sake.
We have a workaround (ensuring that the hostname correctly resolves to the required FQDN before deploying HE), so we can retarget back to 4.0 if there's a bandwidth constraint.
It's basically ready.
1) Can you please provide desirable reproduction steps for this bug?
2) Current status is as follows: HE deployment using rhevm-appliance-20160515.0-1.el7ev.noarch on NFS has succeeded; a clean host booted with a properly assigned FQDN, so I'm not sure that this reproduction is sufficient.

Components on host:
ovirt-vmconsole-1.0.2-2.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.0-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
mom-0.5.3-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
vdsm-4.17.29-0.el7ev.noarch
rhevm-sdk-python-3.6.5.1-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-hosted-engine-ha-1.3.5.6-1.el7ev.noarch
ovirt-vmconsole-host-1.0.2-2.el7ev.noarch
rhevm-appliance-20160515.0-1.el7ev.noarch
Linux version 3.10.0-327.22.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon May 16 13:31:48 EDT 2016
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux 3.10.0-327.22.1.el7.x86_64 #1 SMP Mon May 16 13:31:48 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

Engine:
rhevm-cli-3.6.2.1-1.el6ev.noarch
rhevm-dwh-setup-3.6.6-1.el6ev.noarch
rhevm-userportal-3.6.7-0.1.el6.noarch
rhevm-spice-client-x64-cab-3.6-7.el6.noarch
rhevm-setup-plugins-3.6.5-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.7-0.1.el6.noarch
rhevm-extensions-api-impl-3.6.7-0.1.el6.noarch
rhevm-tools-backup-3.6.7-0.1.el6.noarch
rhevm-dbscripts-3.6.7-0.1.el6.noarch
rhevm-backend-3.6.7-0.1.el6.noarch
rhevm-dependencies-3.6.0-1.el6ev.noarch
rhevm-spice-client-x86-cab-3.6-7.el6.noarch
rhevm-guest-agent-common-1.0.11-6.el6ev.noarch
rhevm-image-uploader-3.6.0-1.el6ev.noarch
rhevm-lib-3.6.7-0.1.el6.noarch
rhevm-setup-base-3.6.7-0.1.el6.noarch
rhevm-setup-plugin-websocket-proxy-3.6.7-0.1.el6.noarch
rhevm-vmconsole-proxy-helper-3.6.7-0.1.el6.noarch
rhevm-branding-rhev-3.6.0-10.el6ev.noarch
rhevm-reports-setup-3.6.5.1-1.el6ev.noarch
rhevm-webadmin-portal-3.6.7-0.1.el6.noarch
rhevm-3.6.7-0.1.el6.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-spice-client-x86-msi-3.6-7.el6.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.7-0.1.el6.noarch
rhevm-setup-3.6.7-0.1.el6.noarch
rhevm-doc-3.6.7-1.el6eng.noarch
rhevm-reports-3.6.5.1-1.el6ev.noarch
rhevm-tools-3.6.7-0.1.el6.noarch
rhevm-websocket-proxy-3.6.7-0.1.el6.noarch
rhevm-dwh-3.6.6-1.el6ev.noarch
rhevm-restapi-3.6.7-0.1.el6.noarch
rhevm-spice-client-x64-msi-3.6-7.el6.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.7-0.1.el6.noarch
rhevm-sdk-python-3.6.5.1-1.el6ev.noarch
Red Hat Enterprise Linux Server release 6.8 (Santiago)
Linux version 2.6.32-642.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Wed Apr 13 00:51:26 EDT 2016
Linux 2.6.32-642.el6.x86_64 #1 SMP Wed Apr 13 00:51:26 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

The rhevm-appliance-20160515.0-1.el7ev.noarch comes with Red Hat Enterprise Virtualization Manager Version 3.6.6.2-0.1.el6, so I had to update the engine right after its deployment finished. I also added reports and dwh (rhevm-dwh.noarch 0:3.6.6-1.el6ev) and ovirt-vmconsole-proxy-1.0.2-2.el6ev.noarch while the host was set to global maintenance; after the engine was upgraded, I reactivated the host.
(In reply to Nikolai Sednev from comment #24)
> 1)Can you please provide desirable reproduction steps for this bug?

Deploy the first host and the engine using the appliance, then try adding an additional host with 'hosted-engine --deploy': the script will now let you validate the host address, so:
1. ensure that the proposed value is correct
2. try replacing it with 'localhost.localdomain'
(In reply to Simone Tiraboschi from comment #25)
> (In reply to Nikolai Sednev from comment #24)
> > 1)Can you please provide desirable reproduction steps for this bug?
>
> Deploy the first host and the engine using the appliance, try adding an
> additional host with 'hosted-engine --deploy': now the script will let you
> validate the host address so:
> 1. ensure that the proposed valued is correct
> 2. try replacing it with 'localhost.localdomain'

The initial deployment on the first host using the appliance was made as described in comment #24 and was successful. Adding an additional host after changing /etc/hostname to localhost.localdomain (via hostnamectl set-hostname localhost.localdomain) showed that the deployment warns the user about the problematic FQDN and its resolution; supplying an IP address instead also does not resolve the issue. Eventually I provided the proper FQDN of the host and succeeded:

[root@alma03 ~]# cat /etc/hostname
localhost.localdomain
[root@alma03 ~]# hostnamectl set-hostname localhost.localdomain
[root@alma03 ~]# hostname
localhost.localdomain
[root@alma03 ~]# hosted-engine --deploy
[ INFO ] Stage: Initializing
[ INFO ] Generating a temporary VNC password.
[ INFO ] Stage: Environment setup
          Continuing will configure this host for serving as hypervisor and create a VM where you have to install the engine afterwards.
          Are you sure you want to continue? (Yes, No)[Yes]:
          Configuration files: []
          Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160530120807-yyobn3.log
          Version: otopi-1.4.1 (otopi-1.4.1-1.el7ev)
          It has been detected that this program is executed through an SSH connection without using screen.
          Continuing with the installation may lead to broken installation if the network connection fails.
          It is highly recommended to abort the installation and run it inside a screen session using command "screen".
          Do you want to continue anyway?
          (Yes, No)[No]: yes
[ INFO ] Hardware supports virtualization
[ INFO ] Stage: Environment packages setup
[ INFO ] Stage: Programs detection
[ INFO ] Stage: Environment setup
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Generating libvirt-spice certificates
[ INFO ] Stage: Environment customization

          --== STORAGE CONFIGURATION ==--

          During customization use CTRL-D to abort.
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs3, nfs4)[nfs3]:
          Please specify the full shared storage connection path to use (example: host:/path): 10.35.64.11:/vol/RHEV/Virt/nsednev_3_6_HE_2
          The specified storage location already contains a data domain. Is this an additional host setup (Yes, No)[Yes]?
[ INFO ] Installing on additional host
          Please specify the Host ID [Must be integer, default: 2]:

          --== SYSTEM CONFIGURATION ==--

[WARNING] A configuration file must be supplied to deploy Hosted Engine on an additional host.
[ INFO ] Answer file successfully loaded

          --== NETWORK CONFIGURATION ==--

[ INFO ] Additional host deployment, firewall manager is 'iptables'
          The following CPU types are supported by this host:
           - model_SandyBridge: Intel SandyBridge Family
           - model_Westmere: Intel Westmere Family
           - model_Nehalem: Intel Nehalem Family
           - model_Penryn: Intel Penryn Family
           - model_Conroe: Intel Conroe Family

          --== HOSTED ENGINE CONFIGURATION ==--

          Enter the name which will be used to identify this host inside the Administrator Portal [hosted_engine_2]:
          Enter 'admin@internal' user password that will be used for accessing the Administrator Portal:
          Confirm 'admin@internal' user password:
[ INFO ] Stage: Setup validation
[WARNING] Cannot validate host name settings, reason: resolved host does not match any of the local addresses
          Please provide the address of this host.
          Note: The engine VM and all the other hosts should be able to correctly resolve it.
          Host address: [localhost.localdomain]:
[WARNING] Failed to resolve localhost.localdomain using DNS, it can be resolved only locally
[ ERROR ] Host name is not valid: localhost.localdomain resolves to 127.0.0.1 and not all of them can be mapped to non loopback devices on this host
          Please provide the address of this host.
          Note: The engine VM and all the other hosts should be able to correctly resolve it.
          Host address: [localhost.localdomain]: 10.35.117.24
[ ERROR ] Host name is not valid: 10.35.117.24 is an IP address and not a FQDN. A FQDN is needed to be able to generate certificates correctly.
          Please provide the address of this host.
          Note: The engine VM and all the other hosts should be able to correctly resolve it.
          Host address: [localhost.localdomain]: alma03.qa.lab.tlv.redhat.com

          --== CONFIGURATION PREVIEW ==--

          Engine FQDN                          : nsednev-he-2.qa.lab.tlv.redhat.com
          Bridge name                          : ovirtmgmt
          Host address                         : alma03.qa.lab.tlv.redhat.com
          SSH daemon port                      : 22
          Firewall manager                     : iptables
          Gateway address                      : 10.35.117.254
          Host name for web application        : hosted_engine_2
          Storage Domain type                  : nfs3
          Host ID                              : 2
          Image size GB                        : 50
          GlusterFS Share Name                 : hosted_engine_glusterfs
          GlusterFS Brick Provisioning         : False
          Storage connection                   : 10.35.64.11:/vol/RHEV/Virt/nsednev_3_6_HE_2
          Console type                         : vnc
          Memory size MB                       : 4096
          MAC address                          : 00:16:3E:7B:BB:BB
          Boot type                            : disk
          Number of CPUs                       : 4
          Restart engine VM after engine-setup : True
          CPU Type                             : model_SandyBridge

[ INFO ] Stage: Transaction setup
[ INFO ] Stage: Misc configuration
[ INFO ] Stage: Package installation
[ INFO ] Stage: Misc configuration
[ INFO ] Configuring libvirt
[ INFO ] Configuring VDSM
[ INFO ] Starting vdsmd
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Configuring VM
[ INFO ] Updating hosted-engine configuration
[ INFO ] Stage: Transaction commit
[ INFO ] Stage: Closing up
[ INFO ] Acquiring internal CA cert from the engine
[ INFO ] The following CA certificate is going to be used, please immediately interrupt if not correct:
[ INFO ] Issuer: C=US, O=qa.lab.tlv.redhat.com, CN=nsednev-he-2.qa.lab.tlv.redhat.com.25977, Subject: C=US, O=qa.lab.tlv.redhat.com, CN=nsednev-he-2.qa.lab.tlv.redhat.com.25977, Fingerprint (SHA-1): 2EA33E00CF9BCA3774DA08D708110F570F655192
[ INFO ] Connecting to the Engine
[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] The VDSM Host is now operational
[ INFO ] Enabling and starting HA services
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20160530131113.conf'
[ INFO ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ INFO ] Hosted Engine successfully set up
[root@alma03 ~]#

When /etc/hostname is set to a proper, resolvable FQDN, the addition succeeds.
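The two rejections seen in the transcript can be approximated with a short Python sketch (check_host_address is a hypothetical helper; the real validator additionally checks that the resolved addresses map to non-loopback devices on this host):

```python
import ipaddress
import socket

def check_host_address(candidate):
    """Return an error string mirroring the setup's rejections, or None if OK."""
    try:
        ipaddress.ip_address(candidate)
        # Bare IPs are rejected because the FQDN is needed for certificates.
        return "%s is an IP address and not a FQDN" % candidate
    except ValueError:
        pass  # not a literal IP, treat it as a host name

    try:
        # Collect every address the name resolves to.
        resolved = {info[4][0] for info in socket.getaddrinfo(candidate, None)}
    except socket.gaierror:
        return "%s does not resolve" % candidate

    if all(ipaddress.ip_address(ip).is_loopback for ip in resolved):
        # e.g. localhost.localdomain -> 127.0.0.1 only
        return "%s resolves only to loopback addresses" % candidate
    return None
```

With this sketch, "10.35.117.24" fails the IP check and "localhost" fails the loopback check, matching the two error paths exercised during the verification above.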
*** Bug 1347663 has been marked as a duplicate of this bug. ***