Description of problem:
When RHEV-H is added to a hosted engine cluster with hosted-engine --deploy, the deployment fails and the host is not registered with the manager.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-1.3.3.4-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install a fresh RHEV-H
2. Set the host name in the TUI
3. Start deployment of the hosted engine

Actual results:
[ ERROR ] Cannot automatically add the host to cluster Default: Cannot add Host. SSH authentication failed, verify authentication parameters are correct (Username/Password, public-key etc.) You may refer to the engine.log file for further details.

The engine log shows:
2016-03-16 12:30:10,979 ERROR [org.ovirt.engine.core.bll.hostdeploy.AddVdsCommand] (ajp-/127.0.0.1:8702-4) [6d8ef2c] Failed to authenticate session with host 'xxxxxxxx': SSH authentication to 'root' failed. Please verify provided credentials. Make sure key is authorized at host

Expected results:
The host is successfully registered.

Additional info:
This happens because the hosted engine deployment tool sends the FQDN of the hypervisor as localhost.localdomain.
Here are the related logs from the investigation:

2016-03-16 16:45:15 DEBUG otopi.plugins.otopi.network.hostname hostname._validation:76 my name: rhev-h01.example.com
...
2016-03-16 16:45:15 DEBUG otopi.context context._executeMethod:142 Stage validation METHOD otopi.plugins.ovirt_hosted_engine_setup.network.bridge.Plugin._get_hostname_additional_hosts
2016-03-16 16:45:15 DEBUG otopi.context context.dumpEnvironment:500 ENVIRONMENT DUMP - BEGIN
2016-03-16 16:45:15 DEBUG otopi.context context.dumpEnvironment:510 ENV OVEHOSTED_NETWORK/host_name=str:'localhost.localdomain'
2016-03-16 16:45:15 DEBUG otopi.context context.dumpEnvironment:514 ENVIRONMENT DUMP - END

This means that _get_hostname_additional_hosts returned localhost.localdomain. It is a simple method:

    def _get_hostname_additional_hosts(self):
        self.environment[
            ohostedcons.NetworkEnv.HOST_NAME
        ] = socket.getfqdn()

So socket.getfqdn() returns localhost.localdomain. The reason is /etc/hosts:

127.0.0.1 localhost.localdomain localhost localhost.localdomain rhev-h01.example.com
::1 localhost6.localdomain6 localhost6

The 127.0.0.1 line is written by the RHEV-H TUI when setting the hostname. socket.getfqdn() then finds the host name as an alias on that line and takes the FQDN (localhost.localdomain) from the hosts file.
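To illustrate the lookup described above, here is a minimal sketch that approximates what socket.getfqdn() does when the name is answered from a hosts file. It is a simplified emulation, not the real resolver: parse_hosts and emulated_getfqdn are hypothetical helper names, and the 192.0.2.10 address in the second sample is a documentation address chosen purely for illustration.

```python
def parse_hosts(hosts_text):
    """Return a list of (address, canonical_name, aliases) tuples."""
    entries = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address without any name is not a usable entry
        entries.append((fields[0], fields[1], fields[2:]))
    return entries


def emulated_getfqdn(hosts_text, hostname):
    """Approximate socket.getfqdn(hostname) using only a hosts file."""
    for _address, canonical, aliases in parse_hosts(hosts_text):
        if hostname == canonical or hostname in aliases:
            # Like getfqdn(): look at the canonical name first, then the
            # aliases, and return the first name that contains a dot.
            for name in [canonical] + aliases:
                if "." in name:
                    return name
            return canonical
    return hostname  # no entry found: fall back to the name itself


# The hosts file from this report, as written by the RHEV-H TUI.
BROKEN = """\
127.0.0.1 localhost.localdomain localhost localhost.localdomain rhev-h01.example.com
::1 localhost6.localdomain6 localhost6
"""

# Assumed layout with the host name as canonical name on its own address
# (192.0.2.10 is a placeholder, not taken from the report).
REORDERED = """\
127.0.0.1 localhost.localdomain localhost
192.0.2.10 rhev-h01.example.com rhev-h01
"""

print(emulated_getfqdn(BROKEN, "rhev-h01.example.com"))
print(emulated_getfqdn(REORDERED, "rhev-h01.example.com"))
```

With the first sample the sketch returns localhost.localdomain, because the host name only appears as an alias behind it; with the second it returns rhev-h01.example.com.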
Hello Roman,
Can you please make sure your host name is DNS resolvable, and retest?
The engine and its hosts must be DNS resolvable. When you provide a hostname, the engine needs to be able to reach the host, which requires DNS.
For hosted engine the best practice would be to reinstall cleanly, so remove any old installation and start over.
Nikolai, I have the same problem when adding a second RHEV-H 3.6 node.
All nodes are resolvable, and I additionally added all nodes to every /etc/hosts file.
That did not help.
Then I looked at the traffic between RHEV-M, RHEV-H and the DNS server with tcpdump.
RHEV-M: 10.77.157.11, name is cloud
DNS: 10.77.157.12, name is dns
RHEV-H: 10.77.157.234, name is aries-03

RHEV-M tries to resolve RHEV-H as localhost.localdomain and to establish an SSH connection to it:

10:56:28.973215 IP 10.77.157.11.35711 > 10.77.157.12.53: 39710+ A? localhost.localdomain. (39)
10:56:28.973233 IP 10.77.157.11.35711 > 10.77.157.12.53: 27460+ AAAA? localhost.localdomain. (39)
10:56:28.973939 IP 10.77.157.12.53 > 10.77.157.11.35711: 39710 Refused- 0/0/0 (39)
10:56:28.973954 IP 10.77.157.12.53 > 10.77.157.11.35711: 27460 Refused- 0/0/0 (39)
10:56:28.974035 IP 10.77.157.11.35711 > 10.77.157.12.53: 39710+ A? localhost.localdomain. (39)
10:56:28.974054 IP 10.77.157.11.35711 > 10.77.157.12.53: 27460+ AAAA? localhost.localdomain. (39)
10:56:28.974635 IP 10.77.157.12.53 > 10.77.157.11.35711: 39710 Refused- 0/0/0 (39)
10:56:28.974646 IP 10.77.157.12.53 > 10.77.157.11.35711: 27460 Refused- 0/0/0 (39)

After I added this line to /etc/hosts on RHEV-M:
10.77.157.234 aries-03 localhost.localdomain
the installation completed.

Why does RHEV-M try to resolve localhost.localdomain?
(In reply to Artem Aronchikov from comment #11)
> After I added this line to /etc/hosts on RHEV-M:
> 10.77.157.234 aries-03 localhost.localdomain
> the installation completed.
>
> Why does RHEV-M try to resolve localhost.localdomain?

It's an ordering problem in /etc/hosts.

The RHEV-H codebase used augeas to set an additional alias on 127.0.0.1, but didn't actually change the hostname which was resolved (the patch fixes this).

Ordering /etc/hosts as you did ($address $hostname $[aliases...]) is a valid workaround, though.
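As a quick way to spot this condition on a host before deploying, a small sanity check along the following lines could be used. This is only a sketch: is_loopback_name and check_local_fqdn are hypothetical helpers that are not part of ovirt-hosted-engine-setup, and the set of loopback names is an assumption based on the /etc/hosts content shown in this report.

```python
import socket

# Names that should never be reported to the engine as the host's FQDN.
# Assumed list, taken from the default loopback entries in /etc/hosts.
LOOPBACK_NAMES = {
    "localhost",
    "localhost.localdomain",
    "localhost6",
    "localhost6.localdomain6",
}


def is_loopback_name(name):
    """Return True if name is one of the well-known loopback names."""
    return name.strip().lower().rstrip(".") in LOOPBACK_NAMES


def check_local_fqdn():
    """Return (hostname, fqdn, ok) for the machine this runs on.

    ok is False when socket.getfqdn() yields a loopback name, which is
    the value the deployment tool would then send to the engine.
    """
    hostname = socket.gethostname()
    fqdn = socket.getfqdn()
    return hostname, fqdn, not is_loopback_name(fqdn)
```

Calling check_local_fqdn() on the hypervisor and finding ok set to False would point to the alias ordering described above.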
This bug was closed because a code-level fix for this issue will not be available in RHEV 4.0. From a functional perspective, however, the issue will be fixed, because it is covered by the migration to Cockpit for administration.
Re-opening because this bug got closed accidentally.
My bad, there was already a 3.6.z fix.