Bug 1278904
| Summary: | credential RHEV hosts fail | | |
|---|---|---|---|
| Product: | Red Hat CloudForms Management Engine | Reporter: | Sergio Ocón-Cárdenas <soconcar> |
| Component: | Providers | Assignee: | Greg Blomquist <gblomqui> |
| Status: | CLOSED ERRATA | QA Contact: | Nandini Chandra <nachandr> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 5.5.0 | CC: | cpelland, gblomqui, jfrey, jhardy, jkim, jmatthew, jmontleo, jprause, mcornea, mfeifer, nachandr, ncarboni, obarenbo, simaishi, tcarlin, tmoor |
| Target Milestone: | GA | | |
| Target Release: | 5.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | 5.5.0.12 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-12-08 13:45:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1282895, 1291858 | | |
| Attachments: | 1090742 automation.log; 1090743 production.log; 1095199 Example of Appliance Hostname Error in the UI | | |
Created attachment 1090743: production.log
Reproduced this; it fails to connect. Forward and reverse DNS both work from the command line, and credentialing VMware hosts also works. Snippet from evm.log:

> [----] E, [2015-11-14T13:21:50.704088 #2922:1185988] ERROR -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::Host#connect_ssh) SSH connection failed for [<ip_address>] with [SocketError: getaddrinfo: Name or service not known]
> [----] W, [2015-11-14T13:21:50.704467 #2922:1185988] WARN -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::Host#verify_credentials_with_ssh) #<SocketError: getaddrinfo: Name or service not known>
> [----] E, [2015-11-14T13:21:50.704568 #2922:1185988] ERROR -- : MIQ(host_controller-update): Unexpected response returned from system, see log for details

So, this looks like it's happening because the *appliance* has an invalid hostname. When trying to SSH to the hosts, the code first attempts to validate the appliance's fully qualified domain name: https://gist.github.com/blomquisg/b88c3ac018fc00f14a34

Greg, that does indeed appear to be the issue. Any idea whether this validation was added in 4.0, or whether it existed in earlier versions?

[SOLUTION:] Provide the CloudForms appliance with a valid hostname that matches its resolvable FQDN.

Can we update the error message to be something a little more meaningful?

Hi Tim,
Yeah, I've been working with one of the QE appliances to try to improve the logging. Here's what I've got so far:
> [----] I, [2015-11-16T18:48:39.813591 #45741:3d9988] INFO -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::Host#verify_credentials_with_ssh) Verifying Host SSH credentials for [ibm-x3250m4-05]
> [----] I, [2015-11-16T18:48:39.820598 #45741:3d9988] INFO -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::Host#connect_ssh) Initiating SSH connection to Host:[ibm-x3250m4-05] using [ibm-x3250m4-05.REDACTED.com] for user:[test]. Options:[{:remember_host=>false}]
> [----] I, [2015-11-16T18:48:39.820742 #45741:3d9988] INFO -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::Host#connect_ssh) SSH connection established to [ibm-x3250m4-05.REDACTED.com]
> [----] E, [2015-11-16T18:48:39.980280 #45741:3d9988] ERROR -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::Host#connect_ssh) SSH connection failed for [ibm-x3250m4-05.REDACTED.com] with [SocketError: Unable to get fully qualified domain name for appliance localhost.localdomain.localdomain, error: getaddrinfo: Name or service not known]
> [----] W, [2015-11-16T18:48:39.980563 #45741:3d9988] WARN -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::Host#verify_credentials_with_ssh) #<SocketError: Unable to get fully qualified domain name for appliance localhost.localdomain.localdomain, error: getaddrinfo: Name or service not known>
> [----] E, [2015-11-16T18:48:39.980941 #45741:3d9988] ERROR -- : MIQ(host_controller-update): Unexpected response returned from system, see log for details
That covers everything from the start of the validation process down to the error that is ultimately presented to the user in the UI.
If I can connect the dots, I might even be able to get the updated SocketError message (Unable to get fully qualified domain name for appliance localhost.localdomain.localdomain) percolated up to the UI. I'll see what I can do about that.
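To make the failure mode concrete, here is a minimal Ruby sketch of a pre-SSH check that resolves the appliance's own hostname and, on failure, re-raises a SocketError using the more descriptive message format from the improved log above. It is not the actual MiqSshUtilV2 or MiqSockUtil code: `SYSTEM_RESOLVER`, `appliance_fqdn!`, `verify_credentials_message`, and the injectable `resolver:` parameter are hypothetical, added so the example runs without touching DNS.

```ruby
require "socket"

# Assumed default resolver: ask the system resolver for the canonical name of
# +name+. Addrinfo.getaddrinfo raises SocketError when the name cannot be resolved.
SYSTEM_RESOLVER = lambda do |name|
  info = Addrinfo.getaddrinfo(name, nil, nil, :STREAM, nil, Socket::AI_CANONNAME).first
  info.canonname || name
end

# Hypothetical helper modeled on the pre-fix behavior: resolve the appliance's
# own hostname before any outbound SSH, and wrap a resolver failure in a
# SocketError whose message names the offending hostname.
def appliance_fqdn!(hostname = Socket.gethostname, resolver: SYSTEM_RESOLVER)
  resolver.call(hostname)
rescue SocketError => err
  raise SocketError,
        "Unable to get fully qualified domain name for appliance #{hostname}, error: #{err.message}"
end

# Hypothetical credential check that returns the exception text so a UI could
# show it, instead of "Unexpected response returned from system".
def verify_credentials_message(hostname, resolver: SYSTEM_RESOLVER)
  appliance_fqdn!(hostname, resolver: resolver)
  "Credentials validated"
rescue SocketError => err
  err.message
end

# Simulate the bad appliance hostname with a resolver that always fails.
failing = ->(_name) { raise SocketError, "getaddrinfo: Name or service not known" }
puts verify_credentials_message("localhost.localdomain.localdomain", resolver: failing)
# prints: Unable to get fully qualified domain name for appliance localhost.localdomain.localdomain, error: getaddrinfo: Name or service not known
```

Injecting the resolver keeps the sketch deterministic; on a real appliance the default would hit the system resolver, which is exactly where the `getaddrinfo` error originates.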
Created attachment 1095199: Example of Appliance Hostname Error in the UI
I've attached an example of what the Appliance Hostname error would look like in the UI. So far, I've only tested this on a QE appliance (one that exhibits the same lack of a DNS-resolvable hostname).
If this looks right, I can put together a pull request with these changes.
Here's the branch tracking the changes shown in the screenshot from comment #9: https://github.com/blomquisg/manageiq/tree/bz1278904-invalid-appliance-hostname

From our testing, we ran into this same BZ. We saw that /etc/hostname on cfme-rhevm-5.5.0.9-2.x86_64.rhevm.ova had a bad entry: `cat /etc/hostname` returned `localhost.localdomain.localdomain` as opposed to `localhost.localdomain`. Once I updated the hostname to "localhost.localdomain", the issue was resolved. From `rails console`:

> MiqSockUtil.getFullyQualifiedDomainName
> => "localhost"

Added credentials for the RHEV Hypervisor and it worked. Filed Bug 1282927 to track the change for /etc/hostname so that it is "localhost.localdomain" by default, after which SSH functionality will work.

Would this also be solved if the hostname in /etc/hostname resolved to 127.0.0.1? (i.e., if we added whatever was in /etc/hostname to the /etc/hosts file)

New commit detected on ManageIQ/manageiq-appliance/master:
https://github.com/ManageIQ/manageiq-appliance/commit/1926c54093577c1c0542eea14dd80b086b9438ce
commit 1926c54093577c1c0542eea14dd80b086b9438ce
Author: Nick Carboni <ncarboni>
AuthorDate: Tue Nov 17 16:02:44 2015 -0500
Commit: Nick Carboni <ncarboni>
CommitDate: Thu Nov 19 13:50:01 2015 -0500

Remove cloud-init's ability to change the appliance hostname

The altered hostname was not included in /etc/hosts causing us to not be able to resolve it when attempting to run `MiqSockUtil.getFullyQualifiedDomainName`

The decision was made to disallow cloud-init from changing the hostname at all as to not conflict with our existing methods of changing the hostname using the appliance_console.

https://bugzilla.redhat.com/show_bug.cgi?id=1282927
https://bugzilla.redhat.com/show_bug.cgi?id=1278904

COPY/etc/cloud/cloud.cfg.d/miq_cloud.cfg | 2 ++
1 file changed, 2 insertions(+)

The decision was made to change the cloud-init config to not touch the hostname files. This should leave the hostname as localhost.localdomain on new appliances.
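The /etc/hosts question raised in this thread can be checked mechanically. This is a minimal sketch, assuming the standard /etc/hosts layout (an address followed by whitespace-separated names, with `#` starting a comment); `hostname_in_hosts?` is a hypothetical helper, not part of ManageIQ, and the sample entries are typical RHEL defaults rather than contents copied from an appliance. It only inspects file text; it does not query DNS.

```ruby
# Hypothetical check: does +hostname+ appear as a name in the given
# /etc/hosts contents, optionally mapped to a specific address?
def hostname_in_hosts?(hostname, hosts_text, address: nil)
  hosts_text.each_line.any? do |line|
    fields = line.sub(/#.*/, "").split   # drop comments, split on whitespace
    next false if fields.size < 2        # need an address plus at least one name
    ip, *names = fields
    names.include?(hostname) && (address.nil? || ip == address)
  end
end

# Typical RHEL default entries (assumed for illustration).
hosts = <<~HOSTS
  127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
  ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
HOSTS

# The bad value found in /etc/hostname on the 5.5.0.9 OVA is not listed...
puts hostname_in_hosts?("localhost.localdomain.localdomain", hosts)           # prints false
# ...while the corrected value is, mapped to the loopback address.
puts hostname_in_hosts?("localhost.localdomain", hosts, address: "127.0.0.1") # prints true
```

A name that passes this check should resolve locally without DNS, which is the condition the FQDN lookup needed; the upstream fix instead kept cloud-init from changing the hostname in the first place.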
*** Bug 1221707 has been marked as a duplicate of this bug. ***

New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/5e18768df48ad7dc8985a861ba0e16197b50202c
commit 5e18768df48ad7dc8985a861ba0e16197b50202c
Author: Greg Blomquist <gblomqui>
AuthorDate: Mon Nov 16 22:18:39 2015 -0500
Commit: Greg Blomquist <gblomqui>
CommitDate: Thu Nov 19 17:59:15 2015 -0500

No longer validate source hostname with MiqSshUtil

Way back in e6dcb57e41a5b9a2326dbadecd995808f9e043d3, code was added to make sure that the appliance had a valid hostname before attempting to SSH from the appliance to another box. The reasoning at the time was that the SSH gem would ignore a failure caused by having an invalid appliance hostname, but then later blow up because of the invalid appliance hostname.

It does not appear that SSH even cares about the appliance's hostname anymore when establishing an SSH connection to another server (in fact, it was surprising that it even would care). With this check removed, validating a host's SSH credentials will no longer throw a misleading "getaddrinfo" error when the appliance has a bad hostname.

https://bugzilla.redhat.com/show_bug.cgi?id=1278904

gems/pending/util/MiqSshUtilV2.rb | 4 ----
1 file changed, 4 deletions(-)

New commit detected on cfme/5.5.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=1fe4c792c748030a405c0ab02d98e491f3056152
commit 1fe4c792c748030a405c0ab02d98e491f3056152
Author: Greg Blomquist <gblomqui>
AuthorDate: Mon Nov 16 22:18:39 2015 -0500
Commit: Greg Blomquist <gblomqui>
CommitDate: Mon Nov 23 12:24:34 2015 -0500

No longer validate source hostname with MiqSshUtil

(Commit message body identical to upstream commit 5e18768.)

gems/pending/util/MiqSshUtilV2.rb | 4 ----
1 file changed, 4 deletions(-)

New commit detected on cfme/5.5.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=3fb9303616a99ac2217f035ee22c5d0bbfa1c86f
commit 3fb9303616a99ac2217f035ee22c5d0bbfa1c86f
Merge: e3a6ea0 1fe4c79
Author: Jason Frey <jfrey>
AuthorDate: Mon Nov 23 15:49:06 2015 -0500
Commit: Jason Frey <jfrey>
CommitDate: Mon Nov 23 15:49:06 2015 -0500

Merge branch 'bz1283195-5.5.z-backport' into '5.5.z'

No longer validate source hostname with MiqSshUtil

Clean cherry pick from upstream PR: https://github.com/ManageIQ/manageiq/pull/5502

(Commit message body identical to upstream commit 5e18768.)

See merge request !523

gems/pending/util/MiqSshUtilV2.rb | 4 ----
1 file changed, 4 deletions(-)

Verified in 5.5.0.12.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2551

*** Bug 1291858 has been marked as a duplicate of this bug. ***

*** Bug 1245171 has been marked as a duplicate of this bug. ***
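Commit 5e18768 removed four lines from gems/pending/util/MiqSshUtilV2.rb so that SSH credential verification no longer depends on the appliance's own hostname. The sketch below contrasts the two control flows under that description. It is not the real MiqSshUtilV2 code: `verify_old`, `verify_new`, and the injected `fqdn_check` and `ssh_connect` callables are hypothetical stand-ins (a real implementation would open the session with an SSH library), used so the difference can be exercised without a network.

```ruby
require "socket"

# Pre-fix flow (sketch): validate the appliance's own FQDN first, so a bad
# local hostname aborts the attempt before any connection to the target host.
def verify_old(host, user, fqdn_check:, ssh_connect:)
  fqdn_check.call
  ssh_connect.call(host, user)
  true
rescue SocketError => err
  warn "SSH connection failed for [#{host}] with [#{err.class}: #{err.message}]"
  false
end

# Post-fix flow (sketch): connect directly; the appliance's own hostname is
# never consulted, so only a SocketError from the target side fails here.
def verify_new(host, user, ssh_connect:)
  ssh_connect.call(host, user)
  true
rescue SocketError => err
  warn "SSH connection failed for [#{host}] with [#{err.class}: #{err.message}]"
  false
end

bad_appliance_name = -> { raise SocketError, "getaddrinfo: Name or service not known" }
reachable_host     = ->(_host, _user) { :session } # pretend the SSH login succeeds

p verify_old("rhevh.example.com", "root", fqdn_check: bad_appliance_name, ssh_connect: reachable_host) # prints false
p verify_new("rhevh.example.com", "root", ssh_connect: reachable_host)                                 # prints true
```

With the same reachable host, only the pre-fix flow fails, and it fails with the misleading `getaddrinfo` error the reporter saw.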
Created attachment 1090742: automation.log

Description of problem:
On a new Beta2 appliance, I have added RHEV as a provider. Adding SSH credentials to the hosts fails.

Version-Release number of selected component (if applicable):
5.5.0.9-beta2.20151102161742_5530c9a

How reproducible:
Always

Steps to Reproduce:
1. Add a new RHEVM provider.
2. Refresh it and go to Hosts.
3. Try to add credentials to the hosts.

Actual results:
Unexpected response returned from system, see log for details

Expected results:
The credentials are validated.

Additional info:
/var/log/secure on the host does not show any connection attempt.