Bug 1278904

Summary: credential RHEV hosts fail
Product: Red Hat CloudForms Management Engine Reporter: Sergio Ocón-Cárdenas <soconcar>
Component: Providers Assignee: Greg Blomquist <gblomqui>
Status: CLOSED ERRATA QA Contact: Nandini Chandra <nachandr>
Severity: high Docs Contact:
Priority: high    
Version: 5.5.0 CC: cpelland, gblomqui, jfrey, jhardy, jkim, jmatthew, jmontleo, jprause, mcornea, mfeifer, nachandr, ncarboni, obarenbo, simaishi, tcarlin, tmoor
Target Milestone: GA   
Target Release: 5.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 5.5.0.12 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-12-08 13:45:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1282895, 1291858    
Attachments:
automation.log (flags: none)
production.log (flags: none)
Example of Appliance Hostname Error in the UI (flags: none)

Description Sergio Ocón-Cárdenas 2015-11-06 17:19:51 UTC
Created attachment 1090742
automation.log

Description of problem:
On a new Beta2 appliance, I added RHEV as a provider. Adding the SSH credentials to the hosts fails.

Version-Release number of selected component (if applicable):
5.5.0.9-beta2.20151102161742_5530c9a 

How reproducible:
Always

Steps to Reproduce:
1. Add a new RHEVM
2. Refresh it and go to hosts
3. Try to add credentials to the hosts

Actual results:
Unexpected response returned from system, see log for details

Expected results:
Validation

Additional info:
/var/log/secure on the host shows no connection attempt.

Comment 2 Sergio Ocón-Cárdenas 2015-11-06 17:21:13 UTC
Created attachment 1090743
production.log

Comment 3 Dave Johnson 2015-11-14 18:35:23 UTC
Reproduced this; it is failing to connect. Forward and reverse DNS work from the command line, as does credentialing VMware hosts.


Snippet from evm.log:
[----] E, [2015-11-14T13:21:50.704088 #2922:1185988] ERROR -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::Host#connect_ssh) SSH connection failed for [<ip_address>] with [SocketError: getaddrinfo: Name or service not known]
[----] W, [2015-11-14T13:21:50.704467 #2922:1185988]  WARN -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::Host#verify_credentials_with_ssh) #<SocketError: getaddrinfo: Name or service not known>
[----] E, [2015-11-14T13:21:50.704568 #2922:1185988] ERROR -- : MIQ(host_controller-update): Unexpected response returned from system, see log for details

Comment 5 Greg Blomquist 2015-11-16 22:38:08 UTC
So, this looks like it's happening because the *appliance* has an invalid hostname.  When trying to ssh to the hosts, the code first attempts to validate the appliance's fully qualified domain name:

https://gist.github.com/blomquisg/b88c3ac018fc00f14a34
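A minimal sketch of that ordering, not the actual MiqSshUtil code and not the contents of the gist: the names `check_appliance_fqdn!`, `connect_with_precheck`, and the injectable `resolver` are hypothetical. It only illustrates why a bad appliance hostname aborts the attempt before any connection reaches the host, which would be consistent with /var/log/secure on the host showing nothing.

```ruby
require "socket"

# Assumed default: resolve a name through the system resolver.
DEFAULT_RESOLVER = ->(name) { Addrinfo.getaddrinfo(name, nil) }

# Hypothetical pre-flight check: resolve the appliance's own hostname.
# The resolver is injectable so the behaviour can be exercised without DNS.
def check_appliance_fqdn!(hostname = Socket.gethostname, resolver: DEFAULT_RESOLVER)
  resolver.call(hostname) # raises SocketError when the name cannot be resolved
  hostname
end

# Hypothetical wrapper: the block stands in for the real SSH connection,
# which is never reached if the pre-flight check raises.
def connect_with_precheck(appliance_hostname, resolver: DEFAULT_RESOLVER)
  check_appliance_fqdn!(appliance_hostname, resolver: resolver)
  yield
end
```

With a resolver that fails for the appliance name, the block passed to `connect_with_precheck` never runs and the caller sees only the `SocketError`.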

Comment 6 tim.moor 2015-11-16 22:58:33 UTC
Greg, that does indeed appear to be the issue. Any idea whether this validation was added in 4.0, or whether it existed in earlier versions?

[SOLUTION:] Provide the CloudForms appliance with a valid hostname that matches its resolvable FQDN.

Comment 7 tim.moor 2015-11-16 23:26:49 UTC
Can we update the error message to be something a little more meaningful?

Comment 8 Greg Blomquist 2015-11-16 23:54:54 UTC
Hi Tim,

Yeah, I've been playing with one of the QE appliances a little to attempt to improve the logging.  Here's what I've got so far:

> [----] I, [2015-11-16T18:48:39.813591 #45741:3d9988]  INFO -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::Host#verify_credentials_with_ssh) Verifying Host SSH credentials for [ibm-x3250m4-05]
> [----] I, [2015-11-16T18:48:39.820598 #45741:3d9988]  INFO -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::Host#connect_ssh) Initiating SSH connection to Host:[ibm-x3250m4-05] using [ibm-x3250m4-05.REDACTED.com] for user:[test].  Options:[{:remember_host=>false}]
> [----] I, [2015-11-16T18:48:39.820742 #45741:3d9988]  INFO -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::Host#connect_ssh) SSH connection established to [ibm-x3250m4-05.REDACTED.com]
> [----] E, [2015-11-16T18:48:39.980280 #45741:3d9988] ERROR -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::Host#connect_ssh) SSH connection failed for [ibm-x3250m4-05.REDACTED.com] with [SocketError: Unable to get fully qualified domain name for appliance localhost.localdomain.localdomain, error: getaddrinfo: Name or service not known]
> [----] W, [2015-11-16T18:48:39.980563 #45741:3d9988]  WARN -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::Host#verify_credentials_with_ssh) #<SocketError: Unable to get fully qualified domain name for appliance localhost.localdomain.localdomain, error: getaddrinfo: Name or service not known>
> [----] E, [2015-11-16T18:48:39.980941 #45741:3d9988] ERROR -- : MIQ(host_controller-update): Unexpected response returned from system, see log for details

That shows from the beginning of the validation process down through the error that ends up getting presented to the user in the UI.

If I can connect the dots, I might even be able to get the updated SocketError message (Unable to get fully qualified domain name for appliance localhost.localdomain.localdomain) percolated up to the UI.  I'll see what I can do about that.
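As a rough sketch of how such a message could be produced (an illustration only, not the change in the branch): the helper name `appliance_fqdn!` and the injectable `resolver` are hypothetical. The idea is to rescue the low-level `SocketError` and re-raise it with the appliance name included, so the log line says which name failed to resolve.

```ruby
require "socket"

# Hypothetical helper: resolve the appliance's own name and, on failure,
# re-raise with a message that names the hostname that could not be resolved.
def appliance_fqdn!(hostname, resolver: ->(name) { Addrinfo.getaddrinfo(name, nil) })
  resolver.call(hostname)
  hostname
rescue SocketError => e
  raise SocketError,
        "Unable to get fully qualified domain name for appliance #{hostname}, error: #{e.message}"
end
```

The re-raised error keeps the same class, so existing `rescue SocketError` handlers would still catch it; only the message text changes.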

Comment 9 Greg Blomquist 2015-11-17 03:19:36 UTC
Created attachment 1095199
Example of Appliance Hostname Error in the UI

I've attached an example of what the Appliance Hostname error would look like in the UI.  So far, this is all with just testing on a QE appliance (that exhibits the same lack of DNS-resolvable hostname).

If this looks right, I can put together a pull request with these changes.

Comment 10 Greg Blomquist 2015-11-17 03:26:25 UTC
Here's the branch tracking the changes shown in the screenshot from comment #9:

https://github.com/blomquisg/manageiq/tree/bz1278904-invalid-appliance-hostname

Comment 11 John Matthews 2015-11-17 19:39:34 UTC
From our testing we ran into this same BZ.
We saw that /etc/hostname on cfme-rhevm-5.5.0.9-2.x86_64.rhevm.ova had a bad entry:

# cat /etc/hostname
localhost.localdomain.localdomain


as opposed to
 localhost.localdomain


Once I updated the hostname to "localhost.localdomain" the issue was resolved:

From "rails console":

 MiqSockUtil.getFullyQualifiedDomainName
=> "localhost"

Added credentials for the RHEV Hypervisor and it worked.

Comment 12 John Matthews 2015-11-17 19:52:08 UTC
Filed Bug 1282927 to track the change for /etc/hostname so it is "localhost.localdomain" by default and SSH functionality will then work.

Comment 13 Greg Blomquist 2015-11-18 20:41:51 UTC
https://github.com/ManageIQ/manageiq/pull/5502

Comment 14 Nick Carboni 2015-11-19 13:47:41 UTC
Would this also be solved if the hostname in /etc/hostname was resolvable to 127.0.0.1? (i.e. if we added whatever was in /etc/hostname to the /etc/hosts file)
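One way to check the precondition in that question, sketched under assumptions: the helper name `hosts_entry_for?` is hypothetical, and it only reports whether a hosts-file text lists the given name. It says nothing about whether adding such an entry would actually fix the SSH validation.

```ruby
# Hypothetical helper: does the given hosts-file text map `hostname` to any address?
# Matching is exact and case-sensitive, which is a simplification.
def hosts_entry_for?(hosts_text, hostname)
  hosts_text.each_line.any? do |line|
    fields = line.sub(/#.*/, "").split # drop comments, split on whitespace
    _address, *names = fields          # first field is the address, the rest are names
    names.include?(hostname)
  end
end

# Usage on an appliance (standard Linux file locations):
#   name = File.read("/etc/hostname").strip
#   hosts_entry_for?(File.read("/etc/hosts"), name)
```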

Comment 16 CFME Bot 2015-11-19 19:42:59 UTC
New commit detected on ManageIQ/manageiq-appliance/master:
https://github.com/ManageIQ/manageiq-appliance/commit/1926c54093577c1c0542eea14dd80b086b9438ce

commit 1926c54093577c1c0542eea14dd80b086b9438ce
Author:     Nick Carboni <ncarboni>
AuthorDate: Tue Nov 17 16:02:44 2015 -0500
Commit:     Nick Carboni <ncarboni>
CommitDate: Thu Nov 19 13:50:01 2015 -0500

    Remove cloud-init's ability to change the appliance hostname
    
    The altered hostname was not included in /etc/hosts
    causing us to not be able to resolve it when attempting
    to run `MiqSockUtil.getFullyQualifiedDomainName`
    
    The decision was made to disallow cloud-init from
    changing the hostname at all as to not conflict with our
    existing methods of changing the hostname using the
    appliance_console.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1282927
    https://bugzilla.redhat.com/show_bug.cgi?id=1278904

 COPY/etc/cloud/cloud.cfg.d/miq_cloud.cfg | 2 ++
 1 file changed, 2 insertions(+)

Comment 17 Nick Carboni 2015-11-19 19:44:34 UTC
The decision was made to change the cloud-init config to not touch the hostname files. This should leave the hostname as localhost.localdomain on new appliances.

Comment 18 Greg Blomquist 2015-11-19 23:21:55 UTC
*** Bug 1221707 has been marked as a duplicate of this bug. ***

Comment 19 CFME Bot 2015-11-23 17:26:45 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/5e18768df48ad7dc8985a861ba0e16197b50202c

commit 5e18768df48ad7dc8985a861ba0e16197b50202c
Author:     Greg Blomquist <gblomqui>
AuthorDate: Mon Nov 16 22:18:39 2015 -0500
Commit:     Greg Blomquist <gblomqui>
CommitDate: Thu Nov 19 17:59:15 2015 -0500

    No longer validate source hostname with MiqSshUtil
    
    Way back in e6dcb57e41a5b9a2326dbadecd995808f9e043d3, code was added to make
    sure that the appliance had a valid hostname before attempting to SSH from the
    appliance to another box.  The reasoning at the time was that the SSH gem would
    ignore a failure caused by having an invalid appliance hostname, but then later
    blow up because of the invalid appliance hostname.
    
    It does not appears that SSH even cares about the appliance's hostname anymore
    when establishing an SSH connection to another server (in fact, it was
    surprising that it even would care).
    
    With this check removed, validating a host's SSH credentials will no longer
    throw a misleading "getaddrinfo" error when the appliance has a bad hostname.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1278904

 gems/pending/util/MiqSshUtilV2.rb | 4 ----
 1 file changed, 4 deletions(-)

Comment 20 CFME Bot 2015-11-23 21:12:11 UTC
New commit detected on cfme/5.5.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=1fe4c792c748030a405c0ab02d98e491f3056152

commit 1fe4c792c748030a405c0ab02d98e491f3056152
Author:     Greg Blomquist <gblomqui>
AuthorDate: Mon Nov 16 22:18:39 2015 -0500
Commit:     Greg Blomquist <gblomqui>
CommitDate: Mon Nov 23 12:24:34 2015 -0500

    No longer validate source hostname with MiqSshUtil
    
    Way back in e6dcb57e41a5b9a2326dbadecd995808f9e043d3, code was added to make
    sure that the appliance had a valid hostname before attempting to SSH from the
    appliance to another box.  The reasoning at the time was that the SSH gem would
    ignore a failure caused by having an invalid appliance hostname, but then later
    blow up because of the invalid appliance hostname.
    
    It does not appears that SSH even cares about the appliance's hostname anymore
    when establishing an SSH connection to another server (in fact, it was
    surprising that it even would care).
    
    With this check removed, validating a host's SSH credentials will no longer
    throw a misleading "getaddrinfo" error when the appliance has a bad hostname.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1278904

 gems/pending/util/MiqSshUtilV2.rb | 4 ----
 1 file changed, 4 deletions(-)

Comment 21 CFME Bot 2015-11-23 21:12:30 UTC
New commit detected on cfme/5.5.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=3fb9303616a99ac2217f035ee22c5d0bbfa1c86f

commit 3fb9303616a99ac2217f035ee22c5d0bbfa1c86f
Merge: e3a6ea0 1fe4c79
Author:     Jason Frey <jfrey>
AuthorDate: Mon Nov 23 15:49:06 2015 -0500
Commit:     Jason Frey <jfrey>
CommitDate: Mon Nov 23 15:49:06 2015 -0500

    Merge branch 'bz1283195-5.5.z-backport' into '5.5.z'
    
    No longer validate source hostname with MiqSshUtil
    
    Clean cherry pick from upstream PR: https://github.com/ManageIQ/manageiq/pull/5502
    
    Way back in e6dcb57e41a5b9a2326dbadecd995808f9e043d3, code was added to make
    sure that the appliance had a valid hostname before attempting to SSH from the
    appliance to another box.  The reasoning at the time was that the SSH gem would
    ignore a failure caused by having an invalid appliance hostname, but then later
    blow up because of the invalid appliance hostname.
    
    It does not appears that SSH even cares about the appliance's hostname anymore
    when establishing an SSH connection to another server (in fact, it was
    surprising that it even would care).
    
    With this check removed, validating a host's SSH credentials will no longer
    throw a misleading "getaddrinfo" error when the appliance has a bad hostname.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1278904
    
    See merge request !523

 gems/pending/util/MiqSshUtilV2.rb | 4 ----
 1 file changed, 4 deletions(-)

Comment 22 Nandini Chandra 2015-11-25 04:19:05 UTC
Verified in 5.5.0.12

Comment 24 errata-xmlrpc 2015-12-08 13:45:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2551

Comment 25 Greg Blomquist 2016-02-19 23:11:09 UTC
*** Bug 1291858 has been marked as a duplicate of this bug. ***

Comment 26 Greg Blomquist 2016-02-19 23:11:12 UTC
*** Bug 1245171 has been marked as a duplicate of this bug. ***