Bug 1221707 - SmartState analysis fails for Openstack Infrastructure Provider nodes
Summary: SmartState analysis fails for Openstack Infrastructure Provider nodes
Keywords:
Status: CLOSED DUPLICATE of bug 1278904
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Appliance
Version: 5.4.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: GA
Target Release: 5.5.0
Assignee: Ladislav Smola
QA Contact: Marius Cornea
URL:
Whiteboard: openstack
Depends On:
Blocks:
Reported: 2015-05-14 15:34 UTC by Marius Cornea
Modified: 2016-02-08 14:05 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-19 23:21:55 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:


Attachments
detailed error log (6.18 KB, text/x-vhdl)
2015-05-14 15:34 UTC, Marius Cornea

Description Marius Cornea 2015-05-14 15:34:56 UTC
Created attachment 1025484
detailed error log

Description of problem:
Host fleecing fails for nodes discovered in the OpenStack Infrastructure provider.

Version-Release number of selected component (if applicable):
5.4.0.1.20150512111354_4368716

How reproducible:
Reproducible on a freshly provisioned appliance with an OpenStack Infra provider attached.

Steps to Reproduce:
1. Add a new Openstack Infra provider.
2. Wait for the nodes to be discovered.
3. Trigger SmartState analysis for the discovered nodes.

Actual results:
SmartState analysis fails. 

Expected results:
SmartState analysis runs and CFME gets nodes info via SSH.

Additional info:

Logs show the following:
ERROR -- : host.connect_ssh: SSH connection failed for [192.0.2.11] with [SocketError: getaddrinfo: Name or service not known]
ERROR -- : [SocketError]: getaddrinfo: Name or service not known  Method:[rescue in block in scan_from_queue]
ERROR -- : /var/www/miq/lib/util/MiqSockUtil.rb:11:in `gethostbyname'

The CFME instance can reach the hosts by IP, but it appears to be attempting some name resolution. The nodes show the IP address in both the Hostname and IP Address fields.

Comment 2 Ladislav Smola 2015-05-15 10:13:24 UTC
full log

http://paste.openstack.org/show/223658/

Comment 3 Marius Cornea 2015-05-15 10:36:06 UTC
The problem was caused by the CFME appliance not having a valid FQDN. It was resolved after running:

[root@localhost ~]# hostname -f
hostname: Unknown host
[root@localhost ~]# hostname localhost.localdomain
[root@localhost ~]# hostname -f
localhost
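
The same condition can be checked from Ruby before triggering an analysis. This is only a minimal sketch: `local_hostname_resolvable?` is a hypothetical helper, not part of CFME or ManageIQ, and it assumes that a failed lookup of the appliance's own hostname is what produces the `getaddrinfo` error above. The resolver is injectable so the check can be exercised without touching DNS.

```ruby
require "socket"

# Hypothetical helper (not part of CFME/ManageIQ): reports whether the
# given hostname, by default this machine's own, can be resolved.
# The resolver lambda is injectable so the check can be tested offline.
def local_hostname_resolvable?(hostname = Socket.gethostname,
                               resolver: ->(name) { Addrinfo.getaddrinfo(name, nil) })
  # A successful lookup returns a non-empty list of addresses.
  !resolver.call(hostname).empty?
rescue SocketError
  # Raised for failures such as "getaddrinfo: Name or service not known".
  false
end
```

Calling `local_hostname_resolvable?` with no arguments on the appliance should return `false` in the state shown by the first `hostname -f` call above, and `true` once a resolvable hostname is set.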

Comment 4 Dave Johnson 2015-05-20 19:42:51 UTC
Marius, sounds like we should close this one then as not a bug?

Comment 5 Marius Cornea 2015-05-20 20:11:40 UTC
Yes, it's not a bug but the log error message is misleading because it points to remote hosts whilst it's actually generated by the local host. Not sure if we should close it or mark as low priority.

Comment 6 Ladislav Smola 2015-06-17 07:25:08 UTC
I am setting the priority to low. Marius, where should the fix go? Into the CFME installation docs, or into a CFME installer?

Or should we not require an FQDN?

Comment 7 Marius Cornea 2015-06-17 07:38:38 UTC
I think that SSH should work without a valid FQDN set; I don't know why it's not possible here (see [1]). Would it be possible to log an error when this is hit that clearly states that no valid FQDN is set on the CFME machine?

[1] https://github.com/ManageIQ/manageiq/blob/2a6ac9973eab0ad759c8382a013626fd775b8f06/lib/util/MiqSshUtilV2.rb#L269
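
A rough sketch of what such an error could look like, assuming the failure comes from looking up the appliance's own hostname. `LocalHostnameError` and `resolve_local_hostname!` are hypothetical names used for illustration only; they are not taken from MiqSshUtilV2.rb or MiqSockUtil.rb.

```ruby
require "socket"

# Hypothetical error class, used only in this sketch.
class LocalHostnameError < StandardError; end

# Hypothetical wrapper: resolves the appliance's own hostname and re-raises
# lookup failures with a message that names the local machine instead of
# the remote SSH target. The resolver is injectable for testing.
def resolve_local_hostname!(hostname = Socket.gethostname,
                            resolver: ->(name) { Addrinfo.getaddrinfo(name, nil) })
  resolver.call(hostname)
rescue SocketError => e
  raise LocalHostnameError,
        "cannot resolve this appliance's own hostname [#{hostname}] (#{e.message}); " \
        "set a resolvable hostname or FQDN on the appliance"
end
```

With a wrapper like this, the log would point at the appliance's hostname rather than at the remote node's IP, which is the confusion described in comment 5.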

Comment 8 Ladislav Smola 2015-06-17 07:44:21 UTC
Right. @Greg, can you figure out whether the FQDN check is still needed? The comment about the unclear error is unclear. :-) Possibly just some old bug in Net::SSH?

Comment 9 Greg Blomquist 2015-07-20 02:00:51 UTC
I'm moving this over to the appliance component.  Basically, we need to make sure that the appliance has *some* hostname set.

If it's already guaranteed, then this bug could just be closed, I guess.

Comment 10 Ladislav Smola 2015-07-28 12:59:11 UTC
@Marius, could you check whether setting an FQDN will be documented and required in the installer? That should be enough to close this one. Otherwise we would need to investigate the errors with a missing FQDN, as mentioned in comment #7.

Comment 11 Ramesh A 2015-10-29 14:45:52 UTC
Marius / Ladislav,

From a QE perspective, can you please let us know what needs to be tested to close this issue?

As far as the error is concerned, I am still able to reproduce it. I am not sure whether there are any updates to the documentation. If you can share the updates, we can go ahead with further action.

Thanks,
Ramesh

Comment 12 Marius Cornea 2015-10-30 01:45:22 UTC
I've just tried deploying an appliance on an OpenStack environment and I hit this issue. I believe the same happens when deploying it on other infrastructure types (RHEV or VMware).

[root@host-192-168-0-101 ~]# hostname -f
hostname: Unknown host
[root@host-192-168-0-101 ~]# hostname 
host-192-168-0-101

This bug remains valid in my opinion and should not be ON_QA, as no patches were made. The error message is misleading because it reports a failure to resolve the local machine's hostname in the context of SSHing to a remote host. If name resolution for the local host is needed while SSHing to a remote node, then the error message should explicitly state why it is failing.

I'm going to file an additional docs BZ for this.

Comment 13 Marius Cornea 2015-10-30 01:51:14 UTC
Filed docs bug BZ#1276521.

Comment 14 Ramesh A 2015-10-30 06:40:29 UTC
Hi Ladislav,

As per comment #12 from Marius, I am moving this back to ON_DEV.

Thanks,
Ramesh

Comment 16 Greg Blomquist 2015-11-19 23:21:55 UTC

*** This bug has been marked as a duplicate of bug 1278904 ***

Comment 17 Ladislav Smola 2015-11-20 07:39:53 UTC
@Greg, so can I drop the fix https://github.com/ManageIQ/manageiq/pull/5403? Will the hostname always be set?

Comment 18 Greg Blomquist 2016-02-08 14:05:10 UTC
@Ladas, yes, you can drop that fix (looks like it's closed already).

https://github.com/ManageIQ/manageiq/pull/5502 fixed this by just not checking for a fully qualified domain name of the appliance before SSHing out.  The check is no longer necessary with current versions of SSH.

