Bug 1401005

Summary: RHOSP:Register Nodes screen: Cannot register node due to ironic connection error to hypervisor
Product: Red Hat Quickstart Cloud Installer Reporter: James Olin Oden <joden>
Component: WebUI Assignee: John Matthews <jmatthew>
Status: CLOSED ERRATA QA Contact: James Olin Oden <joden>
Severity: high Docs Contact:
Priority: unspecified    
Version: 1.1 CC: jesusr, jmontleo, llasmith, qci-bugzillas, tpapaioa
Target Milestone: ---   
Target Release: 1.1   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-02-28 01:41:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description James Olin Oden 2016-12-02 14:42:53 UTC
Description of problem:
When trying to register a node for an OSP deployment I get the following error:

    The following nodes have errors:

        MAC Address 52:54:00:1e:ca:de from 10.8.196.6 Failed to validate power driver interface for node 3bfc4eaf-3410-40ab-83e7-fb32e93c4288. Error: SSH connection cannot be established: Failed to establish SSH connection to host 10.8.196.6.

However, I can ssh from the satellite to the hypervisor, and I can ssh from the director to the hypervisor.

The log had the following error in it over and over:

   2016-12-01 14:54:29 [app] [E] Fog::Compute::OpenStack::NotFound:   Expected([200, 204]) <=> Actual(404 Not Found)
    | excon.error.response
    |   :body          => "{\"error_message\": \"{\\\"debuginfo\\\": null, \\\"faultcode\\\": \\\"Client\\\", \\\"faultstring\\\": \\\"Node fd0a62dd-f9dc-4d89-bab9-88b06308db99 could not be found.\\\"}\"}"
    |   :headers       => {
    |     "Content-Length"                         => "153"
    |     "Content-Type"                           => "application/json"
    |     "Date"                                   => "Thu, 01 Dec 2016 19:54:29 GMT"
    |     "Openstack-Request-Id"                   => "req-125dbd54-719d-4cae-b15b-042a3407d3d4"
    |     "X-Openstack-Ironic-Api-Maximum-Version" => "1.22"
    |     "X-Openstack-Ironic-Api-Minimum-Version" => "1.1"
    |     "X-Openstack-Ironic-Api-Version"         => "1.1"
    |   }
    |   :local_address => "192.168.175.10"
    |   :local_port    => 45878
    |   :reason_phrase => "Not Found"
    |   :remote_ip     => "192.0.7.254"
    |   :status        => 404
    |   :status_line   => "HTTP/1.1 404 Not Found\r\n"
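
The :body value in the log above is JSON nested inside JSON: the outer object carries an "error_message" string that is itself a JSON document. A minimal sketch of pulling out the faultstring, assuming the Ruby string escaping has already been removed from the logged value (the function name is hypothetical, not part of any QCI or Ironic tooling):

```python
import json


def extract_faultstring(body: str) -> str:
    """Decode the doubly JSON-encoded error body and return its faultstring."""
    outer = json.loads(body)                    # {"error_message": "<JSON text>"}
    inner = json.loads(outer["error_message"])  # debuginfo / faultcode / faultstring
    return inner["faultstring"]


# Body copied from the log entry above, with one level of escaping removed.
body = (
    r'{"error_message": "{\"debuginfo\": null, \"faultcode\": \"Client\", '
    r'\"faultstring\": \"Node fd0a62dd-f9dc-4d89-bab9-88b06308db99 could not be found.\"}"}'
)

print(extract_faultstring(body))
```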

If you log in to the OSP director and try to set the node power state, you will see the following error:

   [stack@localhost ~]$ ironic node-set-power-state 8517c975-6c49-488f-b8b2-1006cfbfb3fb off
   SSH connection cannot be established: Failed to establish SSH connection to host 10.8.196.6. (HTTP 400)

However, if you ssh to that host by hand, it works fine:

   [stack@localhost ~]$ ssh root.196.6
   root.196.6's password: 
   Last login: Fri Dec  2 09:20:36 2016 from 10.13.57.109

On the hypervisor that ironic is trying to ssh to, you will see an entry like this in audit.log:

   type=USER_AUTH msg=audit(1480624970.048:21033564): pid=10834 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=pubkey acct="root" exe="/usr/sbin/sshd" hostname=? addr=192.0.7.254 terminal=ssh res=failed'
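
The audit record above can be split into its fields to make the failure easier to read: it is a public-key (op=pubkey) login for root from 192.0.7.254 that failed, whereas the manual ssh shown earlier used a password. A rough sketch of such a parser follows; the function name is hypothetical and it only handles the quoted msg='...' portion of a record shaped like this one:

```python
import re


def parse_user_auth(line: str) -> dict:
    """Return the key=value pairs inside the quoted msg='...' part of an audit record."""
    match = re.search(r"msg='([^']*)'", line)
    if match is None:
        raise ValueError("no quoted msg='...' section found")
    fields = {}
    for token in match.group(1).split():
        key, _, value = token.partition("=")
        fields[key] = value.strip('"')  # drop the quotes around values like acct="root"
    return fields


# Record copied from the audit.log entry above.
record = (
    "type=USER_AUTH msg=audit(1480624970.048:21033564): pid=10834 uid=0 "
    "auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 "
    "msg='op=pubkey acct=\"root\" exe=\"/usr/sbin/sshd\" hostname=? "
    "addr=192.0.7.254 terminal=ssh res=failed'"
)

fields = parse_user_auth(record)
print(fields["op"], fields["acct"], fields["addr"], fields["res"])
```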

Version-Release number of selected component (if applicable):
QCI-1.1-RHEL-7-20161201.t.1 

How reproducible:
Every time

Steps to Reproduce:
1.  Try to create an OSP deployment.
2.  Try to register a node.

Actual results:
Fails as above with the ssh connection error.

Expected results:
No failure.

Additional info:

Comment 2 Landon LaSmith 2016-12-02 15:36:10 UTC
I've encountered this same issue on 2 additional systems running QCI/QCIOOO on nested virtualization.

QCI Media Version: QCI-1.1-RHEL-7-20161201.t.1 
QCIOOO Media Version: QCIOOO-10.0-RHEL-7-20161130.t.1

Comment 3 Landon LaSmith 2016-12-02 18:42:38 UTC
A temporary workaround is to use the password for authentication on the undercloud instead of ssh_key_contents, and to register the node manually.

Steps to remove key contents and use the password:
ironic node-update <uuid> remove driver_info/ssh_key_contents
ironic node-update <uuid> add driver_info/ssh_password='<ssh password>'
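
If several nodes need the same change, the two commands above can be generated per node. The sketch below only builds and prints the argument lists and does not run anything; the function name is hypothetical, and the UUID and password placeholders must be replaced with real values on the undercloud:

```python
import shlex


def workaround_commands(node_uuid: str, ssh_password: str) -> list:
    """Build the two ironic node-update invocations from the steps above."""
    return [
        ["ironic", "node-update", node_uuid, "remove", "driver_info/ssh_key_contents"],
        ["ironic", "node-update", node_uuid, "add", f"driver_info/ssh_password={ssh_password}"],
    ]


# Print shell-quoted commands for review before running them by hand.
for argv in workaround_commands("<uuid>", "<ssh password>"):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Each list could also be passed to subprocess.run(argv, check=True) on a host where the ironic client is installed and the stack credentials are loaded.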

Comment 4 Landon LaSmith 2016-12-02 18:52:57 UTC
After a conversation with Jason Montleon, he discovered that it's an SELinux issue. The true temporary workaround, until a new compose is available, is to run 'setenforce 0' on the satellite server prior to node registration.

Comment 5 Jason Montleon 2016-12-02 19:01:35 UTC
*** Bug 1401007 has been marked as a duplicate of this bug. ***

Comment 6 Jason Montleon 2016-12-02 19:03:02 UTC
setenforce 0 will work as a temporary workaround. I'll work on fixing the selinux denial.
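
Before retrying node registration it may help to confirm that 'setenforce 0' actually took effect on the satellite. A small sketch, assuming the standard selinuxfs mount point /sys/fs/selinux and treating a missing enforce file as SELinux being disabled or unavailable (the function name is hypothetical):

```python
from pathlib import Path


def selinux_mode(enforce_file: str = "/sys/fs/selinux/enforce") -> str:
    """Report the SELinux mode based on the contents of the selinuxfs enforce file."""
    path = Path(enforce_file)
    if not path.exists():
        # Assumption: no selinuxfs mounted means SELinux is disabled or unavailable.
        return "disabled"
    return "enforcing" if path.read_text().strip() == "1" else "permissive"


if __name__ == "__main__":
    print(selinux_mode())
```

After 'setenforce 0' this should report permissive. The setting does not persist across a reboot.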

Comment 7 James Olin Oden 2016-12-05 15:01:21 UTC
Verified against:  QCI-1.1-RHEL-7-20161202.t.1

Comment 10 errata-xmlrpc 2017-02-28 01:41:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:0335