Bug 1290457 - Node registration fails with Introspection timeout
Status: CLOSED ERRATA
Product: Red Hat Quickstart Cloud Installer
Classification: Red Hat
Component: Installation - RHELOSP
Version: 1.0
Hardware: x86_64 All
Priority: urgent
Severity: urgent
Target Milestone: ga
Target Release: 1.0
Assigned To: John Matthews
QA Contact: Thom Carlin
Docs Contact: Dan Macpherson
Keywords: Triaged
Depends On:
Blocks: rhci-sprint-16 qci-sprint-17
Reported: 2015-12-10 10:08 EST by Antonin Pagac
Modified: 2016-09-13 12:23 EDT
CC List: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-09-13 12:23:27 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2016:1862 normal SHIPPED_LIVE Red Hat Quickstart Installer 1.0 2016-09-13 16:18:48 EDT

Description Antonin Pagac 2015-12-10 10:08:54 EST
Description of problem:
While deploying RHELOSP, in the Register Nodes step, after submitting valid data about the nodes, the following error appears:

  Error Introspection timeout

I have not seen this behaviour before on this HW. Attaching tar of /var/log from both Satellite and RHELOSP machines. I was deploying on bare metal machines and registering two nodes at the same time.

When I look at Director, I can see the nodes are there, but have no flavor.

Version-Release number of selected component (if applicable):
RHCI-6.0-RHEL-7-20151208.t.0
RHCIOOO-7-RHEL-7-20151208.t.0

How reproducible:
Happened to me once

Steps to Reproduce:
1. Start a deployment of RHELOSP, go to the Register Nodes step, and fill out the required information with valid values
2. Wait. After quite a long time (about an hour?) an error appears

Actual results:
Timeout while registering nodes

Expected results:
Node registration complete; able to go to the next step

Additional info:
This might be relevant: https://bugzilla.redhat.com/show_bug.cgi?id=1280263
Comment 3 Jason Montleon 2016-03-21 12:27:35 EDT
Have you seen this with the latest composes (in particular the latest OOO ISO compose)? We found that the swift services sometimes fail to start during installation because files haven't been created yet, which can cause node introspection to time out. We've filed a bug against OSP and have added a line to restart the swift services at the end of the fusor-undercloud-installer run.
Comment 7 Antonin Pagac 2016-03-22 07:07:53 EDT
Jason,

for TP3 RC1, the nodes registered without problems. I need to try a couple more times just to be on the safe side.
Comment 8 Dan Yocum 2016-03-23 17:39:05 EDT
I just had a single node introspection fail, leaving the node in a powered-on state, so I'm inclined to think the "booting-before-pxe-config has completed" theory has some merit.
Comment 9 Dan Yocum 2016-03-23 22:37:54 EDT
Introspection worked with this Dell R630 hardware with 7.1, but now it only succeeds intermittently.  Here's a new error:


| last_error             | Failed to change power state to 'power off'. Error: not all arguments  |
|                        | converted during string formatting                                     |

Bumping to urgent.
Comment 10 Dan Yocum 2016-04-21 10:55:31 EDT
(In reply to Dan Yocum from comment #9)
> Introspection worked with this Dell R630 hardware with 7.1, now it only
> succeeds intermittently.  Here's a new error:
> 
> 
> | last_error             | Failed to change power state to 'power off'. Error: not all arguments  |
> |                        | converted during string formatting                                     |
> 
> Bumping to urgent.

This specific error is likely due to ipmitool sending commands to the iDRAC too quickly, causing it to freeze up and requiring a power cycle of the entire system.

A suggestion has been made to edit /etc/ironic/ironic.conf and add this parameter to the [ipmi] section:

min_command_interval=10

This forces a minimum interval of 10 seconds between IPMI commands.  The operator should adjust this value as necessary.
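
For illustration, the suggested change would look roughly like this in /etc/ironic/ironic.conf. This is only a sketch: it assumes the [ipmi] section is already present in the file (otherwise it would need to be added), the value 10 is the one proposed in this comment rather than a tested recommendation, and the Ironic conductor service most likely has to be restarted before the new value takes effect.

```ini
[ipmi]
# Minimum time, in seconds, between IPMI operations sent to a node's BMC.
# 10 is the value suggested in this comment; tune it for the hardware in use.
min_command_interval = 10
```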
Comment 11 John Matthews 2016-06-20 14:15:01 EDT
Please re-test with OSP-8
Comment 12 Thom Carlin 2016-07-01 08:20:27 EDT
Verified RHOSP Register Node works with QCI 1.2 at least once.  Please reopen if this reoccurs.
Comment 18 errata-xmlrpc 2016-09-13 12:23:27 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1862
