Bug 1308714 - Register Nodes inconsistently fails
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic-discoverd
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified  Severity: medium
Target Milestone: ---
Target Release: 10.0 (Newton)
Assigned To: RHOS Maint
QA Contact: Shai Revivo
Depends On:
Blocks:
 
Reported: 2016-02-15 16:01 EST by Thom Carlin
Modified: 2017-02-01 21:53 EST (History)
CC List: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-03 14:48:06 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Thom Carlin 2016-02-15 16:01:05 EST
Description of problem:

Identical nodes will not always register successfully

Version-Release number of selected component (if applicable):

RHCI TP2 RC9

How reproducible:

Intermittent but frequent

Steps to Reproduce:
1. Install RHCI and TripleO ISOs
2. Log in to run launch-fusor*-installer
3. Deploy RHEL-OSP

Actual results:

Register Nodes fails randomly

Expected results:

All nodes consistently register

Additional info:

End user sees an "Introspection Timeout" error.

On the remote console, the dracut emergency shell appears.
In /discovery-logs: ironic-discoverd-ramdisk: discoverd error: 404: Could not find a node for attributes {'bmc_address': u'<drac_address>', 'mac': [u'<mac_address>']}

On Undercloud node:
/var/log/messages excerpts:
ironic-discoverd: WARNING:ironic_discoverd.main:Unable to connect to Ironic or Keystone, retrying 6 times more: Authorization Failed: Unable to establish connection to http://192.0.2.1:5000/v2.0/tokens
ironic-discoverd: INFO:ironic_discoverd.main:Enabled processing hooks: ['ramdisk_error', 'root_device_hint', 'scheduler', 'validate_interfaces', 'edeploy']
ironic-discoverd: INFO:werkzeug: * Running on http://0.0.0.0:5050/
ironic-discoverd: INFO:werkzeug:192.0.2.254 - - [15/Feb/2016 14:49:17] "POST /v1/introspection/<uuid> HTTP/1.1" 202 -
ironic-discoverd: INFO:ironic_discoverd.introspect:Whitelisting MAC's [u'<mac_address>'] for node <uuid> on the firewall
ironic-discoverd: INFO:werkzeug:192.0.2.254 - - [15/Feb/2016 14:49:18] "POST /v1/introspection/<uuid> HTTP/1.1" 202 -
ironic-discoverd: INFO:ironic_discoverd.introspect:Whitelisting MAC's [u'<mac_address>'] for node <mac_address> on the firewall
[...]
ironic-discoverd: INFO:ironic_discoverd.plugins.standard:Discovered data: CPUs: <number_cpus> x86_64, memory <memory_size> MiB, disk <disk_size> GiB
ironic-discoverd: INFO:ironic_discoverd.plugins.standard:PXE boot interface was <mac_address>
ironic-discoverd: WARNING:ironic_discoverd.plugins.standard:The following interfaces were invalid or not eligible in introspection data for node with BMC <bmc_ip> and were excluded: {<list>}
ironic-discoverd: INFO:ironic_discoverd.plugins.standard:Eligible interfaces are {<list>}
ironic-discoverd: ERROR:ironic_discoverd.node_cache:Introspection for nodes [<node_list>] has timed out
ironic-discoverd: ERROR:ironicclient.common.http:Error contacting Ironic server: A port with MAC address <mac_address> already exists. (HTTP 409). Attempt 61 of 61
ironic-discoverd: WARNING:ironic_discoverd.process:MAC <mac_address> appeared in introspection data for node <node_uuid>, but already exists in database - skipping
ironic-discoverd: INFO:werkzeug:192.0.2.100 - - [15/Feb/2016 14:55:00] "POST /v1/continue HTTP/1.1" 200 -
ironic-discoverd: INFO:werkzeug:192.0.2.254 - - [15/Feb/2016 14:55:00] "GET /v1/introspection/<uuid> HTTP/1.1" 200 -
ironic-discoverd: INFO:werkzeug:192.0.2.254 - - [15/Feb/2016 14:55:00] "GET /v1/introspection/<uuid> HTTP/1.1" 200 -
ironic-discoverd: INFO:ironic_discoverd.plugins.standard:Discovered data: CPUs: <num_cpus> x86_64, memory <mem_size> MiB, disk <disk_space> GiB
ironic-discoverd: INFO:ironic_discoverd.plugins.standard:PXE boot interface was <mac_address>
ironic-discoverd: WARNING:ironic_discoverd.plugins.standard:The following interfaces were invalid or not eligible in introspection data for node with BMC <bmc_address> and were excluded: {<list>}
ironic-discoverd: INFO:ironic_discoverd.plugins.standard:Eligible interfaces are {<list>}
ironic-discoverd: ERROR:ironic_discoverd.utils:Could not find a node for attributes {'bmc_address': u'<valid_bmc_ip>', 'mac': [u'<valid_mac>']}

The nodes and ports appear to be correct in the Ironic database.
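The database check described above can be sketched as a short dry-run script. The commands are assumptions based on the Kilo-era ironicclient CLI and the default Ironic MySQL schema; by default the script only prints what it would run.

```shell
#!/bin/sh
# Hypothetical sketch of verifying node/port state in Ironic.
# RUN defaults to "echo" so the commands are only printed (dry-run);
# set RUN= to actually execute them on a real undercloud.
: "${RUN:=echo}"

check_ironic_state() {
    $RUN ironic node-list    # confirm all nodes were registered
    $RUN ironic port-list    # confirm the MAC from the 404 error is present
    # Direct database check (table layout is an assumption):
    $RUN mysql ironic -e "SELECT address, node_id FROM ports;"
}

check_ironic_state
```

Running it without `RUN=` set previews the exact commands, which is useful when pasting findings back into a bug report.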
Comment 1 Thom Carlin 2016-02-15 16:47:30 EST
Could this be due to the target systems being Dell Generation 12 systems and using pxe_ipmitool?
Comment 3 Mike Burns 2016-04-07 17:11:06 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.
Comment 5 Thom Carlin 2016-04-18 11:23:31 EDT
Occurs on TP3 RC4 + updated QCI and TripleO systems.

https://review.gerrithub.io/#/c/234406/2/scripts/instack-ironic-deployment has already been applied in the updates.
Comment 6 Thom Carlin 2016-04-18 16:11:33 EDT
Ironic's dnsmasq is reporting "no address available".

Potential workaround from https://bugs.launchpad.net/neutron/+bug/1271344:
systemctl restart neutron-dhcp-agent
Comment 7 Thom Carlin 2016-04-19 07:31:28 EDT
Also need to check status of services after yum update.
Comment 8 Thom Carlin 2016-04-19 07:56:13 EDT
Another thing to try (from https://bugzilla.redhat.com/show_bug.cgi?id=1301659#c11):
Append "dhcp-sequential-ip" to /etc/ironic-discoverd/dnsmasq.conf
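For reference, the workaround amounts to one extra line in the dnsmasq config (file path as given above; `dhcp-sequential-ip` is a standard dnsmasq option that allocates leases sequentially instead of by hash of the hardware address):

```
# /etc/ironic-discoverd/dnsmasq.conf
dhcp-sequential-ip
```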
Comment 9 Thom Carlin 2016-04-19 15:07:20 EDT
And another: only register one node at a time, to avoid the dnsmasq hash collision.
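A minimal sketch of that one-at-a-time workaround, assuming the `openstack baremetal introspection start` command from python-ironic-inspector-client and placeholder UUIDs (dry-run by default, like the earlier sketch):

```shell
#!/bin/sh
# Hypothetical sketch: introspect nodes one at a time instead of in bulk,
# so dnsmasq never has to hand out several leases at once.
# The CLI command and UUIDs are assumptions; RUN defaults to "echo" (dry-run).
: "${RUN:=echo}"

introspect_sequentially() {
    for uuid in "$@"; do
        $RUN openstack baremetal introspection start "$uuid"
        # Crude pacing; a real script should poll the introspection status
        # of each node before moving on to the next one.
        $RUN sleep 60
    done
}

introspect_sequentially "<uuid-1>" "<uuid-2>"
```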
Comment 11 Dmitry Tantsur 2016-10-03 12:00:41 EDT
Hi!

"MAC address ... already exists" can be ignored, it's just a false negative.

The real problem seems to be "Could not find a node", but the parts you've skipped from the logs (MACs, IP addresses) are actually required for investigation. If you still struggle with this error, could you please paste the full logs?
Comment 12 Thom Carlin 2016-10-03 13:38:53 EDT
Since it has been 8 months, the logs are no longer available
Comment 13 Dmitry Tantsur 2016-10-03 14:48:06 EDT
Understood, please reopen if you reproduce it again.
