Bug 1308714 - Register Nodes inconsistently fails
Summary: Register Nodes inconsistently fails
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic-discoverd
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: RHOS Maint
QA Contact: Shai Revivo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-02-15 21:01 UTC by Thom Carlin
Modified: 2020-03-11 15:02 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-03 18:48:06 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1225226 0 unspecified CLOSED Nodes discovery fails when instackenv.json file contains nodes mac addresses. 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1327682 0 unspecified CLOSED After update, Register Nodes won't start discovery 2021-02-22 00:41:40 UTC

Internal Links: 1225226 1327682

Description Thom Carlin 2016-02-15 21:01:05 UTC
Description of problem:

Identical nodes will not always register successfully

Version-Release number of selected component (if applicable):

RHCI TP2 RC9

How reproducible:

Intermittent but frequent

Steps to Reproduce:
1. Install RHCI and TripleO ISOs
2. Log in to run launch-fusor*-installer
3. Deploy RHEL-OSP

Actual results:

Register Nodes fails randomly

Expected results:

All nodes consistently register

Additional info:

End user sees Introspection Timeout

On remote console, see dracut emergency shell
In /discovery-logs, see ironic-discoverd-ramdisk: discvoerd error: 404: Could not find a node for attributes {'bmc_address': u'<drac_address>', 'mac': [u'<mac_address>' ]}

On Undercloud node:
/var/log/messages excerpts:
ironic-discoverd: WARNING:ironic_discoverd.main:Unable to connect to Ironic or Keystone, retrying 6 times more: Authorization Failed: Unable to establish connection to http://192.0.2.1:5000/v2.0/tokens
ironic-discoverd: INFO:ironic_discoverd.main:Enabled processing hooks: ['ramdisk_error', 'root_device_hint', 'scheduler', 'validate_interfaces', 'edeploy']
ironic-discoverd: INFO:werkzeug: * Running on http://0.0.0.0:5050/
ironic-discoverd: INFO:werkzeug:192.0.2.254 - - [15/Feb/2016 14:49:17] "POST /v1/introspection/<uuid> HTTP/1.1" 202 -
ironic-discoverd: INFO:ironic_discoverd.introspect:Whitelisting MAC's [u'<mac_address>'] for node <uuid> on the firewall
ironic-discoverd: INFO:werkzeug:192.0.2.254 - - [15/Feb/2016 14:49:18] "POST /v1/introspection/<uuid> HTTP/1.1" 202 -
ironic-discoverd: INFO:ironic_discoverd.introspect:Whitelisting MAC's [u'<mac_address>'] for node <mac_address> on the firewall
[...]
ironic-discoverd: INFO:ironic_discoverd.plugins.standard:Discovered data: CPUs: <number_cpus> x86_64, memory <memory_size> MiB, disk <disk_size> GiB
ironic-discoverd: INFO:ironic_discoverd.plugins.standard:PXE boot interface was <mac_address>
ironic-discoverd: WARNING:ironic_discoverd.plugins.standard:The following interfaces were invalid or not eligible in introspection data for node with BMC <bmc_ip> and were excluded: {<list>}}
ironic-discoverd: INFO:ironic_discoverd.plugins.standard:Eligible interfaces are {<list>}}
ironic-discoverd: ERROR:ironic_discoverd.node_cache:Introspection for nodes [<node_list>] has timed out
ironic-discoverd: ERROR:ironicclient.common.http:Error contacting Ironic server: A port with MAC address <mac_address> already exists. (HTTP 409). Attempt 61 of 61
ironic-discoverd: WARNING:ironic_discoverd.process:MAC <mac_address> appeared in introspection data for node <node_uuid>, but already exists in database - skipping
ironic-discoverd: INFO:werkzeug:192.0.2.100 - - [15/Feb/2016 14:55:00] "POST /v1/continue HTTP/1.1" 200 -
ironic-discoverd: INFO:werkzeug:192.0.2.254 - - [15/Feb/2016 14:55:00] "GET /v1/introspection/<uuid> HTTP/1.1" 200 -
ironic-discoverd: INFO:werkzeug:192.0.2.254 - - [15/Feb/2016 14:55:00] "GET /v1/introspection/<uuid> HTTP/1.1" 200 -
ironic-discoverd: INFO:ironic_discoverd.plugins.standard:Discovered data: CPUs: <num_cpus> x86_64, memory <mem_size> MiB, disk <disk_space> GiB
ironic-discoverd: INFO:ironic_discoverd.plugins.standard:PXE boot interface was <mac_address>
ironic-discoverd: WARNING:ironic_discoverd.plugins.standard:The following interfaces were invalid or not eligible in introspection data for node with BMC <bmc_address> and were excluded: {<list>}
ironic-discoverd: INFO:ironic_discoverd.plugins.standard:Eligible interfaces are {<list>}
ironic-discoverd: ERROR:ironic_discoverd.utils:Could not find a node for attributes {'bmc_address': u'<valid_bmc_ip>', 'mac': [u'<valid_mac>']}

The nodes and ports appear to be correct in the ironic database
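The excerpts above mix several distinct failures. A minimal sketch of a tally over /var/log/messages lines follows, assuming the message texts match those quoted in this report. The category names and the classify helper are hypothetical and not part of ironic-discoverd.

```python
import re
from collections import Counter

# Message fragments copied from the log excerpts in this report.
# The category names on the left are invented for this sketch.
PATTERNS = {
    "keystone_unreachable": re.compile(r"Unable to connect to Ironic or Keystone"),
    "introspection_timeout": re.compile(r"Introspection for nodes .* has timed out"),
    "duplicate_port": re.compile(r"A port with MAC address .* already exists"),
    "node_not_found": re.compile(r"Could not find a node for attributes"),
}


def classify(lines):
    """Count ironic-discoverd log lines per failure category."""
    counts = Counter()
    for line in lines:
        # Skip lines written by other services.
        if "ironic-discoverd" not in line:
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Feeding it the lines of /var/log/messages from the undercloud (for example via open(...) as an iterable) would show whether "node_not_found" entries line up with timeouts, without hand-reading the whole log.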

Comment 1 Thom Carlin 2016-02-15 21:47:30 UTC
Could this be due to the target systems being Dell Generation 12 systems and using pxe_ipmitool?

Comment 3 Mike Burns 2016-04-07 21:11:06 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 5 Thom Carlin 2016-04-18 15:23:31 UTC
Also occurs on TP3 RC4 with updated QCI and TripleO systems.

https://review.gerrithub.io/#/c/234406/2/scripts/instack-ironic-deployment has already been applied in the updates.

Comment 6 Thom Carlin 2016-04-18 20:11:33 UTC
Ironic dnsmasq is reporting No Address available.

Potential workaround from https://bugs.launchpad.net/neutron/+bug/1271344:
systemctl restart neutron-dhcp-agent

Comment 7 Thom Carlin 2016-04-19 11:31:28 UTC
Also need to check status of services after yum update.
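A minimal sketch of such a check, assuming a systemd host where `systemctl is-active --quiet` is available. Only neutron-dhcp-agent is named in this report; the other two unit names in SERVICES are assumptions about the undercloud and may differ on a given install.

```python
import subprocess

# neutron-dhcp-agent comes from this report; the two discoverd unit names
# are assumptions and should be adjusted to the units actually installed.
SERVICES = [
    "neutron-dhcp-agent",
    "openstack-ironic-discoverd",
    "openstack-ironic-discoverd-dnsmasq",
]


def is_active(service, run=subprocess.run):
    """Return True if systemd reports the unit as active."""
    # `systemctl is-active --quiet` exits 0 only when the unit is active.
    result = run(["systemctl", "is-active", "--quiet", service])
    return result.returncode == 0


def inactive_services(services=SERVICES, run=subprocess.run):
    """List the units that are not currently active."""
    return [s for s in services if not is_active(s, run)]
```

Calling inactive_services() after a yum update would return the units that need attention; the run parameter exists only so the check can be exercised without systemd.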

Comment 8 Thom Carlin 2016-04-19 11:56:13 UTC
Another thing to try (from https://bugzilla.redhat.com/show_bug.cgi?id=1301659#c11):
Append "dhcp-sequential-ip" to /etc/ironic-discoverd/dnsmasq.conf
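A minimal sketch of making that change idempotently, so repeated runs do not append the option twice. The ensure_option helper is hypothetical; the path would be the dnsmasq.conf named above, and the dnsmasq instance reading that file would still need to be restarted afterwards (not shown).

```python
from pathlib import Path

OPTION = "dhcp-sequential-ip"


def ensure_option(conf_path, option=OPTION):
    """Append `option` to the config file unless an active line already sets it.

    Returns True if the file was modified, False if nothing changed.
    """
    path = Path(conf_path)
    text = path.read_text() if path.exists() else ""
    # Commented-out lines such as "#dhcp-sequential-ip" do not count as active.
    active = [line.strip() for line in text.splitlines()]
    if option in active:
        return False
    # Make sure the new option starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + option + "\n")
    return True
```

For example, ensure_option("/etc/ironic-discoverd/dnsmasq.conf") would add the line on the first call and do nothing on later calls.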

Comment 9 Thom Carlin 2016-04-19 19:07:20 UTC
And another: register only one node at a time to avoid the dnsmasq hash collision.
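A minimal sketch of that idea, assuming the node definitions live in an instackenv.json-style document with a top-level "nodes" list (the format referenced in linked bug 1225226). The one_node_batches helper is hypothetical; how each batch is then registered and introspected is left to whatever tooling is in use.

```python
import copy


def one_node_batches(instackenv):
    """Yield one instackenv-style dict per node, each holding a single node."""
    for node in instackenv.get("nodes", []):
        # Carry over any other top-level keys unchanged.
        batch = {k: copy.deepcopy(v) for k, v in instackenv.items() if k != "nodes"}
        batch["nodes"] = [copy.deepcopy(node)]
        yield batch
```

Each yielded dict can be written out with json.dump and registered on its own, waiting for introspection to finish before moving to the next one.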

Comment 11 Dmitry Tantsur 2016-10-03 16:00:41 UTC
Hi!

"MAC address ... already exists" can be ignored, it's just a false negative.

The real problem seems to be "Could not find a node", but the parts you've skipped from the logs (MACs, IP addresses) are actually required for investigation. If you still struggle with this error, could you please paste the full logs?

Comment 12 Thom Carlin 2016-10-03 17:38:53 UTC
Since it has been 8 months, the logs are no longer available.

Comment 13 Dmitry Tantsur 2016-10-03 18:48:06 UTC
Understood, please reopen if you reproduce it again.
