Bug 1385114 - Deployment fails with Ironic API errors and nodes stuck in wait-call-back when one of the MAC addresses of the node is of type InfiniBand
Summary: Deployment fails with Ironic API errors and nodes stuck in wait-call-back whe...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ga
Target Release: 10.0 (Newton)
Assignee: Lucas Alvares Gomes
QA Contact: Raviv Bar-Tal
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-14 19:05 UTC by Sai Sindhur Malleni
Modified: 2016-12-14 16:19 UTC (History)

Fixed In Version: openstack-ironic-6.2.1-3.el7ost
Doc Type: Bug Fix
Doc Text:
To determine which node is being deployed, the deploy ramdisk (IPA) provides the Bare Metal provisioning service with a list of MAC addresses as unique identifiers for that node. In previous releases, the Bare Metal provisioning service expected only the normal MAC address format of 6 octets. The GID of an InfiniBand NIC, however, has 20 octets. As a result, whenever an InfiniBand NIC was present on the node, the deployment failed because the Bare Metal provisioning API could not validate that address. With this release, the Bare Metal provisioning service ignores MAC addresses that do not conform to the normal 6-octet MAC address format.
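
The behavior described in the Doc Text can be sketched roughly as follows. This is only an illustration, not the actual Ironic change: the names `is_valid_mac` and `filter_lookup_addresses` are hypothetical, and the sketch assumes colon-separated hexadecimal addresses like the ones in this report.

```python
import re

# Hypothetical helpers for illustration only; not the code merged in Ironic.
# A "normal" MAC address here means exactly 6 colon-separated hex octets.
_MAC_RE = re.compile(r"[0-9a-f]{2}(?::[0-9a-f]{2}){5}")


def is_valid_mac(address):
    """Return True if the address is a 6-octet, colon-separated MAC."""
    return _MAC_RE.fullmatch(address.strip().lower()) is not None


def filter_lookup_addresses(addresses):
    """Keep well-formed MACs and silently drop everything else.

    Dropping (rather than rejecting the whole request) is what lets a
    node with a 20-octet InfiniBand address still be looked up by its
    Ethernet MACs.
    """
    return [a for a in addresses if is_valid_mac(a)]


reported = [
    "80:00:02:48:fe:80:00:00:00:00:00:00:f4:52:14:03:00:54:06:c2",  # 20 octets
    "f4:52:14:54:06:c1",
    "a0:d3:c1:04:44:17",
]
print(filter_lookup_addresses(reported))
```

With the addresses from this report, only the two 6-octet entries remain after filtering, so the lookup can proceed with them.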
Clone Of:
Environment:
Last Closed: 2016-12-14 16:19:31 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 392114 | 0 | None | MERGED | API: lookup() ignore malformed MAC addresses | 2021-02-08 00:51:24 UTC
Red Hat Product Errata | RHEA-2016:2948 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 10 enhancement update | 2016-12-14 19:55:27 UTC

Description Sai Sindhur Malleni 2016-10-14 19:05:28 UTC
Description of problem:

When one of the interfaces on the node has a MAC address of type InfiniBand (longer than 6 octets), we see 400 errors in the Ironic API log such as:

2016-10-14 14:42:03.171 18045 DEBUG wsme.api [req-a257a01c-0896-435a-8416-70a03bf50a56 - - - - -] Client-side error: Invalid input for field/attribute addresses. Value: '80:00:02:48:fe:80:00:00:00:00:00:00:f4:52:14:03:00:54:06:c2,f4:52:14:54:06:c1,a0:d3:c1:04:44:17,a0:d3:c1:04:44:14,a0:d3:c1:04:44:16,a0:d3:c1:04:44:15'. unable to convert to list format_exception /usr/lib/python2.7/site-packages/wsme/api.py:221
2016-10-14 14:42:03.174 18045 INFO eventlet.wsgi.server [req-a257a01c-0896-435a-8416-70a03bf50a56 - - - - -] 192.0.2.9 "GET /v1/lookup?addresses=80%3A00%3A02%3A48%3Afe%3A80%3A00%3A00%3A00%3A00%3A00%3A00%3Af4%3A52%3A14%3A03%3A00%3A54%3A06%3Ac2%2Cf4%3A52%3A14%3A54%3A06%3Ac1%2Ca0%3Ad3%3Ac1%3A04%3A44%3A17%2Ca0%3Ad3%3Ac1%3A04%3A44%3A14%2Ca0%3Ad3%3Ac1%3A04%3A44%3A16%2Ca0%3Ad3%3Ac1%3A04%3A44%3A15 HTTP/1.1" status: 400 len: 657 time: 0.0098739
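
To see which address in the request above is the odd one out, the query string can be decoded and the octets counted. The following is a small diagnostic sketch using only the Python standard library; the function name `octet_counts` is made up for this example, and the request target is copied from the log line above.

```python
from urllib.parse import parse_qs, urlsplit

# Request target copied from the eventlet.wsgi.server log line above.
request_target = (
    "/v1/lookup?addresses="
    "80%3A00%3A02%3A48%3Afe%3A80%3A00%3A00%3A00%3A00%3A00%3A00"
    "%3Af4%3A52%3A14%3A03%3A00%3A54%3A06%3Ac2"
    "%2Cf4%3A52%3A14%3A54%3A06%3Ac1"
    "%2Ca0%3Ad3%3Ac1%3A04%3A44%3A17"
    "%2Ca0%3Ad3%3Ac1%3A04%3A44%3A14"
    "%2Ca0%3Ad3%3Ac1%3A04%3A44%3A16"
    "%2Ca0%3Ad3%3Ac1%3A04%3A44%3A15"
)


def octet_counts(target):
    """Return (address, number_of_octets) for each address in the query."""
    query = parse_qs(urlsplit(target).query)
    # parse_qs percent-decodes the value; addresses are comma-separated.
    addresses = query["addresses"][0].split(",")
    return [(addr, len(addr.split(":"))) for addr in addresses]


for addr, count in octet_counts(request_target):
    marker = "" if count == 6 else "  <-- not a 6-octet MAC"
    print(f"{count:2d} octets  {addr}{marker}")
```

The first address decodes to 20 octets and the remaining five to 6 octets each, which matches the single long value in the wsme error message.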

Note that introspection succeeds. When deploying the overcloud, however, the nodes get stuck in wait-call-back, and Ironic API 400 errors appear both on the node consoles and in the undercloud Ironic logs. Eventually the deployment fails with "no valid host found" errors.

Version-Release number of selected component (if applicable):


How reproducible:
100% when one of the interfaces is InfiniBand

Steps to Reproduce:
1. Install the undercloud
2. Introspect the nodes
3. Deploy the overcloud

Actual results:
Nodes are stuck in wait-call-back and eventually deployment fails with no valid host found errors.

Expected results:
Deployment should succeed

Additional info:
Talking to Lucas (lucasagomes) on IRC, he said that InfiniBand isn't currently supported, and he verified the behavior as follows:
http://paste.openstack.org/show/585738/
Also, FWIW, earlier versions of OSP (8 and 9) worked in the same environment.

Comment 1 Dmitry Tantsur 2016-10-17 17:52:56 UTC
Looks like our new ramdisk API has broken it. I wonder if we should validate MACs at all in lookup.

Comment 4 Raviv Bar-Tal 2016-11-17 09:40:25 UTC
Hi,
Unfortunately, I don't have access to InfiniBand NICs, so I cannot test this.
From the above comments I see that the ramdisk API was broken and got fixed.
Can you verify and advise whether the new OSPD is working for you and the bug can be closed?

Comment 6 Sai Sindhur Malleni 2016-12-06 06:25:37 UTC
I can confirm that I'm not seeing this error in RC.

Comment 8 errata-xmlrpc 2016-12-14 16:19:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

