Bug 1385114

Summary: Deployment fails with Ironic API errors and nodes stuck in wait-call-back when one of the node's MAC addresses is of type InfiniBand
Product: Red Hat OpenStack Reporter: Sai Sindhur Malleni <smalleni>
Component: openstack-ironic Assignee: Lucas Alvares Gomes <lmartins>
Status: CLOSED ERRATA QA Contact: Raviv Bar-Tal <rbartal>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 10.0 (Newton) CC: ddomingo, dtantsur, jjoyce, mburns, rhel-osp-director-maint, smalleni, srevivo
Target Milestone: ga Keywords: Triaged
Target Release: 10.0 (Newton)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-ironic-6.2.1-3.el7ost Doc Type: Bug Fix
Doc Text:
To determine which node is being deployed, the deploy ramdisk (IPA) provides the Bare Metal provisioning service with a list of MAC addresses as unique identifiers for that node. In previous releases, the Bare Metal provisioning service only expected the normal MAC address format, namely 6 octets. The GID of an InfiniBand NIC, however, has 20 octets. As a result, whenever an InfiniBand NIC was present on the node, the deployment failed because the Bare Metal provisioning API could not validate the list of addresses. With this release, the Bare Metal provisioning service ignores addresses that do not conform to the normal MAC address format of 6 octets.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-12-14 16:19:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
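
The behaviour described in the Doc Text above (ignore any address that is not a normal 6-octet MAC) can be sketched roughly as follows. This is an illustration only, not the actual Ironic patch: the function name filter_standard_macs and the regular expression are assumptions made for this example.

```python
import re

# Assumption for this sketch: a "normal" MAC address is six
# colon-separated hexadecimal octets, e.g. a0:d3:c1:04:44:17.
_SIX_OCTET_MAC = re.compile(r"[0-9a-f]{2}(:[0-9a-f]{2}){5}")


def filter_standard_macs(addresses):
    """Return only the addresses that look like 6-octet MACs.

    Anything else (for example a 20-octet InfiniBand address) is
    skipped instead of causing the whole lookup to be rejected.
    """
    kept = []
    for addr in addresses:
        addr = addr.strip().lower()
        if _SIX_OCTET_MAC.fullmatch(addr):
            kept.append(addr)
    return kept
```

With the address list from the log in the description, a filter like this would drop the first (20-octet) entry and keep the remaining 6-octet entries, so the lookup could still match the node by its Ethernet ports.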

Description Sai Sindhur Malleni 2016-10-14 19:05:28 UTC
Description of problem:

When one of the interfaces on the node has an InfiniBand-type MAC address (longer than 6 octets), we see 400 errors in the Ironic API such as:

2016-10-14 14:42:03.171 18045 DEBUG wsme.api [req-a257a01c-0896-435a-8416-70a03bf50a56 - - - - -] Client-side error: Invalid input for field/attribute addresses. Value: '80:00:02:48:fe:80:00:00:00:00:00:00:f4:52:14:03:00:54:06:c2,f4:52:14:54:06:c1,a0:d3:c1:04:44:17,a0:d3:c1:04:44:14,a0:d3:c1:04:44:16,a0:d3:c1:04:44:15'. unable to convert to list format_exception /usr/lib/python2.7/site-packages/wsme/api.py:221
2016-10-14 14:42:03.174 18045 INFO eventlet.wsgi.server [req-a257a01c-0896-435a-8416-70a03bf50a56 - - - - -] 192.0.2.9 "GET /v1/lookup?addresses=80%3A00%3A02%3A48%3Afe%3A80%3A00%3A00%3A00%3A00%3A00%3A00%3Af4%3A52%3A14%3A03%3A00%3A54%3A06%3Ac2%2Cf4%3A52%3A14%3A54%3A06%3Ac1%2Ca0%3Ad3%3Ac1%3A04%3A44%3A17%2Ca0%3Ad3%3Ac1%3A04%3A44%3A14%2Ca0%3Ad3%3Ac1%3A04%3A44%3A16%2Ca0%3Ad3%3Ac1%3A04%3A44%3A15 HTTP/1.1" status: 400 len: 657 time: 0.0098739
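
To see which entry in a logged lookup request is the odd one out, the URL-encoded query string can be decoded and the octets of each address counted. The snippet below is a small diagnostic sketch using only the Python 3 standard library; the helper name lookup_address_lengths is made up for this example and is not part of Ironic.

```python
from urllib.parse import parse_qs, urlsplit


def lookup_address_lengths(request_target):
    """Decode the 'addresses' query parameter of a /v1/lookup request.

    Returns a list of (address, octet_count) tuples in request order.
    """
    query = urlsplit(request_target).query
    values = parse_qs(query).get("addresses", [])
    addresses = values[0].split(",") if values else []
    return [(addr, len(addr.split(":"))) for addr in addresses]
```

Passing the request target from the log line above (the part starting at /v1/lookup?addresses=) should report 20 octets for the first address and 6 for each of the others.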

It is worth mentioning that introspection succeeds, but when deploying the overcloud the nodes get stuck in wait-call-back, and we see Ironic API 400 errors on the consoles of the nodes and in the undercloud Ironic logs. Eventually the deployment fails with "no valid host found" errors.

Version-Release number of selected component (if applicable):


How reproducible:
100% when one of the interfaces is InfiniBand

Steps to Reproduce:
1. Install undercloud
2. Introspect
3. Deploy the overcloud

Actual results:
Nodes are stuck in wait-call-back and eventually the deployment fails with "no valid host found" errors.

Expected results:
Deployment should succeed

Additional info:
I talked to Lucas (lucasagomes) on IRC; he says InfiniBand isn't currently supported and verified it as follows:
http://paste.openstack.org/show/585738/
Also, FWIW, earlier versions of OSP (8 and 9) worked in the same environment.

Comment 1 Dmitry Tantsur 2016-10-17 17:52:56 UTC
Looks like our new ramdisk API has broken it. I wonder if we should validate MACs at all in lookup.

Comment 4 Raviv Bar-Tal 2016-11-17 09:40:25 UTC
Hi,
Unfortunately I don't have access to InfiniBand NICs, so I cannot test this.
From the above comments I see that the ramdisk API was broken and has been fixed.
Can you verify/advise whether the new OSPD is working for you and whether the bug can be closed?

Comment 6 Sai Sindhur Malleni 2016-12-06 06:25:37 UTC
I can confirm that I'm not seeing this error in RC.

Comment 8 errata-xmlrpc 2016-12-14 16:19:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html