Bug 2211411 - neutron crashes when ironic tries to unbind instance ports from node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ga
Target Release: 17.1
Assignee: Rodolfo Alonso
QA Contact: rlobillo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-05-31 13:56 UTC by rlobillo
Modified: 2023-08-16 01:15 UTC (History)

Fixed In Version: openstack-neutron-18.6.1-1.20230518200967.el9ost
Doc Type: Known Issue
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-16 01:15:29 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   OSP-25516       0        None      None    None     2023-05-31 13:58:32 UTC
Red Hat Product Errata  RHEA-2023:4577  0        None      None    None     2023-08-16 01:15:52 UTC

Description rlobillo 2023-05-31 13:56:50 UTC
Description of problem:

OpenShift on OpenStack installation with bare-metal workers fails in the downstream (D/S) shiftstack CI on top of 17.1 because of an error raised by neutron during provisioning.

During the deployment of the workers, the ironic-conductor log shows the following:

2023-05-30 18:37:21.016 2 INFO ironic.drivers.modules.network.flat [req-2921983d-ade2-4058-a705-64e793ed3ecf 9d19d158db494ffca5ae04c50ea6506d 261b6634edc640d4ad152c00e29fd181 - default default] Unbinding instance ports from node f31a1e44-a12b-4bee-9d3d-ddc070f93617

and that process is failing:

2023-05-30 18:37:23.891 2 ERROR ironic.conductor.utils [req-2921983d-ade2-4058-a705-64e793ed3ecf 9d19d158db494ffca5ae04c50ea6506d 261b6634edc640d4ad152c00e29fd181 - default default] Error while preparing to deploy to node f31a1e44-a12b-4bee-9d3d-ddc070f93617: Unable to clear binding profile for neutron port 2c428b74-41a6-4219-b349-1ff855591fb2. Error: HttpException: 500: Server Error for url: http://172.17.1.120:9696/v2.0/ports/2c428b74-41a6-4219-b349-1ff855591fb2, Request Failed: internal server error while processing your request.: ironic.common.exception.NetworkError: Unable to clear binding profile for neutron port 2c428b74-41a6-4219-b349-1ff855591fb2. Error: HttpException: 500: Server Error for url: http://172.17.1.120:9696/v2.0/ports/2c428b74-41a6-4219-b349-1ff855591fb2, Request Failed: internal server error while processing your request.
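
For triage, the failing port and the HTTP status can be pulled out of that error message with a regular expression. This is only a sketch: the pattern is an assumption based on the single message quoted above, not a format that ironic guarantees, and the message text is shortened to its first occurrence.

```python
import re

# Shortened copy of the ironic-conductor error message quoted above.
message = (
    "Unable to clear binding profile for neutron port "
    "2c428b74-41a6-4219-b349-1ff855591fb2. Error: HttpException: 500: "
    "Server Error for url: "
    "http://172.17.1.120:9696/v2.0/ports/2c428b74-41a6-4219-b349-1ff855591fb2, "
    "Request Failed: internal server error while processing your request."
)

# Hypothetical pattern, derived only from the message in this report.
PATTERN = re.compile(
    r"neutron port (?P<port>[0-9a-f-]{36})\. "
    r"Error: HttpException: (?P<status>\d{3}): .*? for url: (?P<url>\S+?),"
)

match = PATTERN.search(message)
port_id = match.group("port")        # UUID of the port that could not be updated
status = int(match.group("status"))  # HTTP status returned by neutron
url = match.group("url")             # neutron endpoint that returned the error
```

The extracted request URL points at the neutron API, which is where the matching server-side error should be searched for.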

The neutron logs show that the request coming from ironic-conductor is rejected because it carries mac_address set to None:

2023-05-30 18:37:23.880 18 DEBUG neutron.api.v2.base [req-f1559021-507f-4e37-96e1-c2f30b61594f 9d19d158db494ffca5ae04c50ea6506d 261b6634edc640d4ad152c00e29fd181 - default default] Request body: {'port': {'mac_address': None}} prepare_request_body /usr/lib/python3.9/site-packages/neutron/api/v2/base.py:729
2023-05-30 18:37:23.881 18 WARNING neutron.pecan_wsgi.hooks.body_validation [req-f1559021-507f-4e37-96e1-c2f30b61594f 9d19d158db494ffca5ae04c50ea6506d 261b6634edc640d4ad152c00e29fd181 - default default] An exception happened while processing the request body. The exception message is [None is not str() or unicode()!].
2023-05-30 18:37:23.882 18 ERROR neutron.pecan_wsgi.hooks.translation [req-f1559021-507f-4e37-96e1-c2f30b61594f 9d19d158db494ffca5ae04c50ea6506d 261b6634edc640d4ad152c00e29fd181 - default default] PUT failed.: TypeError: None is not str() or unicode()!
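
The failure mode in these log lines can be illustrated with a small Python sketch. It is not neutron's code: normalize_mac is a hypothetical stand-in for whichever converter rejects non-string values, and drop_none_fields is one possible guard on the caller's side, not the fix shipped in the erratum.

```python
def normalize_mac(value):
    """Hypothetical converter that accepts only strings, mimicking the logged error."""
    if not isinstance(value, str):
        raise TypeError(f"{value!r} is not str() or unicode()!")
    return value.lower()


def drop_none_fields(port_body):
    """Return a copy of the request body without port attributes set to None."""
    return {"port": {k: v for k, v in port_body["port"].items() if v is not None}}


# Request body as it appears in the neutron DEBUG line.
body = {"port": {"mac_address": None}}

try:
    normalize_mac(body["port"]["mac_address"])
    error_message = None
except TypeError as exc:
    # An unhandled TypeError like this is what surfaces as an HTTP 500.
    error_message = str(exc)

# Assumption: omitting the attribute is equivalent to leaving it unchanged.
cleaned = drop_none_fields(body)
```

In this sketch the None value reaches a string-only converter and raises TypeError, while the cleaned body no longer contains mac_address at all.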

Version-Release number of selected component (if applicable):
The issue occurs on 17.1 (RHOS-17.1-RHEL-9-20230525.n.1); the same scenario works on 16.2 (RHOS-16.2-RHEL-8-20230413.n.1).

How reproducible:
Always

Steps to Reproduce:
1. Deploy an ironic instance on the overcloud.

Actual results: the instance moves to ERROR status.
Expected results: the instance is in ACTIVE status and operational.


Additional info: sosreport link in private comment

Comment 13 rlobillo 2023-06-20 08:18:26 UTC
Verified on RHOS-17.1-RHEL-9-20230613.n.1 with OVN using OCP4.12.0-0.nightly-2023-06-19-183454.

The bare-metal workers in OpenShift are successfully deployed with a regular OCP IPI installation:

$ oc get nodes
NAME                          STATUS   ROLES                  AGE     VERSION
ostest-r99k7-master-0         Ready    control-plane,master   4h11m   v1.25.10+8c21020
ostest-r99k7-master-1         Ready    control-plane,master   4h11m   v1.25.10+8c21020
ostest-r99k7-master-2         Ready    control-plane,master   4h11m   v1.25.10+8c21020
ostest-r99k7-worker-0-4rqmp   Ready    worker                 3h40m   v1.25.10+8c21020
ostest-r99k7-worker-0-tth8j   Ready    worker                 3h44m   v1.25.10+8c21020
workervirt-2rqw8              Ready    worker                 83m     v1.25.10+8c21020

where:

(shiftstack) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+-----------------------------+--------+---------------------------+--------------------+-----------+
| ID                                   | Name                        | Status | Networks                  | Image              | Flavor    |
+--------------------------------------+-----------------------------+--------+---------------------------+--------------------+-----------+
| 877a455b-4bcc-4f25-97a3-a93a11c7e5ae | workervirt-2rqw8            | ACTIVE | provisioning=172.27.7.186 | ostest-r99k7-rhcos | worker    |
| 9e05768a-c3de-4a92-a987-bd5fa5ccf36a | ostest-r99k7-worker-0-4rqmp | ACTIVE | provisioning=172.27.7.167 | ostest-r99k7-rhcos | baremetal |
| 20ebd24e-df1c-4c28-be30-01861f19b094 | ostest-r99k7-worker-0-tth8j | ACTIVE | provisioning=172.27.7.184 | ostest-r99k7-rhcos | baremetal |
| 9350e512-afdd-4860-8736-eb20a657c041 | ostest-r99k7-master-2       | ACTIVE | provisioning=172.27.7.170 | ostest-r99k7-rhcos | master    |
| 95cd50dd-ddb7-4523-ae5f-82b210900efa | ostest-r99k7-master-1       | ACTIVE | provisioning=172.27.7.163 | ostest-r99k7-rhcos | master    |
| f6f302b2-84f8-4526-a7df-a35422317526 | ostest-r99k7-master-0       | ACTIVE | provisioning=172.27.7.164 | ostest-r99k7-rhcos | master    |
+--------------------------------------+-----------------------------+--------+---------------------------+--------------------+-----------+
(shiftstack) [stack@undercloud-0 ~]$ . overcloudrc && openstack baremetal node list
+--------------------------------------+---------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name    | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+---------+--------------------------------------+-------------+--------------------+-------------+
| 74e3ba2a-1bfd-42ad-8605-86ee2a608c2e | titan33 | 9e05768a-c3de-4a92-a987-bd5fa5ccf36a | power on    | active             | False       |
| 4dedf0ec-6366-4385-ab0c-0093016ceeb8 | titan34 | 20ebd24e-df1c-4c28-be30-01861f19b094 | power on    | active             | False       |
+--------------------------------------+---------+--------------------------------------+-------------+--------------------+-------------+
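
The verification above amounts to checking that each bare-metal node is active and tied to an ACTIVE server. A rough sketch of that cross-check follows, with the values typed in by hand from the two tables. The data structures and the function name are illustrative assumptions, not the output of any OpenStack client library.

```python
# Server ID -> (name, status), copied from "openstack server list" above.
servers = {
    "9e05768a-c3de-4a92-a987-bd5fa5ccf36a": ("ostest-r99k7-worker-0-4rqmp", "ACTIVE"),
    "20ebd24e-df1c-4c28-be30-01861f19b094": ("ostest-r99k7-worker-0-tth8j", "ACTIVE"),
}

# Rows copied from "openstack baremetal node list" above.
nodes = [
    {"name": "titan33",
     "instance_uuid": "9e05768a-c3de-4a92-a987-bd5fa5ccf36a",
     "provision_state": "active"},
    {"name": "titan34",
     "instance_uuid": "20ebd24e-df1c-4c28-be30-01861f19b094",
     "provision_state": "active"},
]


def healthy_pairs(servers, nodes):
    """Return (node name, server name) for active nodes backing ACTIVE servers."""
    pairs = []
    for node in nodes:
        server = servers.get(node["instance_uuid"])
        if server and server[1] == "ACTIVE" and node["provision_state"] == "active":
            pairs.append((node["name"], server[0]))
    return pairs


pairs = healthy_pairs(servers, nodes)
```

Both bare-metal nodes map to the two ostest-r99k7-worker-0 servers that use the baremetal flavor.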

Comment 21 errata-xmlrpc 2023-08-16 01:15:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

