Bug 1592432 - Ironic API should return 503 for action requiring a conductor when no conductors are online
Summary: Ironic API should return 503 for action requiring a conductor when no conductors are online
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z2
Target Release: 13.0 (Queens)
Assignee: Harald Jensås
QA Contact: bjacot
URL:
Whiteboard:
Depends On: 1590297 1592427 1592466
Blocks:
 
Reported: 2018-06-18 14:10 UTC by Matthew Booth
Modified: 2019-09-09 16:56 UTC
CC: 26 users

Fixed In Version: openstack-ironic-10.1.3-2.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1590297
Environment:
Last Closed: 2018-08-29 16:32:24 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack Storyboard 2002600 0 None None None 2018-06-18 15:52:49 UTC
OpenStack gerrit 576608 0 None MERGED Raise TemporaryFailure if no conductors are online 2020-12-24 09:23:26 UTC
OpenStack gerrit 577752 0 None MERGED Raise TemporaryFailure if no conductors are online 2020-12-24 09:23:26 UTC
Red Hat Product Errata RHBA-2018:2593 0 None None None 2018-08-29 16:32:53 UTC

Comment 1 Matthew Booth 2018-06-18 14:20:41 UTC
The original error could also have been resolved if nova retried the plug_vifs call for a while. Nova doesn't yet have enough information from ironic to do this, because ironic returned a 400 with the error text No Valid Hosts. However, that's not a user-actionable error. We're not telling the user that the cloud is out of space and they should give up; we're telling them that ironic was temporarily unable to fulfil their request, but a retry might work. I believe the appropriate return code for that would be 503.

With that in place, Nova should be able to implement a retry.
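
For illustration, here is a minimal sketch of the retry behaviour being argued for: retry only when the failure is signalled as temporary (HTTP 503), give up immediately on 4xx. This is not Nova's actual code; the endpoint path and headers follow the standard ironic REST API, but the helper itself is made up for this example.

# Hypothetical helper: retry a VIF attach only on HTTP 503.
import time

import requests


def attach_vif_with_retry(ironic_url, node_uuid, vif_id, token,
                          attempts=10, delay=5):
    url = "%s/v1/nodes/%s/vifs" % (ironic_url, node_uuid)
    headers = {
        "X-Auth-Token": token,
        # VIF attach requires API microversion 1.28 or later.
        "X-OpenStack-Ironic-API-Version": "1.28",
    }
    for _ in range(attempts):
        resp = requests.post(url, json={"id": vif_id}, headers=headers)
        if resp.status_code < 400:
            return resp
        if resp.status_code == 503:
            # Temporary failure, e.g. no conductors online: retry later.
            time.sleep(delay)
            continue
        # 400/404/409 etc. will not be fixed by retrying.
        resp.raise_for_status()
    raise RuntimeError("ironic still unavailable after %d attempts" % attempts)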

Comment 2 Harald Jensås 2018-06-19 18:57:19 UTC
$ openstack server list
+--------------------------------------+-------------+--------+------------------------------------------------------+--------------------------+-----------+
| ID                                   | Name        | Status | Networks                                             | Image                    | Flavor    |
+--------------------------------------+-------------+--------+------------------------------------------------------+--------------------------+-----------+
| 03e00b04-293b-4e43-b213-e4a2a1f83f86 | test-server | ACTIVE | private=10.0.0.4, fd1d:c824:765a:0:5054:ff:fe7d:1a24 | cirros-0.3.5-x86_64-disk | baremetal |
+--------------------------------------+-------------+--------+------------------------------------------------------+--------------------------+-----------+

$ sudo systemctl stop devstack
$ sudo systemctl restart devstack@n*

$ openstack server list
+--------------------------------------+-------------+--------+------------------------------------------------------+--------------------------+-----------+
| ID                                   | Name        | Status | Networks                                             | Image                    | Flavor    |
+--------------------------------------+-------------+--------+------------------------------------------------------+--------------------------+-----------+
| 03e00b04-293b-4e43-b213-e4a2a1f83f86 | test-server | ACTIVE | private=10.0.0.4, fd1d:c824:765a:0:5054:ff:fe7d:1a24 | cirros-0.3.5-x86_64-disk | baremetal |
+--------------------------------------+-------------+--------+------------------------------------------------------+--------------------------+-----------+

With https://review.openstack.org/576608 applied, I see the following in the logs:

Jun 19 20:39:17 devstack devstack[20946]: ERROR wsme.api [None req-9aec21be-8c72-40a4-9939-bc2cc985deaf demo admin] Server-side error: "Resource temporarily unavailable, please retry.". Detail:
Jun 19 20:39:17 devstack devstack[20946]: Traceback (most recent call last):
Jun 19 20:39:17 devstack devstack[20946]: File "/usr/lib/python2.7/site-packages/wsmeext/pecan.py", line 85, in callfunction
Jun 19 20:39:17 devstack devstack[20946]: result = f(self, *args, **kwargs)
Jun 19 20:39:17 devstack devstack[20946]: File "/opt/stack/ironic/ironic/api/controllers/v1/node.py", line 1435, in post
Jun 19 20:39:17 devstack devstack[20946]: rpc_node, topic = self._get_node_and_topic()
Jun 19 20:39:17 devstack devstack[20946]: File "/opt/stack/ironic/ironic/api/controllers/v1/node.py", line 1407, in _get_node_and_topic
Jun 19 20:39:17 devstack devstack[20946]: return rpc_node, pecan.request.rpcapi.get_topic_for(rpc_node)
Jun 19 20:39:17 devstack devstack[20946]: File "/opt/stack/ironic/ironic/conductor/rpcapi.py", line 134, in get_topic_for
Jun 19 20:39:17 devstack devstack[20946]: raise exception.TemporaryFailure()
Jun 19 20:39:17 devstack devstack[20946]: TemporaryFailure: Resource temporarily unavailable, please retry.
Jun 19 20:39:17 devstack devstack[20946]: : TemporaryFailure: Resource temporarily unavailable, please retry.
Jun 19 20:39:17 devstack devstack[20946]: [pid: 20947|app: 0|req: 290/578] 192.168.122.139 () {64 vars in 1320 bytes} [Tue Jun 19 20:39:16 2018] POST /baremetal/v1/nodes/d5c9dc49-fd69-45ec-ad6c-e7c5a87f94e7/vifs => generated 139 bytes in 122 msecs (HTTP/1.1 503) 7 headers in 301 bytes (1 switches on core 0)
Jun 19 20:39:17 devstack devstack[20946]: ERROR wsme.api [None req-375c765c-c47b-4f45-b54a-ed4eef0383d4 demo admin] Server-side error: "Resource temporarily unavailable, please retry.". Detail:
Jun 19 20:39:17 devstack devstack[20946]: Traceback (most recent call last):
Jun 19 20:39:17 devstack devstack[20946]: File "/usr/lib/python2.7/site-packages/wsmeext/pecan.py", line 85, in callfunction
Jun 19 20:39:17 devstack devstack[20946]: result = f(self, *args, **kwargs)
Jun 19 20:39:17 devstack devstack[20946]: File "/opt/stack/ironic/ironic/api/controllers/v1/node.py", line 1435, in post
Jun 19 20:39:17 devstack devstack[20946]: rpc_node, topic = self._get_node_and_topic()
Jun 19 20:39:17 devstack devstack[20946]: File "/opt/stack/ironic/ironic/api/controllers/v1/node.py", line 1407, in _get_node_and_topic
Jun 19 20:39:17 devstack devstack[20946]: return rpc_node, pecan.request.rpcapi.get_topic_for(rpc_node)
Jun 19 20:39:17 devstack devstack[20946]: File "/opt/stack/ironic/ironic/conductor/rpcapi.py", line 134, in get_topic_for
Jun 19 20:39:17 devstack devstack[20946]: raise exception.TemporaryFailure()
Jun 19 20:39:17 devstack devstack[20946]: TemporaryFailure: Resource temporarily unavailable, please retry.
Jun 19 20:39:17 devstack devstack[20946]: : TemporaryFailure: Resource temporarily unavailable, please retry.
Jun 19 20:39:17 devstack devstack[20946]: [pid: 20948|app: 0|req: 289/579] 192.168.122.139 () {64 vars in 1320 bytes} [Tue Jun 19 20:39:17 2018] POST /baremetal/v1/nodes/d5c9dc49-fd69-45ec-ad6c-e7c5a87f94e7/vifs => generated 139 bytes in 35 msecs (HTTP/1.1 503) 7 headers in 301 bytes (1 switches on core 0)
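
For reference, the pattern the change under review introduces is shown below as a self-contained sketch. The class and registry names are hypothetical stand-ins, not ironic's real code (the real change is in ironic/conductor/rpcapi.py and uses a consistent hash ring); the point is that with no conductor alive the API raises TemporaryFailure and returns 503 instead of a non-retryable 400.

# Sketch with hypothetical names; illustrates the 503 behaviour seen above.
class TemporaryFailure(Exception):
    """Maps to HTTP 503: the request may succeed if retried later."""
    code = 503
    message = "Resource temporarily unavailable, please retry."


class ConductorRPCAPI(object):
    def __init__(self, topic, registry):
        self.topic = topic
        self.registry = registry  # assumed to track live conductors per driver

    def get_topic_for(self, node):
        live = self.registry.live_conductors_for(node.driver)
        if not live:
            # Before the fix this surfaced as a 400 "No Valid Hosts" error,
            # which callers such as Nova cannot sensibly retry on.
            raise TemporaryFailure()
        # The real code picks a host via a hash ring; sorting is a placeholder.
        host = sorted(live)[0]
        return "%s.%s" % (self.topic, host)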



$ sudo systemctl restart devstack
$ sudo systemctl restart devstack@n*
$ openstack server list
+--------------------------------------+-------------+--------+------------------------------------------------------+--------------------------+-----------+
| ID                                   | Name        | Status | Networks                                             | Image                    | Flavor    |
+--------------------------------------+-------------+--------+------------------------------------------------------+--------------------------+-----------+
| 03e00b04-293b-4e43-b213-e4a2a1f83f86 | test-server | ACTIVE | private=10.0.0.4, fd1d:c824:765a:0:5054:ff:fe7d:1a24 | cirros-0.3.5-x86_64-disk | baremetal |
+--------------------------------------+-------------+--------+------------------------------------------------------+--------------------------+-----------+

$ openstack server delete test-server

$ openstack server list
[ ... no nova server instance ...]

$ openstack baremetal node list
+--------------------------------------+--------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name   | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------+--------------------------------------+-------------+--------------------+-------------+
| d5c9dc49-fd69-45ec-ad6c-e7c5a87f94e7 | node-0 | 03e00b04-293b-4e43-b213-e4a2a1f83f86 | power on    | active             | False       |
+--------------------------------------+--------+--------------------------------------+-------------+--------------------+-------------+

 ISSUE!!! ^^ The baremetal node is not undeployed.

Comment 3 Harald Jensås 2018-06-19 21:17:54 UTC
The baremetal node is not undeployed because Nova only does a local delete, since the "nova-compute" host is down.

Jun 19 23:05:12 devstack devstack[14002]: DEBUG nova.api.openstack.wsgi [None req-4bfb53b3-072c-46a1-b8c5-af8d632cc22c demo admin] Calling method '<bound method ServersController.delete of <nova.api.openstack.compute.servers.ServersController object at 0x7ff3c9a937d0>>' {{(pid=14004) _process_stack /opt/stack/nova/nova/api/openstack/wsgi.py:618}}
Jun 19 23:05:12 devstack devstack[14002]: DEBUG nova.compute.api [None req-4bfb53b3-072c-46a1-b8c5-af8d632cc22c demo admin] [instance: f5410e1a-112b-47c2-9234-597562ec6859] Fetching instance by UUID {{(pid=14004) get /opt/stack/nova/nova/compute/api.py:2352}}
Jun 19 23:05:12 devstack devstack[14002]: DEBUG oslo_concurrency.lockutils [None req-4bfb53b3-072c-46a1-b8c5-af8d632cc22c demo admin] Lock "bcf153e2-8acc-41fa-9e24-734d9415bbf2" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s {{(pid=14004) inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273}}
Jun 19 23:05:12 devstack devstack[14002]: DEBUG oslo_concurrency.lockutils [None req-4bfb53b3-072c-46a1-b8c5-af8d632cc22c demo admin] Lock "bcf153e2-8acc-41fa-9e24-734d9415bbf2" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.000s {{(pid=14004) inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285}}
Jun 19 23:05:12 devstack devstack[14002]: DEBUG nova.compute.api [None req-4bfb53b3-072c-46a1-b8c5-af8d632cc22c demo admin] [instance: f5410e1a-112b-47c2-9234-597562ec6859] Going to try to terminate instance {{(pid=14004) delete /opt/stack/nova/nova/compute/api.py:2182}}
Jun 19 23:05:12 devstack devstack[14002]: DEBUG nova.servicegroup.drivers.db [None req-4bfb53b3-072c-46a1-b8c5-af8d632cc22c demo admin] Seems service nova-compute on host devstack is down. Last heartbeat was 2018-06-19 20:44:18. Elapsed time is 1254.354436 {{(pid=14004) is_up /opt/stack/nova/nova/servicegroup/drivers/db.py:79}}
Jun 19 23:05:12 devstack devstack[14002]: DEBUG nova.compute.api [None req-4bfb53b3-072c-46a1-b8c5-af8d632cc22c demo admin] [instance: f5410e1a-112b-47c2-9234-597562ec6859] Doing local delete in cell bcf153e2-8acc-41fa-9e24-734d9415bbf2(cell1) {{(pid=14004) _delete /opt/stack/nova/nova/compute/api.py:2007}}
Jun 19 23:05:12 devstack devstack[14002]: DEBUG oslo_concurrency.lockutils [None req-4bfb53b3-072c-46a1-b8c5-af8d632cc22c demo admin] Lock "bcf153e2-8acc-41fa-9e24-734d9415bbf2" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s {{(pid=14004) inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273}}
Jun 19 23:05:12 devstack devstack[14002]: DEBUG oslo_concurrency.lockutils [None req-4bfb53b3-072c-46a1-b8c5-af8d632cc22c demo admin] Lock "bcf153e2-8acc-41fa-9e24-734d9415bbf2" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.000s {{(pid=14004) inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285}}
Jun 19 23:05:12 devstack devstack[14002]: WARNING nova.compute.api [None req-4bfb53b3-072c-46a1-b8c5-af8d632cc22c demo admin] [instance: f5410e1a-112b-47c2-9234-597562ec6859] instance's host devstack is down, deleting from database
Jun 19 23:05:12 devstack devstack[14002]: DEBUG nova.network.neutronv2.api [None req-4bfb53b3-072c-46a1-b8c5-af8d632cc22c demo admin] [instance: f5410e1a-112b-47c2-9234-597562ec6859] deallocate_for_instance() {{(pid=14004) deallocate_for_instance /opt/stack/nova/nova/network/neutronv2/api.py:1266}}
Jun 19 23:05:13 devstack devstack[14002]: DEBUG nova.network.base_api [None req-4bfb53b3-072c-46a1-b8c5-af8d632cc22c demo admin] [instance: f5410e1a-112b-47c2-9234-597562ec6859] Updating instance_info_cache with network_info: [] {{(pid=14004) update_instance_cache_with_nw_info /opt/stack/nova/nova/network/base_api.py:48}}
Jun 19 23:05:13 devstack devstack[14002]: DEBUG oslo_concurrency.lockutils [None req-4bfb53b3-072c-46a1-b8c5-af8d632cc22c demo admin] Lock "placement_client" acquired by "nova.scheduler.client.report._create_client" :: waited 0.000s {{(pid=14004) inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273}}
Jun 19 23:05:13 devstack devstack[14002]: DEBUG oslo_concurrency.lockutils [None req-4bfb53b3-072c-46a1-b8c5-af8d632cc22c demo admin] Lock "placement_client" released by "nova.scheduler.client.report._create_client" :: held 0.003s {{(pid=14004) inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285}}
Jun 19 23:05:13 devstack devstack[14002]: INFO nova.scheduler.client.report [None req-4bfb53b3-072c-46a1-b8c5-af8d632cc22c demo admin] Deleted allocation for instance f5410e1a-112b-47c2-9234-597562ec6859
Jun 19 23:05:13 devstack devstack[14002]: INFO nova.api.openstack.requestlog [None req-4bfb53b3-072c-46a1-b8c5-af8d632cc22c demo admin] 192.168.122.139 "DELETE /compute/v2.1/servers/f5410e1a-112b-47c2-9234-597562ec6859" status: 204 len: 0 microversion: 2.1 time: 1.612695
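
The "is down" decision above comes from Nova's servicegroup heartbeat check. A simplified sketch follows; the constant and function names are mine, not nova's exact code, but the comparison against service_down_time is what the log line reflects.

# Simplified sketch of the liveness check behind the "Seems service
# nova-compute on host devstack is down" line: elapsed time since the last
# heartbeat is compared with service_down_time (60 seconds by default).
import datetime

SERVICE_DOWN_TIME = 60  # nova [DEFAULT]/service_down_time, in seconds


def is_up(last_heartbeat, now=None):
    now = now or datetime.datetime.utcnow()
    elapsed = (now - last_heartbeat).total_seconds()
    return abs(elapsed) <= SERVICE_DOWN_TIME


# In the log above the elapsed time was ~1254 s, so nova-compute was treated
# as down and the delete became a local delete that never reached the ironic
# driver, which is why the baremetal node stayed deployed.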

Comment 4 Harald Jensås 2018-06-20 07:48:05 UTC
$ openstack compute service list
+----+------------------+------------------------------------+----------+---------+-------+----------------------------+
| ID | Binary           | Host                               | Zone     | Status  | State | Updated At                 |
+----+------------------+------------------------------------+----------+---------+-------+----------------------------+
|  2 | nova-scheduler   | inspector-devstack.lab.example.com | internal | enabled | up    | 2018-06-20T07:31:30.000000 |
|  4 | nova-consoleauth | inspector-devstack.lab.example.com | internal | enabled | up    | 2018-06-20T07:31:32.000000 |
|  5 | nova-conductor   | inspector-devstack.lab.example.com | internal | enabled | up    | 2018-06-20T07:31:28.000000 |
|  1 | nova-conductor   | inspector-devstack.lab.example.com | internal | enabled | up    | 2018-06-20T07:31:27.000000 |
|  2 | nova-compute     | inspector-devstack.lab.example.com | nova     | enabled | down  | 2018-06-19T21:55:46.000000 |
+----+------------------+------------------------------------+----------+---------+-------+----------------------------+


I guess we need some retry logic on the Nova side of the ironic driver that will do an update to set the state to up.
Or do it manually? But the following does not work:

$ openstack compute service set --up inspector-devstack.lab.example.com nova
Failed to set service state to up
Compute service nova of host inspector-devstack.lab.example.com failed to set.

Comment 11 bjacot 2018-08-07 18:55:06 UTC
installed OSP13

cat /etc/yum.repos.d/latest-installed 
13   -p 2018-07-30.2

Verified that the gerrit changes were included in this release. Please reopen if the issue is still present.

Comment 12 Joanne O'Flynn 2018-08-15 13:50:09 UTC
This bug is marked for inclusion in the errata but does not currently contain draft documentation text. To ensure the timely release of this advisory please provide draft documentation text for this bug as soon as possible.

If you do not think this bug requires errata documentation, set the requires_doc_text flag to "-".


To add draft documentation text:

* Select the documentation type from the "Doc Type" drop down field.

* A template will be provided in the "Doc Text" field based on the "Doc Type" value selected. Enter draft text in the "Doc Text" field.

Comment 14 errata-xmlrpc 2018-08-29 16:32:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2593

Comment 15 Harald Jensås 2018-10-29 11:53:57 UTC
Clearing needinfo, the bug is closed.

