Bug 1851507 - "openstack overcloud generate fencing [...]" fails with 404 Not Found if overcloud node is error state (nova) and not yet deployed with ironic
Summary: "openstack overcloud generate fencing [...]" fails with 404 Not Found if over...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 16.0 (Train)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: z2
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Luca Miccini
QA Contact: pkomarov
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-06-26 18:11 UTC by Matt Flusche
Modified: 2023-10-06 20:53 UTC
CC: 5 users

Fixed In Version: openstack-tripleo-common-11.4.1-0.20200708213416.da384ef.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-28 15:38:11 UTC
Target Upstream Version:
Embargoed:




Links
System ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit 739713 | 0 | None | MERGED | Catch exception if servers are in error state with no bm_node attached | 2020-09-21 07:32:51 UTC
Red Hat Issue Tracker OSP-3211 | 0 | None | None | None | 2022-08-23 18:37:13 UTC
Red Hat Product Errata RHEA-2020:4284 | 0 | None | None | None | 2020-10-28 15:38:31 UTC

Description Matt Flusche 2020-06-26 18:11:49 UTC
Description of problem:

Example failure in a lab:

(undercloud) [stack@undercloud13 ~]$ openstack overcloud generate fencing --ipmi-lanplus --ipmi-level administrator --output fencing.yaml instack.json       
Action tripleo.parameters.generate_fencing execution failed: Failed to run action [action_ex_id=None, action_cls='<class 'mistral.actions.action_factory.GenerateFencingParametersAction'>', attributes='{}', params='{u'ipmi_level': u'administrator', u'ipmi_cipher': None, u'ipmi_lanplus': True, u'delay': None, u'os_auth': None, u'nodes_json': [{u'pm_password': u'redhat', u'name': u'overcloud13-node1', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:a5:a6:e0'], u'pm_port': u'634', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node2', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:bb:f9:0f'], u'pm_port': u'635', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node3', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:75:02:e7'], u'pm_port': u'636', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node4', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:bc:d9:f7'], u'pm_port': u'637', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node5', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:0f:4c:7c'], u'pm_port': u'638', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node6', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:94:3b:fc'], u'pm_port': u'639', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node7', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:84:e3:6b'], u'pm_port': u'640', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node8', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:bc:b8:5c'], u'pm_port': u'641', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-ceph1', u'memory': u'4096', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:07:63:c8'], u'pm_port': u'642', u'pm_type': u'pxe_ipmitool', u'disk': u'20', u'arch': u'x86_64', u'cpu': u'1', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-ceph2', u'memory': u'4096', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:37:d5:00'], u'pm_port': u'643', u'pm_type': u'pxe_ipmitool', u'disk': u'20', u'arch': u'x86_64', u'cpu': u'1', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-ceph3', u'memory': u'4096', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:5d:46:76'], u'pm_port': u'644', u'pm_type': u'pxe_ipmitool', u'disk': u'20', u'arch': u'x86_64', u'cpu': u'1', u'pm_user': u'admin'}]}']
 Not Found (HTTP 404)



This failure occurs in the following situation.

The overcloud deployment failed during a scale-up. For example:

2020-06-26 17:53:55Z [overcloud]: UPDATE_FAILED  Resource UPDATE failed: resources.Compute: Resource CREATE failed: ResourceInError: resources[2].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"

 Stack overcloud UPDATE_FAILED 

overcloud.Compute.2.NovaCompute:
  resource_type: OS::TripleO::ComputeServer
  physical_resource_id: cc1f99cc-ee9f-4240-8053-b7e134a059c8
  status: CREATE_FAILED
  status_reason: |
    ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
Heat Stack update failed.


(undercloud) [stack@undercloud13 ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks              |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| 099a1f89-0130-4174-9252-db6b7e748948 | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=172.16.6.110 |
| 2255e1b9-407a-4bdc-8edc-56c072763c49 | overcloud-compute-0     | ACTIVE | -          | Running     | ctlplane=172.16.6.101 |
| 10c1d7c1-416f-4a0f-b076-f15fc2376d60 | overcloud-compute-1     | ACTIVE | -          | Running     | ctlplane=172.16.6.102 |
| cc1f99cc-ee9f-4240-8053-b7e134a059c8 | overcloud-compute-2     | ERROR  | -          | NOSTATE     |                       |
| 7824d837-b932-4e33-904c-54925b00d3d4 | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=172.16.6.103 |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+


No ironic node:

(undercloud) [stack@undercloud13 ~]$ openstack baremetal node list |grep cc1f99cc-ee9f-4240-8053-b7e134a059c8
(no output)

From mistral log:

2020-06-26 13:56:59.134 1702 DEBUG ironicclient.common.http [req-092a23aa-f897-4147-9fe5-2b0c203693c7 c03d8da0d553401ba7d64538c94d8bda b3d0c809dd634964b39491c769004753 - default default] curl -i -X GET -H 'X-OpenStack-Ironic-API-Version: 1.36' -H 'X-Auth-Token: {SHA1}e39ea52d4433f9a6fcbcb28de2c972ece6bca3d5' -H 'Content-Type: application/json' -H 'Accept: application/json' -H 'User-Agent: python-ironicclient' http://172.16.6.1:6385/v1/nodes/detail?instance_uuid=cc1f99cc-ee9f-4240-8053-b7e134a059c8 log_curl_request /usr/lib/python2.7/site-packages/ironicclient/common/http.py:337
2020-06-26 13:56:59.204 1702 DEBUG ironicclient.common.http [req-092a23aa-f897-4147-9fe5-2b0c203693c7 c03d8da0d553401ba7d64538c94d8bda b3d0c809dd634964b39491c769004753 - default default]
HTTP/1.1 200 OK
Date: Fri, 26 Jun 2020 17:56:59 GMT
Server: Apache
X-OpenStack-Ironic-API-Minimum-Version: 1.1
X-OpenStack-Ironic-API-Maximum-Version: 1.38
X-OpenStack-Ironic-API-Version: 1.36
Openstack-Request-Id: req-daa48895-2e6a-4ebd-9bc0-6702844f1289
Content-Length: 13
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: application/json

{"nodes": []}
 log_http_response /usr/lib/python2.7/site-packages/ironicclient/common/http.py:351
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor [req-092a23aa-f897-4147-9fe5-2b0c203693c7 c03d8da0d553401ba7d64538c94d8bda b3d0c809dd634964b39491c769004753 - default default] Failed to run action [action_ex_id=None, action_cls='<class 'mistral.actions.action_factory.GenerateFencingParametersAction'>', attributes='{}', params='{u'ipmi_level': u'administrator', u'ipmi_cipher': None, u'ipmi_lanplus': True, u'delay': None, u'os_auth': None, u'nodes_json': [{u'pm_password': u'redhat', u'name': u'overcloud13-node1', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:a5:a6:e0'], u'pm_port': u'634', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node2', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:bb:f9:0f'], u'pm_port': u'635', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node3', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:75:02:e7'], u'pm_port': u'636', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node4', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:bc:d9:f7'], u'pm_port': u'637', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node5', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:0f:4c:7c'], u'pm_port': u'638', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node6', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:94:3b:fc'], u'pm_port': u'639', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node7', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:84:e3:6b'], u'pm_port': u'640', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node8', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:bc:b8:5c'], u'pm_port': u'641', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-ceph1', u'memory': u'4096', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:07:63:c8'], u'pm_port': u'642', u'pm_type': u'pxe_ipmitool', u'disk': u'20', u'arch': u'x86_64', u'cpu': u'1', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-ceph2', u'memory': u'4096', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:37:d5:00'], u'pm_port': u'643', u'pm_type': u'pxe_ipmitool', u'disk': u'20', u'arch': u'x86_64', u'cpu': u'1', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-ceph3', u'memory': u'4096', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:5d:46:76'], u'pm_port': u'644', u'pm_type': u'pxe_ipmitool', u'disk': u'20', u'arch': u'x86_64', u'cpu': u'1', u'pm_user': u'admin'}]}']
 Not Found (HTTP 404): NotFound: Not Found (HTTP 404)
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor Traceback (most recent call last):
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/mistral/executors/default_executor.py", line 114, in run_action
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor     result = action.run(action_ctx)
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/tripleo_common/actions/parameters.py", line 361, in run
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor     self.get_compute_client(context))
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/tripleo_common/utils/nodes.py", line 694, in generate_hostmap
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor     bm_node = baremetal_client.node.get_by_instance_uuid(node.id)
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/ironicclient/v1/node.py", line 329, in get_by_instance_uuid
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor     raise exc.NotFound()
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor NotFound: Not Found (HTTP 404)
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor
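
The empty node list in the 200 response above ({"nodes": []}) is what python-ironicclient turns into the NotFound seen in the traceback: node.get_by_instance_uuid() raises when no ironic node claims the instance UUID. As a minimal sketch of a defensive lookup (find_bm_node is a hypothetical helper, not tripleo-common code; it assumes only the ironicclient exc.NotFound class visible in the traceback):

from ironicclient import exc

def find_bm_node(baremetal_client, instance_uuid):
    """Return the ironic node backing instance_uuid, or None.

    A nova server stuck in ERROR after a failed scale-up (like
    overcloud-compute-2 above) may never have been mapped to an
    ironic node; get_by_instance_uuid() then raises exc.NotFound.
    """
    try:
        return baremetal_client.node.get_by_instance_uuid(instance_uuid)
    except exc.NotFound:
        return None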




Version-Release number of selected component (if applicable):

(undercloud) [stack@undercloud13 ~]$ rpm -q openstack-tripleo-common
openstack-tripleo-common-8.7.1-20.el7ost.noarch (current)


How reproducible:
100%

Steps to Reproduce:
1. See the example above.

It seems that warning about the missing node and skipping it would be more correct in this situation.

Comment 1 Luca Miccini 2020-07-01 09:23:42 UTC
I think the exception should be handled in the generate_hostmap function:


class GenerateFencingParametersAction(base.TripleOAction):
    """Generates fencing configuration for a deployment.
...

    def run(self, context):
        """Returns the parameters for fencing controller nodes"""
        hostmap = nodes.generate_hostmap(self.get_baremetal_client(context),
                                         self.get_compute_client(context))
        fence_params = {"EnableFencing": True, "FencingConfig": {}}
        devices = []
...



def generate_hostmap(baremetal_client, compute_client):
    """Create a map between Compute nodes and Baremetal nodes"""
    hostmap = {}
    for node in compute_client.servers.list():
        bm_node = baremetal_client.node.get_by_instance_uuid(node.id)
        for port in baremetal_client.port.list(node=bm_node.uuid):
            hostmap[port.address] = {"compute_name": node.name,
                                     "baremetal_name": bm_node.name}
    if hostmap == {}:
        return None
    else:
        return hostmap


Here the unguarded get_by_instance_uuid() call raises NotFound for the first server (such as overcloud-compute-2 above) that has no ironic node attached, and the whole action fails. Something like this should handle it:

from ironicclient import exc  # for exc.NotFound, as seen in the traceback

def generate_hostmap(baremetal_client, compute_client):
    """Create a map between Compute nodes and Baremetal nodes"""
    hostmap = {}
    for node in compute_client.servers.list():
        try:
            bm_node = baremetal_client.node.get_by_instance_uuid(node.id)
        except exc.NotFound:
            # We didn't find a bm_node corresponding to the instance;
            # the server is probably in error state with no ironic
            # node assigned. Skip it and keep mapping the rest.
            continue
        for port in baremetal_client.port.list(node=bm_node.uuid):
            hostmap[port.address] = {"compute_name": node.name,
                                     "baremetal_name": bm_node.name}
    return hostmap or None
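
Note that catching only the ironic NotFound and continuing, instead of returning, keeps mapping the remaining servers, so fencing parameters are still generated for every node that does have an ironic node attached; a bare except: would also swallow unrelated failures, such as authentication errors against ironic.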

Could you give https://review.opendev.org/#/c/738768/ a try?

Comment 2 Matt Flusche 2020-07-01 14:01:51 UTC
Hi Luca,

Thanks for looking at this.  Your fix seems to work for me.

(undercloud) [stack@undercloud13 ~]$ rm fencing.yaml 
(undercloud) [stack@undercloud13 ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks              |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| 099a1f89-0130-4174-9252-db6b7e748948 | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=172.16.6.110 |
| 2255e1b9-407a-4bdc-8edc-56c072763c49 | overcloud-compute-0     | ACTIVE | -          | Running     | ctlplane=172.16.6.101 |
| 10c1d7c1-416f-4a0f-b076-f15fc2376d60 | overcloud-compute-1     | ACTIVE | -          | Running     | ctlplane=172.16.6.102 |
| cc1f99cc-ee9f-4240-8053-b7e134a059c8 | overcloud-compute-2     | ERROR  | -          | NOSTATE     |                       |
| 7824d837-b932-4e33-904c-54925b00d3d4 | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=172.16.6.103 |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
(undercloud) [stack@undercloud13 ~]$ openstack overcloud generate fencing --ipmi-lanplus --ipmi-level administrator --output fencing.yaml instack.json    
(no output)
      
(undercloud) [stack@undercloud13 ~]$ cat fencing.yaml 
parameter_defaults:
  EnableFencing: true
  FencingConfig:
    devices:
    - agent: fence_ipmilan
      host_mac: 52:54:00:a5:a6:e0
      params:
        ipaddr: 192.168.122.1
        ipport: '634'
        lanplus: true
        login: admin
        passwd: redhat
        privlvl: administrator
[...]
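
Each devices entry maps a node's IPMI MAC address (host_mac, the mac value from instack.json) to fence_ipmilan parameters: ipaddr, ipport, login and passwd come from pm_addr, pm_port, pm_user and pm_password, while lanplus and privlvl reflect the --ipmi-lanplus and --ipmi-level options. The ERROR instance overcloud-compute-2, which has no ironic node, is simply skipped.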

Comment 3 Luca Miccini 2020-07-09 06:10:29 UTC
Will fix this in train/osp16.

Comment 11 errata-xmlrpc 2020-10-28 15:38:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4284

