Bug 1851507

Summary: "openstack overcloud generate fencing [...]" fails with 404 Not Found if an overcloud node is in error state (nova) and not yet deployed with ironic
Product: Red Hat OpenStack
Component: openstack-tripleo-common
Version: 16.0 (Train)
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Matt Flusche <mflusche>
Assignee: Luca Miccini <lmiccini>
QA Contact: pkomarov
CC: aschultz, dabarzil, lmiccini, mburns, slinaber
Keywords: Triaged, ZStream
Target Milestone: z2
Target Release: 16.1 (Train on RHEL 8.2)
Fixed In Version: openstack-tripleo-common-11.4.1-0.20200708213416.da384ef.el8ost
Last Closed: 2020-10-28 15:38:11 UTC
Type: Bug

Description Matt Flusche 2020-06-26 18:11:49 UTC
Description of problem:

Example failure in a lab:

(undercloud) [stack@undercloud13 ~]$ openstack overcloud generate fencing --ipmi-lanplus --ipmi-level administrator --output fencing.yaml instack.json       
Action tripleo.parameters.generate_fencing execution failed: Failed to run action [action_ex_id=None, action_cls='<class 'mistral.actions.action_factory.GenerateFencingParametersAction'>', attributes='{}', params='{u'ipmi_level': u'administrator', u'ipmi_cipher': None, u'ipmi_lanplus': True, u'delay': None, u'os_auth': None, u'nodes_json': [{u'pm_password': u'redhat', u'name': u'overcloud13-node1', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:a5:a6:e0'], u'pm_port': u'634', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node2', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:bb:f9:0f'], u'pm_port': u'635', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node3', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:75:02:e7'], u'pm_port': u'636', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node4', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:bc:d9:f7'], u'pm_port': u'637', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node5', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:0f:4c:7c'], u'pm_port': u'638', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node6', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:94:3b:fc'], u'pm_port': u'639', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node7', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:84:e3:6b'], u'pm_port': u'640', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node8', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:bc:b8:5c'], u'pm_port': u'641', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-ceph1', u'memory': u'4096', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:07:63:c8'], u'pm_port': u'642', u'pm_type': u'pxe_ipmitool', u'disk': u'20', u'arch': u'x86_64', u'cpu': u'1', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-ceph2', u'memory': u'4096', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:37:d5:00'], u'pm_port': u'643', u'pm_type': u'pxe_ipmitool', u'disk': u'20', u'arch': u'x86_64', u'cpu': u'1', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-ceph3', u'memory': u'4096', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:5d:46:76'], u'pm_port': u'644', u'pm_type': u'pxe_ipmitool', u'disk': u'20', u'arch': u'x86_64', u'cpu': u'1', u'pm_user': u'admin'}]}']
 Not Found (HTTP 404)



This failure occurs in the following situation.

The overcloud deployment failed during a scale-up, for example:

2020-06-26 17:53:55Z [overcloud]: UPDATE_FAILED  Resource UPDATE failed: resources.Compute: Resource CREATE failed: ResourceInError: resources[2].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"

 Stack overcloud UPDATE_FAILED 

overcloud.Compute.2.NovaCompute:
  resource_type: OS::TripleO::ComputeServer
  physical_resource_id: cc1f99cc-ee9f-4240-8053-b7e134a059c8
  status: CREATE_FAILED
  status_reason: |
    ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
Heat Stack update failed.


(undercloud) [stack@undercloud13 ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks              |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| 099a1f89-0130-4174-9252-db6b7e748948 | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=172.16.6.110 |
| 2255e1b9-407a-4bdc-8edc-56c072763c49 | overcloud-compute-0     | ACTIVE | -          | Running     | ctlplane=172.16.6.101 |
| 10c1d7c1-416f-4a0f-b076-f15fc2376d60 | overcloud-compute-1     | ACTIVE | -          | Running     | ctlplane=172.16.6.102 |
| cc1f99cc-ee9f-4240-8053-b7e134a059c8 | overcloud-compute-2     | ERROR  | -          | NOSTATE     |                       |
| 7824d837-b932-4e33-904c-54925b00d3d4 | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=172.16.6.103 |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+


No ironic node:

(undercloud) [stack@undercloud13 ~]$ openstack baremetal node list |grep cc1f99cc-ee9f-4240-8053-b7e134a059c8
(no output)

From mistral log:

2020-06-26 13:56:59.134 1702 DEBUG ironicclient.common.http [req-092a23aa-f897-4147-9fe5-2b0c203693c7 c03d8da0d553401ba7d64538c94d8bda b3d0c809dd634964b39491c769004753 - default default] curl -i -X GET -H 'X-OpenStack-Ironic-API-Version: 1.36' -H 'X-Auth-Token: {SHA1}e39ea52d4433f9a6fcbcb28de2c972ece6bca3d5' -H 'Content-Type: application/json' -H 'Accept: application/json' -H 'User-Agent: python-ironicclient' http://172.16.6.1:6385/v1/nodes/detail?instance_uuid=cc1f99cc-ee9f-4240-8053-b7e134a059c8 log_curl_request /usr/lib/python2.7/site-packages/ironicclient/common/http.py:337
2020-06-26 13:56:59.204 1702 DEBUG ironicclient.common.http [req-092a23aa-f897-4147-9fe5-2b0c203693c7 c03d8da0d553401ba7d64538c94d8bda b3d0c809dd634964b39491c769004753 - default default]
HTTP/1.1 200 OK
Date: Fri, 26 Jun 2020 17:56:59 GMT
Server: Apache
X-OpenStack-Ironic-API-Minimum-Version: 1.1
X-OpenStack-Ironic-API-Maximum-Version: 1.38
X-OpenStack-Ironic-API-Version: 1.36
Openstack-Request-Id: req-daa48895-2e6a-4ebd-9bc0-6702844f1289
Content-Length: 13
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: application/json

{"nodes": []}
 log_http_response /usr/lib/python2.7/site-packages/ironicclient/common/http.py:351
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor [req-092a23aa-f897-4147-9fe5-2b0c203693c7 c03d8da0d553401ba7d64538c94d8bda b3d0c809dd634964b39491c769004753 - default default] Failed to run action [action_ex_id=None, action_cls='<class 'mistral.actions.action_factory.GenerateFencingParametersAction'>', attributes='{}', params='{u'ipmi_level': u'administrator', u'ipmi_cipher': None, u'ipmi_lanplus': True, u'delay': None, u'os_auth': None, u'nodes_json': [{u'pm_password': u'redhat', u'name': u'overcloud13-node1', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:a5:a6:e0'], u'pm_port': u'634', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node2', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:bb:f9:0f'], u'pm_port': u'635', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node3', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:75:02:e7'], u'pm_port': u'636', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node4', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:bc:d9:f7'], u'pm_port': u'637', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node5', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:0f:4c:7c'], u'pm_port': u'638', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node6', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:94:3b:fc'], u'pm_port': u'639', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node7', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:84:e3:6b'], u'pm_port': u'640', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-node8', u'memory': u'8192', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:bc:b8:5c'], u'pm_port': u'641', u'pm_type': u'pxe_ipmitool', u'disk': u'42', u'arch': u'x86_64', u'cpu': u'2', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-ceph1', u'memory': u'4096', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:07:63:c8'], u'pm_port': u'642', u'pm_type': u'pxe_ipmitool', u'disk': u'20', u'arch': u'x86_64', u'cpu': u'1', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-ceph2', u'memory': u'4096', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:37:d5:00'], u'pm_port': u'643', u'pm_type': u'pxe_ipmitool', u'disk': u'20', u'arch': u'x86_64', u'cpu': u'1', u'pm_user': u'admin'}, {u'pm_password': u'redhat', u'name': u'overcloud13-ceph3', u'memory': u'4096', u'pm_addr': u'192.168.122.1', u'mac': [u'52:54:00:5d:46:76'], u'pm_port': u'644', u'pm_type': u'pxe_ipmitool', u'disk': u'20', u'arch': u'x86_64', u'cpu': u'1', u'pm_user': u'admin'}]}']
 Not Found (HTTP 404): NotFound: Not Found (HTTP 404)
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor Traceback (most recent call last):
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/mistral/executors/default_executor.py", line 114, in run_action
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor     result = action.run(action_ctx)
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/tripleo_common/actions/parameters.py", line 361, in run
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor     self.get_compute_client(context))
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/tripleo_common/utils/nodes.py", line 694, in generate_hostmap
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor     bm_node = baremetal_client.node.get_by_instance_uuid(node.id)
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/ironicclient/v1/node.py", line 329, in get_by_instance_uuid
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor     raise exc.NotFound()
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor NotFound: Not Found (HTTP 404)
2020-06-26 13:56:59.205 1702 ERROR mistral.executors.default_executor
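
The mechanism is visible above: ironic answers the instance-UUID query with {"nodes": []}, and the client turns that empty result into NotFound. A simplified sketch of the client-side lookup, per the traceback (not the exact ironicclient code):

def get_by_instance_uuid(self, instance_uuid):
    # GET /v1/nodes/detail?instance_uuid=<uuid>
    nodes = self._list(self._path('detail?instance_uuid=%s' % instance_uuid),
                       'nodes')
    # Filtering by instance_uuid should return exactly one node. A server
    # stuck in ERROR with no ironic node assigned yields an empty list,
    # so the client raises NotFound -> "Not Found (HTTP 404)".
    if len(nodes) != 1:
        raise exc.NotFound()
    return nodes[0]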




Version-Release number of selected component (if applicable):

(undercloud) [stack@undercloud13 ~]$ rpm -q openstack-tripleo-common
openstack-tripleo-common-8.7.1-20.el7ost.noarch (current)


How reproducible:
100%

Steps to Reproduce:
1. Deploy or scale up an overcloud such that a nova server ends up in ERROR state with no corresponding ironic node (see the example above).
2. Run "openstack overcloud generate fencing ..." on the undercloud.

It seems that warning about the missing node and skipping it would be more correct in this situation.

Comment 1 Luca Miccini 2020-07-01 09:23:42 UTC
I think the exception should be handled in the generate_hostmap function:


class GenerateFencingParametersAction(base.TripleOAction):
    """Generates fencing configuration for a deployment.
...

    def run(self, context):
        """Returns the parameters for fencing controller nodes"""
        hostmap = nodes.generate_hostmap(self.get_baremetal_client(context),
                                         self.get_compute_client(context))
        fence_params = {"EnableFencing": True, "FencingConfig": {}}
        devices = []
...



def generate_hostmap(baremetal_client, compute_client):
    """Create a map between Compute nodes and Baremetal nodes"""
    hostmap = {}
    for node in compute_client.servers.list():
        bm_node = baremetal_client.node.get_by_instance_uuid(node.id)
        for port in baremetal_client.port.list(node=bm_node.uuid):
            hostmap[port.address] = {"compute_name": node.name,
                                     "baremetal_name": bm_node.name}
    if hostmap == {}:
        return None
    else:
        return hostmap


something like this, catching the specific NotFound (the class raised in the traceback above) and skipping the node instead of aborting:

from ironicclient import exc


def generate_hostmap(baremetal_client, compute_client):
    """Create a map between Compute nodes and Baremetal nodes"""
    hostmap = {}
    for node in compute_client.servers.list():
        try:
            bm_node = baremetal_client.node.get_by_instance_uuid(node.id)
        except exc.NotFound:
            # We didn't find a bm_node corresponding to the instance.
            # The server is probably in error state with no corresponding
            # ironic node assigned, so skip it instead of failing the
            # whole action.
            continue
        for port in baremetal_client.port.list(node=bm_node.uuid):
            hostmap[port.address] = {"compute_name": node.name,
                                     "baremetal_name": bm_node.name}
    if hostmap == {}:
        return None
    else:
        return hostmap

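For illustration, a quick check of the skip behavior with mocked clients (the mocks here are invented for the sketch, not part of the patch):

from unittest import mock
from ironicclient import exc

# A nova server in ERROR state with no matching ironic node.
error_server = mock.Mock(id='cc1f99cc-ee9f-4240-8053-b7e134a059c8')
error_server.name = 'overcloud-compute-2'  # 'name' is reserved in Mock()

compute_client = mock.Mock()
compute_client.servers.list.return_value = [error_server]

baremetal_client = mock.Mock()
baremetal_client.node.get_by_instance_uuid.side_effect = exc.NotFound()

# The ERROR server is skipped; with no other servers the hostmap stays
# empty and generate_hostmap() returns None instead of raising.
assert generate_hostmap(baremetal_client, compute_client) is None
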
can you maybe give https://review.opendev.org/#/c/738768/ a try?

Comment 2 Matt Flusche 2020-07-01 14:01:51 UTC
Hi Luca,

Thanks for looking at this.  Your fix seems to work for me.

(undercloud) [stack@undercloud13 ~]$ rm fencing.yaml 
(undercloud) [stack@undercloud13 ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks              |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| 099a1f89-0130-4174-9252-db6b7e748948 | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=172.16.6.110 |
| 2255e1b9-407a-4bdc-8edc-56c072763c49 | overcloud-compute-0     | ACTIVE | -          | Running     | ctlplane=172.16.6.101 |
| 10c1d7c1-416f-4a0f-b076-f15fc2376d60 | overcloud-compute-1     | ACTIVE | -          | Running     | ctlplane=172.16.6.102 |
| cc1f99cc-ee9f-4240-8053-b7e134a059c8 | overcloud-compute-2     | ERROR  | -          | NOSTATE     |                       |
| 7824d837-b932-4e33-904c-54925b00d3d4 | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=172.16.6.103 |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
(undercloud) [stack@undercloud13 ~]$ openstack overcloud generate fencing --ipmi-lanplus --ipmi-level administrator --output fencing.yaml instack.json    
(no output)
(undercloud) [stack@undercloud13 ~]$ cat fencing.yaml 
parameter_defaults:
  EnableFencing: true
  FencingConfig:
    devices:
    - agent: fence_ipmilan
      host_mac: 52:54:00:a5:a6:e0
      params:
        ipaddr: 192.168.122.1
        ipport: '634'
        lanplus: true
        login: admin
        passwd: redhat
        privlvl: administrator
[...]
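
For completeness: the generated file is consumed by passing it to the overcloud deploy command as an extra environment file, e.g. (paths other than fencing.yaml are placeholders):

openstack overcloud deploy --templates \
    -e <existing environment files> \
    -e fencing.yaml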

Comment 3 Luca Miccini 2020-07-09 06:10:29 UTC
will fix this in train/osp16.

Comment 11 errata-xmlrpc 2020-10-28 15:38:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4284