Bug 1975271 - Minor update does not restart HA resource when it is in failed state
Summary: Minor update does not restart HA resource when it is in failed state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.2 (Train)
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: z2
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: OSP Team
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-06-23 11:23 UTC by Damien Ciabrini
Modified: 2022-03-23 22:28 UTC

Fixed In Version: openstack-tripleo-heat-templates-11.6.1-2.20220116004909.64b2e88.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-23 22:28:33 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 795704 | 0 | None | MERGED | HA minor update: fix bad pcs invocation | 2021-06-23 11:23:51 UTC
Red Hat Issue Tracker | OSP-5410 | 0 | None | None | None | 2022-01-26 07:59:41 UTC
Red Hat Product Errata | RHSA-2022:0995 | 0 | None | None | None | 2022-03-23 22:28:57 UTC

Description Damien Ciabrini 2021-06-23 11:23:52 UTC
Description of problem:
During a minor update of an HA controller, a script called pacemaker_restart_bundle.sh is in charge of restarting the local instance of an HA resource after a configuration change.

The way the resource is restarted depends on the type of the resource and its current state in the cluster.

When the resource is in failed state, the script invokes pcs with an invalid command line, which in turn logs an error and does not restart the resource as expected, e.g.:

    "stdout: Wed Jun  9 16:45:29 UTC 2021: openstack-cinder-volume is currently not running on 'controller-0', cleaning up its state to restart it if necessary",
        "",
        "stderr: Error: Specified option '--node' is not supported in this command"
    ]
}
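
For illustration only, here is a minimal sketch of a cleanup call that avoids the rejected option. It is not the code from pacemaker_restart_bundle.sh or from the gerrit change. It assumes the pcs 0.10 syntax, where the node is passed to "pcs resource cleanup" as a node=<name> argument instead of the older --node <name> option. The function name cleanup_failed_resource and the DRY_RUN variable are hypothetical.

```shell
# Hypothetical helper, not taken from pacemaker_restart_bundle.sh.
# Assumption: pcs 0.10 expects "node=<name>" for "pcs resource cleanup"
# and rejects "--node <name>" with the error shown in the log above.

# Clean up the failed state of resource $1 on node $2 so that the
# cluster can try to start it again.
# With DRY_RUN=1 the command is only printed, so the sketch can be
# tried on a machine that is not part of a Pacemaker cluster.
# The body runs in a subshell so that "name" and "host" stay local.
cleanup_failed_resource() (
    name="$1"
    host="$2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "pcs resource cleanup $name node=$host"
    else
        pcs resource cleanup "$name" "node=$host"
    fi
)
```

Running it with DRY_RUN=1, the resource name openstack-cinder-volume and the node name controller-0 prints the command line that would be executed, which can be compared against "pcs resource cleanup --help" on the target system before relying on it.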


Version-Release number of selected component (if applicable):
16.2

How reproducible:
When a resource is in failed state on a node

Steps to Reproduce:
1. Make a clone resource fail to start on a controller; this blocks it and leaves it in failed state.
2. Perform a minor update on that node.


Actual results:
The resource stays in failed state

Expected results:
The resource should have been given a chance to restart

Additional info:

Comment 13 errata-xmlrpc 2022-03-23 22:28:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenStack Platform 16.2 (openstack-tripleo-heat-templates) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0995

