Bug 1952571 - [GSS][ceph-ansible][RFE] Additional pre-check for mon quorum failures while running rolling_update.yml playbook
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 4.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.3
Assignee: Guillaume Abrioux
QA Contact: Ameena Suhani S H
Docs Contact: Ranjini M N
URL:
Whiteboard:
Depends On:
Blocks: 2031070
 
Reported: 2021-04-22 14:55 UTC by Geo Jose
Modified: 2025-05-28 09:44 UTC

Fixed In Version: ceph-ansible-4.0.63-1.el8cp, ceph-ansible-4.0.63-1.el7cp
Doc Type: Enhancement
Doc Text:
.`ceph-ansible` checks for the Ceph Monitor quorum before starting the upgrade
Previously, when the storage cluster was in a `HEALTH_WARN` state because one of the Ceph Monitors was down, the `rolling_update.yml` playbook would still run. If the upgrade then failed on another Ceph Monitor, quorum was lost, resulting in I/O down or a cluster failure. With this release, `ceph-ansible` performs an additional check of the Ceph Monitor quorum before starting the upgrade.
Clone Of:
Environment:
Last Closed: 2022-05-05 07:53:20 UTC
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | ceph ceph-ansible pull 6704 | 0 | None | Merged | [skip ci] rolling_update: check quorum state before upgrade | 2021-11-16 09:27:37 UTC
Red Hat Issue Tracker | RHCEPH-1335 | 0 | None | None | None | 2021-08-30 06:01:21 UTC
Red Hat Product Errata | RHSA-2022:1716 | 0 | None | None | None | 2022-05-05 07:53:39 UTC

Description Geo Jose 2021-04-22 14:55:41 UTC
Description of problem:
While running rolling_update.yml, the playbook fails if the cluster is not in an acceptable state (HEALTH_ERR). It still runs when the cluster is in HEALTH_WARN (for example, with 1/3 mons down). If the upgrade then fails on one of the remaining mons, we lose quorum, resulting in I/O down or a cluster failure. To avoid this situation, it would be good to add the conditions below, or something similar:
 - Add another condition to check the running mons before starting the mon upgrade.
 - If we add the above condition, we should also provide an option to override it for cases where the system admin is okay with proceeding to upgrade 2 mons (the minimum number needed for quorum).

Version-Release number of selected component (if applicable):
 * RHCS 4.2


Additional info:

 o The condition below only counts the monitors defined in the inventory; it does not check whether all of the monitors are up and running:
---
    - name: set mon_host_count
      set_fact:
        mon_host_count: "{{ groups[mon_group_name] | length }}"

    - name: fail when less than three monitors
      fail:
        msg: "Upgrade of cluster with less than three monitors is not supported."
      when: mon_host_count | int < 3
---

 o The condition below is skipped because the cluster is not in 'HEALTH_ERR' (1/3 mons down):
---
          - name: fail if cluster isn't in an acceptable state
            fail:
              msg: "cluster is not in an acceptable state!"
            when: (check_cluster_health.stdout | from_json).status == 'HEALTH_ERR'
    when: inventory_hostname == groups[mon_group_name] | first
---
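
A check of the kind requested in the description could be sketched roughly as follows. This is only an illustration, not the change merged in the linked pull request: the task names, the `check_quorum_status` register, and the `allow_degraded_mon_quorum` override variable are hypothetical, and a containerized deployment would need the `ceph` command wrapped in the appropriate container exec prefix. It assumes `cluster` and `mon_group_name` are defined as elsewhere in the playbook.

```yaml
# Sketch only: names below are illustrative assumptions, not the merged fix.
- name: get the monitor quorum status
  command: "ceph --cluster {{ cluster }} quorum_status --format json"
  register: check_quorum_status   # hypothetical register name
  changed_when: false
  when: inventory_hostname == groups[mon_group_name] | first

- name: fail if not all monitors are in quorum
  fail:
    msg: "Not all monitors are in quorum; refusing to start the upgrade."
  when:
    - inventory_hostname == groups[mon_group_name] | first
    # 'quorum' in the JSON output lists the ranks of the monitors in quorum
    - (check_quorum_status.stdout | from_json).quorum | length != groups[mon_group_name] | length
    # hypothetical override for admins who accept upgrading with reduced quorum
    - not (allow_degraded_mon_quorum | default(false) | bool)
```

With a sketch like this, an admin who accepts the risk of upgrading with only two of three monitors in quorum could pass `-e allow_degraded_mon_quorum=true` on the `ansible-playbook` command line.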

Comment 9 errata-xmlrpc 2022-05-05 07:53:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 4.3 Security and Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1716

