Description of problem:
While running rolling_update.yml, the playbook fails if the cluster is not in an acceptable state (HEALTH_ERR), but it still runs when the cluster is in HEALTH_WARN (for example, with 1/3 mons down). If the upgrade then fails for one of the remaining mons, we lose quorum, resulting in I/O down and cluster failure.

To avoid this situation, it would be good to add the conditions below, or something similar:
- Add a condition that checks the running mons before starting the mon upgrade.
- If the above condition is added, provide an option to override it for cases where the system admin is okay to proceed with upgrading 2 mons (the minimum number needed for quorum).

Version-Release number of selected component (if applicable):
* RHCS 4.2

Additional info:
o The condition below does not check whether all the monitors are up and running; it only counts the hosts in the monitor inventory group:
---
- name: set mon_host_count
  set_fact:
    mon_host_count: "{{ groups[mon_group_name] | length }}"

- name: fail when less than three monitors
  fail:
    msg: "Upgrade of cluster with less than three monitors is not supported."
  when: mon_host_count | int < 3
---
o The condition below is skipped because the cluster is not in 'HEALTH_ERR' (1/3 mons down):
---
  - name: fail if cluster isn't in an acceptable state
    fail:
      msg: "cluster is not in an acceptable state!"
    when: (check_cluster_health.stdout | from_json).status == 'HEALTH_ERR'
when: inventory_hostname == groups[mon_group_name] | first
---
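A minimal sketch of what the requested pre-check could look like, not the fix that was eventually shipped. It assumes the `ceph` CLI is reachable from the first monitor host with a working admin keyring (containerized deployments would need the appropriate container exec prefix), that the `cluster` variable holds the cluster name, and that `quorum_status` reports the monitors currently in quorum under `quorum_names`. The override variable `allow_upgrade_with_mons_down` is hypothetical and does not exist in ceph-ansible.

```yaml
# Sketch only: `allow_upgrade_with_mons_down` is a hypothetical override
# variable introduced for illustration, not an existing ceph-ansible option.
- name: get the monitor quorum status
  command: "ceph --cluster {{ cluster }} quorum_status --format json"
  register: quorum_status
  changed_when: false
  run_once: true
  delegate_to: "{{ groups[mon_group_name] | first }}"

- name: fail when not all monitors are in quorum
  fail:
    msg: >-
      Only {{ (quorum_status.stdout | from_json).quorum_names | length }} of
      {{ groups[mon_group_name] | length }} monitors are in quorum.
      Set allow_upgrade_with_mons_down=true to proceed anyway.
  run_once: true
  when:
    # Compare monitors actually in quorum with monitors in the inventory.
    - ((quorum_status.stdout | from_json).quorum_names | length) < (groups[mon_group_name] | length)
    # Let the admin explicitly accept the risk of upgrading with mons down.
    - not (allow_upgrade_with_mons_down | default(false) | bool)
```

With this in place, an admin who accepts the risk of upgrading with only two mons in quorum could pass `-e allow_upgrade_with_mons_down=true` on the `ansible-playbook` command line.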
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 4.3 Security and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1716
On branch 3.2, rolling_update.yml sets the noout and norebalance flags, which is fine. However, after upgrading one OSD it checks for clean PGs and fails. PGs are not going to be clean while the flags are still set, especially if any I/O hit the PG while the OSD was being upgraded and restarted.