Bug 1967964 - Upgrade to RHCS4 can leave the cluster in HEALTH_WARN when firefly tunables are in use
Summary: Upgrade to RHCS4 can leave the cluster in HEALTH_WARN when firefly tunables are in use
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2z3
Assignee: Guillaume Abrioux
QA Contact: Ameena Suhani S H
URL:
Whiteboard:
Depends On:
Blocks: 1760354
 
Reported: 2021-06-04 14:37 UTC by Giulio Fidente
Modified: 2023-09-15 01:09 UTC
CC List: 11 users

Fixed In Version: ceph-ansible-4.0.61-1.el8cp, ceph-ansible-4.0.61-1.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-27 18:26:24 UTC
Embargoed:




Links
Github ceph/ceph-ansible pull 6689 (closed): [skip ci] update: convert straw bucket - last updated 2021-07-09 06:39:59 UTC
Red Hat Issue Tracker RHCEPH-293 - last updated 2021-08-26 16:43:44 UTC
Red Hat Product Errata RHBA-2021:3670 - last updated 2021-09-27 18:26:47 UTC

Description Giulio Fidente 2021-06-04 14:37:44 UTC
It seems possible that, after the upgrade to RHCS4, the cluster remains in a HEALTH_WARN state with:

  crush map has legacy tunables (require firefly, min is hammer)

This is because in Nautilus the default value of mon_crush_min_required_version was changed from firefly to hammer, which means the cluster will issue a health warning if the CRUSH tunables are older than hammer. The upstream guide [1] says:

"""
If your CRUSH tunables are older than Hammer, Ceph will now issue a health warning. If you see a health alert to that effect, you can revert this change with:

ceph config set mon mon_crush_min_required_version firefly

If Ceph does not complain, however, then we recommend you also switch any existing CRUSH buckets to straw2, which was added back in the Hammer release. If you have any ‘straw’ buckets, this will result in a modest amount of data movement, but generally nothing too severe:

ceph osd getcrushmap -o backup-crushmap
ceph osd crush set-all-straw-buckets-to-straw2

If there are problems, you can easily revert with:

ceph osd setcrushmap -i backup-crushmap

Moving to ‘straw2’ buckets will unlock a few recent features, like the crush-compat balancer mode added back in Luminous.
"""

There is generally a small (but non-zero) amount of data that will move around when switching to the hammer tunables; for more information, see the upstream Tunables documentation.

1. https://docs.ceph.com/en/latest/releases/nautilus/#v14-2-1-nautilus
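
For reference, a minimal manual sequence to check and fix this on a cluster where moving to the hammer tunables is acceptable could look like the following. This is only a sketch of the underlying ceph CLI steps, not necessarily what the ceph-ansible change does; the tunables change triggers the data movement mentioned above:

  # inspect the currently active CRUSH tunables profile
  ceph osd crush show-tunables

  # keep a backup of the CRUSH map before changing anything
  ceph osd getcrushmap -o backup-crushmap

  # raise the tunables to the hammer profile (clears the warning, moves some data)
  ceph osd crush tunables hammer

  # optionally convert any remaining 'straw' buckets to 'straw2'
  ceph osd crush set-all-straw-buckets-to-straw2

  # confirm the warning is gone
  ceph health detail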

Comment 1 RHEL Program Management 2021-06-04 14:37:53 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 9 Veera Raghava Reddy 2021-08-26 16:43:11 UTC
Hi Alfredo, can you help with steps to recreate and verify this bug?

Comment 10 Alfredo 2021-08-26 17:49:36 UTC
Hi Veera! Unfortunately I'm not sure what the steps are to recreate this issue. In addition, at the ceph squad we have our hands (and machines) full testing ceph 5 and fixing up our CI. Sorry I can't be of more help. 

@gfidente, could you please provide the steps needed to reproduce this issue?
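
A hypothetical reproduction sketch, inferred only from the description above and not confirmed here (the inventory name "hosts" is just an example):

  # on the pre-upgrade cluster, force the legacy firefly tunables profile
  ceph osd crush tunables firefly

  # run the ceph-ansible rolling upgrade to RHCS 4
  ansible-playbook -i hosts infrastructure-playbooks/rolling_update.yml

  # after the upgrade, the warning from the description is expected:
  #   crush map has legacy tunables (require firefly, min is hammer)
  ceph health detail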

Comment 14 errata-xmlrpc 2021-09-27 18:26:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.2 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3670

Comment 15 Red Hat Bugzilla 2023-09-15 01:09:02 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

