Bug 1967964

Summary: Upgrade to RHCS4 can leave the cluster in HEALTH_WARN when firefly tunables are in use
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Giulio Fidente <gfidente>
Component: Ceph-Ansible Assignee: Guillaume Abrioux <gabrioux>
Status: CLOSED ERRATA QA Contact: Ameena Suhani S H <amsyedha>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.2 CC: alfrgarc, aschoen, ceph-eng-bugs, fpantano, gabrioux, gmeno, nthomas, tserlin, vashastr, vereddy, ykaul
Target Milestone: ---   
Target Release: 4.2z3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-ansible-4.0.61-1.el8cp, ceph-ansible-4.0.61-1.el7cp Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-09-27 18:26:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1760354    

Description Giulio Fidente 2021-06-04 14:37:44 UTC
It seems that after the upgrade to RHCS4 the cluster can remain in HEALTH_WARN state with:

  crush map has legacy tunables (require firefly, min is hammer)

This is because in Nautilus the default value of mon_crush_min_required_version was changed from firefly to hammer, which means the cluster will issue a health warning if the CRUSH tunables are older than hammer. The upstream guide [1] says:

"""
If your CRUSH tunables are older than Hammer, Ceph will now issue a health warning. If you see a health alert to that effect, you can revert this change with:

ceph config set mon mon_crush_min_required_version firefly

If Ceph does not complain, however, then we recommend you also switch any existing CRUSH buckets to straw2, which was added back in the Hammer release. If you have any ‘straw’ buckets, this will result in a modest amount of data movement, but generally nothing too severe:

ceph osd getcrushmap -o backup-crushmap
ceph osd crush set-all-straw-buckets-to-straw2

If there are problems, you can easily revert with:

ceph osd setcrushmap -i backup-crushmap

Moving to ‘straw2’ buckets will unlock a few recent features, like the crush-compat balancer mode added back in Luminous.
"""

There is generally a small (but non-zero) amount of data that will move around by making the switch to hammer tunables; for more information, see the CRUSH Tunables documentation.
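
For reference, a minimal command sequence to clear the warning by moving to hammer tunables (standard Ceph CLI; output and the amount of data movement will vary per cluster) could look like this:

# show the currently active CRUSH tunables profile
ceph osd crush show-tunables

# switch the CRUSH tunables to the hammer profile (expect some data movement)
ceph osd crush tunables hammer

# confirm the legacy tunables warning has cleared
ceph health detail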

1. https://docs.ceph.com/en/latest/releases/nautilus/#v14-2-1-nautilus

Comment 1 RHEL Program Management 2021-06-04 14:37:53 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 9 Veera Raghava Reddy 2021-08-26 16:43:11 UTC
Hi Alfredo, Can you help with recreation steps, verification of this Bug?

Comment 10 Alfredo 2021-08-26 17:49:36 UTC
Hi Veera! Unfortunately I'm not sure what the steps are to recreate this issue. In addition, at the ceph squad we have our hands (and machines) full testing ceph 5 and fixing up our CI. Sorry I can't be of more help. 

@gfidente, could you please provide the steps needed to reproduce this issue?

Comment 14 errata-xmlrpc 2021-09-27 18:26:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.2 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3670

Comment 15 Red Hat Bugzilla 2023-09-15 01:09:02 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.