Bug 1729580

Summary: Pool stats issue with upgrades to nautilus
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vikhyat Umrao <vumrao>
Component: Documentation
Assignee: John Brier <jbrier>
Status: CLOSED CURRENTRELEASE
QA Contact: Tejas <tchandra>
Severity: high
Docs Contact:
Priority: medium
Version: 4.0
CC: asriram, assingh, ceph-eng-bugs, dzafman, gsitlani, jbrier, kchai, kdreyer, mburrows, mmuench, nojha, pnataraj, raj2428, twilkins, ukurundw, vereddy
Target Milestone: z2   
Target Release: 4.1   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-11-17 22:17:11 UTC
Type: Bug
Bug Blocks: 1750994, 1859104    

Description Vikhyat Umrao 2019-07-12 17:37:47 UTC
Description of problem:
Backport https://github.com/ceph/ceph/pull/28978 if RHCS 4.0 will not have 14.2.3


[ceph-users] Pool stats issue with upgrades to nautilus
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/035889.html

Comment 1 Vikhyat Umrao 2019-07-19 20:22:09 UTC
This is fixed in 14.2.2. 

https://github.com/ceph/ceph/pull/29032

But we still want to test and document it in release notes. Upstream release notes:

https://github.com/ceph/ceph/pull/29011/files#diff-662a7de9ea1019d168ac10d9a198c4c9R25

* Earlier Nautilus releases (14.2.1 and 14.2.0) have an issue where
  deploying a single new (Nautilus) BlueStore OSD on an upgraded
  cluster (i.e. one that was originally deployed pre-Nautilus) breaks
  the pool utilization stats reported by ``ceph df``.  Until all OSDs
  have been reprovisioned or updated (via ``ceph-bluestore-tool
  repair``), the pool stats will show values that are lower than the
  true value.  This is resolved in 14.2.2, such that the cluster only
  switches to using the more accurate per-pool stats after *all* OSDs
  are 14.2.2 (or later), are BlueStore, and (if they were created
  prior to Nautilus) have been updated via the ``repair`` function.
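
The switch-over condition in the release note above can be sketched as a small predicate. This is only an illustration of the stated rule, not Ceph code: the `Osd` record and its field names are hypothetical, and the version is modeled as a plain tuple.

```python
from dataclasses import dataclass


@dataclass
class Osd:
    # Hypothetical record for illustration; these fields are not a Ceph API.
    version: tuple              # e.g. (14, 2, 2)
    is_bluestore: bool
    created_pre_nautilus: bool
    repaired: bool = False      # True once ceph-bluestore-tool repair has run


def uses_per_pool_stats(osds):
    """Return True only when every OSD meets the 14.2.2 conditions."""
    return all(
        o.version >= (14, 2, 2)
        and o.is_bluestore
        and (not o.created_pre_nautilus or o.repaired)
        for o in osds
    )
```

Under this reading, a single pre-Nautilus OSD that has not been repaired is enough to keep the whole cluster on the legacy stats.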


There is some upstream discussion about what happens once clusters are upgraded from Luminous/Mimic BlueStore to Nautilus (for us, RHCS 3.x to 4.x). They will report something like this:

            Legacy BlueStore stats reporting detected on N OSD(s)

We want to document this also.

upstream ceph-users discussion:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/036010.html
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/036002.html

Comment 2 Vikhyat Umrao 2019-07-30 00:46:18 UTC
Doc Team - I am changing this one to documentation as we need to document this in upgrade notes and release notes.

Comment 6 Preethi 2019-12-20 10:21:40 UTC
I was able to reproduce this issue when upgrading from RHCS 3.3 (Luminous, BlueStore) to RHCS 4.0 (Nautilus, BlueStore).

Error details below:

    id:     bb998057-5b58-417c-9c49-a53fcf949dd5
    health: HEALTH_WARN
            Legacy BlueStore stats reporting detected on 12 OSD(s)
            application not enabled on 1 pool(s)
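
For tracking how many OSDs are still flagged, the count can be pulled out of the status text shown above. This is a minimal sketch that assumes the warning wording stays exactly as quoted in this bug; the function name is made up for illustration.

```python
import re

# Pattern matches the warning text exactly as quoted in this bug.
LEGACY_STATFS = re.compile(
    r"Legacy BlueStore stats reporting detected on (\d+) OSD\(s\)"
)


def legacy_stats_osd_count(status_text):
    """Return the number of OSDs named in the warning, or 0 if it is absent."""
    match = LEGACY_STATFS.search(status_text)
    return int(match.group(1)) if match else 0


sample = """\
    health: HEALTH_WARN
            Legacy BlueStore stats reporting detected on 12 OSD(s)
            application not enabled on 1 pool(s)
"""
print(legacy_stats_osd_count(sample))  # prints 12
```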

Comment 15 Gnanaraj Thomas 2020-03-05 08:38:48 UTC
We have upgraded the cluster to "ceph version 14.2.4 nautilus (stable)".

The upgrade was from RHCS 3.3 (Luminous, BlueStore) to RHCS 4.0 (Nautilus, BlueStore).

We still see the same issue.

health: HEALTH_WARN
        Legacy BlueStore stats reporting detected on 16 OSD(s)

Comment 17 Veera Raghava Reddy 2020-09-10 18:58:21 UTC
Hi John Brier,

*******
When upgrading from Red Hat Ceph Storage 3.x to Red Hat Ceph Storage 4, until all OSDs are upgraded you may see the following HEALTH_WARN message: Legacy BlueStore stats reporting detected on N OSD(s). The health will return to normal after all OSDs are upgraded.
*******

From comments 3 & 4, it looks like the message should say "after all the OSDs are recovered/repaired" rather than "upgraded", and the steps to repair are missing.

Comment 20 Veera Raghava Reddy 2020-09-17 20:58:53 UTC
Hi John,
Here, "upgraded" and "updated via repair" are two different actions.

The health warning appears after the upgrade. The fix is to repair the OSDs.
The document needs a note that explains the reason for the health warning and provides information on how to repair.
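
As a starting point for the repair note, the per-OSD sequence might look like the sketch below. It only builds the command lines and does not run them. It assumes a non-containerized OSD managed by a `ceph-osd@<id>` systemd unit with its data under `/var/lib/ceph/osd/ceph-<id>`; containerized RHCS 4 deployments would need different steps, and the exact procedure should be confirmed by engineering before it goes into the docs.

```python
def repair_commands(osd_id):
    """Build (but do not run) the stop/repair/start sequence for one OSD."""
    # Assumed default data path for a non-containerized OSD.
    osd_path = f"/var/lib/ceph/osd/ceph-{osd_id}"
    return [
        ["systemctl", "stop", f"ceph-osd@{osd_id}"],
        ["ceph-bluestore-tool", "repair", "--path", osd_path],
        ["systemctl", "start", f"ceph-osd@{osd_id}"],
    ]


for cmd in repair_commands(3):
    print(" ".join(cmd))
```

The OSD has to be stopped before the repair runs, so OSDs would normally be handled one at a time to keep the cluster available.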