Bug 1927128
Summary: [Tracker for BZ #1937088] When add capacity is performed on an arbiter-mode cluster, ceph health reports PG_AVAILABILITY Reduced data availability: 25 pgs inactive, 25 pgs incomplete

Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Component: ceph
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: unspecified
Status: CLOSED ERRATA
Reporter: Pratik Surve <prsurve>
Assignee: Greg Farnum <gfarnum>
QA Contact: Pratik Surve <prsurve>
CC: bniver, ebenahar, gfarnum, madam, muagarwa, nberry, nojha, ocs-bugs, owasserm
Keywords: AutomationBackLog
Target Milestone: ---
Target Release: OCS 4.7.0
Doc Type: No Doc Update
Clones: 1937088 (view as bug list)
Bug Depends On: 1937088
Type: Bug
Last Closed: 2021-05-19 09:19:24 UTC
Description (Pratik Surve, 2021-02-10 07:02:58 UTC):

Since this concerns basic functionality (cluster expansion while the cluster is deployed in arbiter mode), it is flagged as a blocker.

I see the OSDs are distributed evenly in the CRUSH tree:

```
ID  CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-1         1.56238  root default
-4         0.78119      zone a
-3         0.39059          host compute-0
 0    hdd  0.19530              osd.0    up      1.00000   1.00000
 6    hdd  0.19530              osd.6    up      1.00000   1.00000
-13        0.39059          host compute-3
 3    hdd  0.19530              osd.3    up      1.00000   1.00000
 7    hdd  0.19530              osd.7    up      1.00000   1.00000
-8         0.78119      zone b
-11        0.39059          host compute-1
 2    hdd  0.19530              osd.2    up      1.00000   1.00000
 5    hdd  0.19530              osd.5    up      1.00000   1.00000
-7         0.39059          host compute-2
 1    hdd  0.19530              osd.1    up      1.00000   1.00000
 4    hdd  0.19530              osd.4    up      1.00000   1.00000
```

For other ceph info, see the related output from the cluster: http://rhsqe-repo.lab.eng.blr.redhat.com/ocs4qe/pratik/bz/bz_1927128/feb_10/must-gather.local.5824928173416610658/quay-io-rhceph-dev-ocs-must-gather-sha256-8099d74217f9305c717cb1a157a6a89f5e810834edd9dfd80b89484263e6cc62/ceph/must_gather_commands/

@Neha: what could be the cause of the "remapped+incomplete" PG status?

Greg, can you take a look? Thanks!

I'm trying to reproduce this, as I'm going to need some real debug logs to understand what's happening here. I assume from what I see at https://github.com/red-hat-storage/ocs-ci/blob/master/ocs_ci/ocs/resources/storage_cluster.py#L430 that this add-capacity test adds a new OSD drive to each of the four hosts? (Also, if it's easy to just run it with "debug osd = 20", that would help!)
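The "debug osd = 20" request above corresponds to a ceph.conf fragment like the following sketch. (On an OCS/Rook-managed cluster, the equivalent would typically be applied via the `rook-config-override` ConfigMap, or at runtime with `ceph config set osd debug_osd 20`, rather than by editing the file directly.)

```ini
# Sketch: raise OSD logging to level 20 so peering state
# transitions are recorded in the OSD logs
[osd]
debug osd = 20
```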
Looking at the history of pg 1.1c displayed in OSD 2's logs, I'm seeing it assigned to:

- up [1,2,3,0]; acting [1,2,3,0] -- i.e., all the original OSDs.
- up [1,5,3,6]; acting [1,5,3,6] -- so, we've added new OSDs and consequently remapped; looks fine.
- up [1,5,7,6]; acting [1,3] -- so, more new OSDs; as there's not enough overlap from the old to the new calculated up set, we're going active with only the two nodes that already have the data (the others will be getting backfilled).
- up [1,5,7,6]; acting [1,5] -- and this is the confusing one! OSDs 1 and 5 are in the same zone, so the PG is not allowed to go active with these while the other zone survives. They are conspicuously the first two members of the up set, but I'm just not seeing in the source how this happens.

Is this reproducible? Otherwise we can remove the blocker flag.

I've finally managed to reproduce this. It's definitely a blocker for the stretch cluster feature -- we can't ship with this kind of peering issue; it breaks data availability, which is rather the opposite of the feature's goal. I've identified the issue, and the fix isn't too complicated, so I'm working on that and doing an audit for this category of error elsewhere.

Greg, do we have some update on the fix?

Yeah, I have a branch which fixed the issue I identified, but that has revealed a few knock-on issues I'm dealing with. I expect to have an upstream PR today (Thursday) during Pacific time business hours.

Branch approved upstream (https://github.com/ceph/ceph/pull/40049) and passed a rados suite run; I will do the backport and push into the RHCS 4.2 branch once I've slept and the workweek starts!

Merged into our ceph-4.2-rhel-patches branch now.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.
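The zone rule behind the "confusing" acting set [1,5] can be illustrated with a small sketch. This is not Ceph code -- the `may_go_active` function and `OSD_ZONE` map are invented for illustration, with the map mirroring the CRUSH tree in this bug: in stretch mode, a PG whose acting set is confined to one zone must not go active while the other zone still has live OSDs.

```python
# Simplified illustration of the stretch-mode peering rule described
# above; real Ceph peering also involves min_size and more state.
# Zone "a" holds compute-0/compute-3 (osds 0, 6, 3, 7); zone "b"
# holds compute-1/compute-2 (osds 2, 5, 1, 4), per the CRUSH tree.
OSD_ZONE = {0: "a", 6: "a", 3: "a", 7: "a",
            2: "b", 5: "b", 1: "b", 4: "b"}

def may_go_active(acting, live_zones, osd_zone=OSD_ZONE):
    """A PG may go active only if its acting set covers every zone
    that still has live OSDs."""
    acting_zones = {osd_zone[o] for o in acting}
    return live_zones <= acting_zones

# acting [1,3] spans both zones: allowed to go active while the
# remaining members are backfilled.
assert may_go_active([1, 3], {"a", "b"})

# acting [1,5] is entirely in zone "b": must NOT go active while
# zone "a" survives -- yet the buggy peering code chose it, hence
# the incomplete PGs reported in this bug.
assert not may_go_active([1, 5], {"a", "b"})
```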
If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041