Bug 1576674 - EC pool PGs go into the incomplete state after killing "m" OSDs.
Summary: EC pool PGs go into the incomplete state after killing "m" OSDs.
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 5.0
Assignee: Josh Durgin
QA Contact: Pawan
Docs Contact: Ranjini M N
URL:
Whiteboard:
Depends On:
Blocks: 1959686
 
Reported: 2018-05-10 06:36 UTC by Parikshith
Modified: 2021-06-29 07:42 UTC
CC: 13 users

Fixed In Version: ceph-16.0.0-8633.el8cp
Doc Type: Enhancement
Doc Text:
.The {storage-product} recovers with fewer OSDs available in an erasure coded (EC) pool
Previously, erasure coded (EC) pools of size `k+m` required at least `k+1` copies for recovery to function. If only `k` copies were available, recovery would be incomplete. With this release, the {storage-product} cluster now recovers with `k` or more copies available in an EC pool. For more information on erasure coded pools, see the link:{storage-strategies-guide}#erasure_code_pools[_Erasure coded pools_] chapter in the _{storage-product} Storage Strategies Guide_.
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments
Crush map (8.65 KB, text/plain), 2018-05-10 06:36 UTC, Parikshith


Links
GitHub ceph/ceph pull 17619 (closed): osd: allow EC PGs to do recovery below min_size, last updated 2021-02-16 04:25:02 UTC

Description Parikshith 2018-05-10 06:36:53 UTC
Created attachment 1434216
Crush map

Description of problem:
PGs in an EC pool go into the incomplete state after "m" OSDs are killed.

Version-Release number of selected component (if applicable):
ceph version 12.2.4-10.el7cp

My Setup:
3 MONs, 3 OSD hosts with 8 OSDs in total
EC pool: k=3, m=2, OSD-level failure domain

Steps done:
1. Configured a Ceph cluster.
2. Created an EC pool (5 PGs, k=3, m=2) with an OSD-level failure domain (a sketch of the likely creation commands follows the profile output below).

Profile: sudo ceph osd erasure-code-profile get myprofile --cluster slave
crush-device-class=
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=3
m=2
plugin=jerasure
technique=reed_sol_van
w=8
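
The exact creation commands are not part of this report, but a sketch of how a profile and pool like this are typically created (pool name, PG count, and profile values are taken from the description and the output above) would be:

sudo ceph osd erasure-code-profile set myprofile k=3 m=2 crush-failure-domain=osd --cluster slave
sudo ceph osd pool create ecpool 5 5 erasure myprofile --cluster slave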

Crush rule dump of this pool: sudo ceph osd crush rule dump ecpool --cluster slave
{
    "rule_id": 1,
    "rule_name": "ecpool",
    "ruleset": 1,
    "type": 3,
    "min_size": 3,
    "max_size": 5,
    "steps": [
        {
            "op": "set_chooseleaf_tries",
            "num": 5
        },
        {
            "op": "set_choose_tries",
            "num": 100
        },
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "choose_indep",
            "num": 0,
            "type": "osd"
        },
        {
            "op": "emit"
        }
    ]
}

3. Killed "m" (2) of the OSDs.
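
A minimal sketch of this step, assuming systemd-managed OSDs (the report does not say how the OSDs were killed; the OSD IDs below are placeholders):

sudo systemctl stop ceph-osd@6 ceph-osd@7
sudo ceph -s --cluster slave
sudo ceph pg dump --cluster slave | grep "^12."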

Actual results:
After killing 2 OSDs, some of the PGs of this ecpool went into the incomplete state.

sudo ceph pg dump --cluster slave | grep "^12."
dumped all
12.4          0                  0        0         0       0          0    0        0          active+undersized 2018-05-09 15:09:52.831363        0'0       502:19    [NONE,2,1,3,5]          2    [NONE,2,1,3,5]              2        0'0 2018-05-09 15:07:37.577427             0'0 2018-05-09 15:07:37.577427             0 
12.0          0                  0        0         0       0          0    0        0          active+undersized 2018-05-09 15:09:52.835938        0'0       502:30    [0,3,1,2,NONE]          0    [0,3,1,2,NONE]              0        0'0 2018-05-09 15:07:37.577427             0'0 2018-05-09 15:07:37.577427             0 
12.1          0                  0        0         0       0          0    0        0               active+clean 2018-05-09 15:07:39.630869        0'0       502:19       [3,1,5,2,0]          3       [3,1,5,2,0]              3        0'0 2018-05-09 15:07:37.577427             0'0 2018-05-09 15:07:37.577427             0 
12.2          0                  0        0         0       0          0    0        0                 incomplete 2018-05-09 15:09:57.773180        0'0       502:27 [NONE,2,0,3,NONE]          2 [NONE,2,0,3,NONE]              2        0'0 2018-05-09 15:07:37.577427             0'0 2018-05-09 15:07:37.577427             0 
12.3          0                  0        0         0       0          0    0        0                 incomplete 2018-05-09 15:09:57.771912        0'0       502:27 [NONE,3,1,NONE,5]          3 [NONE,3,1,NONE,5]              3        0'0 2018-05-09 15:07:37.577427             0'0 2018-05-09 15:07:37.577427             0 
12     0 0     0 0 0           0      0      0 


Expected results:
None of the PGs should go into the incomplete state, since with k=3 and m=2 and an OSD-level failure domain the pool should tolerate up to 2 OSDs going down.

Additional info:
By default, this ecpool was created with a min_size of 4.
sudo ceph osd pool get ecpool min_size --cluster slave
min_size: 4
I am not sure whether this is applicable to erasure-coded pools, but after manually reducing min_size to 3, the incomplete PGs cleared.
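
The workaround amounts to something like the following (the exact invocation is not shown in the report; the value 3 matches k for this profile):

sudo ceph osd pool set ecpool min_size 3 --cluster slave
sudo ceph osd pool get ecpool min_size --cluster slave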


I have attached crush map of my cluster.

Comment 10 Giridhar Ramaraju 2019-08-05 13:11:09 UTC
Updating the QA Contact to Hemant. Hemant will reroute it to the appropriate QE Associate.

Regards,
Giri


Comment 13 Josh Durgin 2020-04-22 18:32:13 UTC
This is in all 5.0 builds - needs qa ack
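
A rough verification sketch on a 5.0 build, assuming the osd_allow_recovery_below_min_size option added by the linked upstream pull request (the pool and profile names below are placeholders):

ceph daemon osd.0 config get osd_allow_recovery_below_min_size   # run on the OSD host
ceph osd erasure-code-profile set verifyprofile k=3 m=2 crush-failure-domain=osd
ceph osd pool create verifypool 32 32 erasure verifyprofile
rados -p verifypool bench 30 write --no-cleanup
systemctl stop ceph-osd@0 ceph-osd@1
ceph pg ls-by-pool verifypool   # PGs should show active+undersized and recover, not incomplete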

