Bug 1576674 - EC pool PGs go into the incomplete state after killing "m" OSDs.
Summary: EC pool PGs go into the incomplete state after killing "m" OSDs.
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 5.0
Assignee: Josh Durgin
QA Contact: Pawan
Docs Contact: Ranjini M N
URL:
Whiteboard:
Depends On:
Blocks: 1959686
 
Reported: 2018-05-10 06:36 UTC by Parikshith
Modified: 2021-06-29 07:42 UTC
CC: 13 users

Fixed In Version: ceph-16.0.0-8633.el8cp
Doc Type: Enhancement
Doc Text:
.The {storage-product} recovers with fewer OSDs available in an erasure coded (EC) pool
Previously, erasure coded (EC) pools of size `k+m` required at least `k+1` copies for recovery to function. If only `k` copies were available, recovery would be incomplete. With this release, the {storage-product} cluster now recovers with `k` or more copies available in an EC pool. For more information on erasure coded pools, see the link:{storage-strategies-guide}#erasure_code_pools[_Erasure coded pools_] chapter in the _{storage-product} Storage Strategies Guide_.
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments
Crush map (8.65 KB, text/plain), 2018-05-10 06:36 UTC, Parikshith


Links
GitHub ceph/ceph pull 17619 (closed): osd: allow EC PGs to do recovery below min_size, last updated 2021-02-16 04:25:02 UTC

Description Parikshith 2018-05-10 06:36:53 UTC
Created attachment 1434216
Crush map

Description of problem:
PGs in an EC pool go into the incomplete state after "m" OSDs are killed.

Version-Release number of selected component (if applicable):
ceph version 12.2.4-10.el7cp

My Setup:
3 MONs, 3 OSD hosts with 8 OSDs in total
EC pool: k=3, m=2, OSD-level failure domain

Steps done:
1. Configured a Ceph cluster.
2. Created an EC pool (5 PGs, k=3, m=2) with an OSD-level failure domain (a sketch of the likely creation commands follows the profile output below).

Profile: sudo ceph osd erasure-code-profile get myprofile --cluster slave
crush-device-class=
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=3
m=2
plugin=jerasure
technique=reed_sol_van
w=8
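
The exact creation commands are not part of this report, but a sketch of how a profile and pool like this are typically created (pool name, PG count, and profile values are taken from the description and the output above) would be:

sudo ceph osd erasure-code-profile set myprofile k=3 m=2 crush-failure-domain=osd --cluster slave
sudo ceph osd pool create ecpool 5 5 erasure myprofile --cluster slave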

Crush rule dump of this pool: sudo ceph osd crush rule dump ecpool --cluster slave
{
    "rule_id": 1,
    "rule_name": "ecpool",
    "ruleset": 1,
    "type": 3,
    "min_size": 3,
    "max_size": 5,
    "steps": [
        {
            "op": "set_chooseleaf_tries",
            "num": 5
        },
        {
            "op": "set_choose_tries",
            "num": 100
        },
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "choose_indep",
            "num": 0,
            "type": "osd"
        },
        {
            "op": "emit"
        }
    ]
}

3. Killed "m" (2) of the OSDs.
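
A minimal sketch of this step, assuming systemd-managed OSDs (the report does not say how the OSDs were killed; the OSD IDs below are placeholders):

sudo systemctl stop ceph-osd@6 ceph-osd@7
sudo ceph -s --cluster slave
sudo ceph pg dump --cluster slave | grep "^12."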

Actual results:
After killing 2 OSDs, some of the PGs of this ecpool went into the incomplete state.

sudo ceph pg dump --cluster slave | grep "^12."
dumped all
12.4          0                  0        0         0       0          0    0        0          active+undersized 2018-05-09 15:09:52.831363        0'0       502:19    [NONE,2,1,3,5]          2    [NONE,2,1,3,5]              2        0'0 2018-05-09 15:07:37.577427             0'0 2018-05-09 15:07:37.577427             0 
12.0          0                  0        0         0       0          0    0        0          active+undersized 2018-05-09 15:09:52.835938        0'0       502:30    [0,3,1,2,NONE]          0    [0,3,1,2,NONE]              0        0'0 2018-05-09 15:07:37.577427             0'0 2018-05-09 15:07:37.577427             0 
12.1          0                  0        0         0       0          0    0        0               active+clean 2018-05-09 15:07:39.630869        0'0       502:19       [3,1,5,2,0]          3       [3,1,5,2,0]              3        0'0 2018-05-09 15:07:37.577427             0'0 2018-05-09 15:07:37.577427             0 
12.2          0                  0        0         0       0          0    0        0                 incomplete 2018-05-09 15:09:57.773180        0'0       502:27 [NONE,2,0,3,NONE]          2 [NONE,2,0,3,NONE]              2        0'0 2018-05-09 15:07:37.577427             0'0 2018-05-09 15:07:37.577427             0 
12.3          0                  0        0         0       0          0    0        0                 incomplete 2018-05-09 15:09:57.771912        0'0       502:27 [NONE,3,1,NONE,5]          3 [NONE,3,1,NONE,5]              3        0'0 2018-05-09 15:07:37.577427             0'0 2018-05-09 15:07:37.577427             0 
12     0 0     0 0 0           0      0      0 


Expected results:
None of the PGs should go into the incomplete state, since with k=3 and m=2 and an OSD-level failure domain the pool should tolerate up to 2 OSDs going down.

Additional info:
By default, this ecpool was created with a min_size of 4.
sudo ceph osd pool get ecpool min_size --cluster slave
min_size: 4
I am not sure whether this is applicable to erasure-coded pools, but after manually reducing min_size to 3, the incomplete PGs cleared.
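
The workaround amounts to something like the following (the exact invocation is not shown in the report; the value 3 matches k for this profile):

sudo ceph osd pool set ecpool min_size 3 --cluster slave
sudo ceph osd pool get ecpool min_size --cluster slave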


I have attached crush map of my cluster.

Comment 10 Giridhar Ramaraju 2019-08-05 13:11:09 UTC
Updating the QA Contact to Hemant. Hemant will reroute it to the appropriate QE Associate.

Regards,
Giri


Comment 13 Josh Durgin 2020-04-22 18:32:13 UTC
This is in all 5.0 builds - needs qa ack
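
A rough verification sketch on a 5.0 build, assuming the osd_allow_recovery_below_min_size option added by the linked upstream pull request (the pool and profile names below are placeholders):

ceph daemon osd.0 config get osd_allow_recovery_below_min_size   # run on the OSD host
ceph osd erasure-code-profile set verifyprofile k=3 m=2 crush-failure-domain=osd
ceph osd pool create verifypool 32 32 erasure verifyprofile
rados -p verifypool bench 30 write --no-cleanup
systemctl stop ceph-osd@0 ceph-osd@1
ceph pg ls-by-pool verifypool   # PGs should show active+undersized and recover, not incomplete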

