.The {storage-product} recovers with fewer OSDs available in an erasure coded (EC) pool
Previously, erasure coded (EC) pools of size `k+m` required at least `k+1` copies for recovery to function. If only `k` copies were available, recovery would be incomplete.
With this release, the {storage-product} cluster recovers with `k` or more copies available in an EC pool.
For more information on erasure coded pools, see the link:{storage-strategies-guide}#erasure_code_pools[_Erasure coded pools_] chapter in the _{storage-product} Storage Strategies Guide_.
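The arithmetic behind this fix can be sketched with a toy model (hypothetical helper names, not Ceph code): in a `k+m` erasure-coded pool, any `k` of the `k+m` shards are mathematically sufficient to reconstruct an object, so requiring `k+1` surviving copies for recovery was stricter than the coding scheme demands.

```python
# Toy model of EC recoverability (hypothetical helper, not Ceph code).
# In a k+m erasure-coded pool, any k surviving shards can reconstruct
# the object; up to m shards may be lost.

def can_recover(surviving_shards: int, k: int) -> bool:
    """An object is reconstructible iff at least k shards survive."""
    return surviving_shards >= k

k, m = 3, 2        # the profile from the bug report below
total = k + m      # 5 shards per object

# Losing m = 2 OSDs leaves exactly k = 3 shards: still reconstructible.
assert can_recover(total - m, k)
# But only k copies remain, not the k+1 the old behavior required,
# so recovery previously stalled in exactly this situation.
assert (total - m) < k + 1
```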
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:3294
Created attachment 1434216 [details]
Crush map

Description of problem:
EC pool PGs are getting into an incomplete state after killing "m" OSDs.

Version-Release number of selected component (if applicable):
ceph version 12.2.4-10.el7cp

My setup:
3 MONs, 3 OSD hosts with 8 OSDs in total
EC pool: k=3, m=2, OSD-level failure domain

Steps done:
1. Configured a Ceph cluster.
2. Created an EC pool (5 PGs, k=3, m=2) configured with an OSD-level failure domain.

Profile:

    sudo ceph osd erasure-code-profile get myprofile --cluster slave
    crush-device-class=
    crush-failure-domain=osd
    crush-root=default
    jerasure-per-chunk-alignment=false
    k=3
    m=2
    plugin=jerasure
    technique=reed_sol_van
    w=8

CRUSH rule dump of this pool:

    sudo ceph osd crush rule dump ecpool --cluster slave
    {
        "rule_id": 1,
        "rule_name": "ecpool",
        "ruleset": 1,
        "type": 3,
        "min_size": 3,
        "max_size": 5,
        "steps": [
            {
                "op": "set_chooseleaf_tries",
                "num": 5
            },
            {
                "op": "set_choose_tries",
                "num": 100
            },
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "choose_indep",
                "num": 0,
                "type": "osd"
            },
            {
                "op": "emit"
            }
        ]
    }

3. Killed m (2) OSDs.

Actual results:
After killing 2 OSDs, some of the PGs of this EC pool went into an incomplete state.

    sudo ceph pg dump --cluster slave | grep "^12."
    dumped all
    12.4 0 0 0 0 0 0 0 0 active+undersized 2018-05-09 15:09:52.831363 0'0 502:19 [NONE,2,1,3,5] 2 [NONE,2,1,3,5] 2 0'0 2018-05-09 15:07:37.577427 0'0 2018-05-09 15:07:37.577427 0
    12.0 0 0 0 0 0 0 0 0 active+undersized 2018-05-09 15:09:52.835938 0'0 502:30 [0,3,1,2,NONE] 0 [0,3,1,2,NONE] 0 0'0 2018-05-09 15:07:37.577427 0'0 2018-05-09 15:07:37.577427 0
    12.1 0 0 0 0 0 0 0 0 active+clean 2018-05-09 15:07:39.630869 0'0 502:19 [3,1,5,2,0] 3 [3,1,5,2,0] 3 0'0 2018-05-09 15:07:37.577427 0'0 2018-05-09 15:07:37.577427 0
    12.2 0 0 0 0 0 0 0 0 incomplete 2018-05-09 15:09:57.773180 0'0 502:27 [NONE,2,0,3,NONE] 2 [NONE,2,0,3,NONE] 2 0'0 2018-05-09 15:07:37.577427 0'0 2018-05-09 15:07:37.577427 0
    12.3 0 0 0 0 0 0 0 0 incomplete 2018-05-09 15:09:57.771912 0'0 502:27 [NONE,3,1,NONE,5] 3 [NONE,3,1,NONE,5] 3 0'0 2018-05-09 15:07:37.577427 0'0 2018-05-09 15:07:37.577427 0
    12 0 0 0 0 0 0 0 0

Expected results:
None of the PGs should go into an incomplete state, since k=3 and m=2 and at most 2 OSDs can go down with an OSD-level failure domain.

Additional info:
By default, this EC pool was created with a min_size of '4'.

    sudo ceph osd pool get ecpool min_size --cluster slave
    min_size: 4

I am not sure whether this is applicable to erasure coded pools, but by manually reducing min_size to '3', the incomplete PGs were cleared.

I have attached the crush map of my cluster.
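The min_size observation above can be sketched numerically (a toy model with hypothetical names, not Ceph's actual peering logic): a PG whose acting set contains `NONE` placeholders goes incomplete when the number of live shards falls below min_size, which is why lowering min_size from the default k+1 = 4 to k = 3 cleared the incomplete PGs.

```python
# Toy model of the min_size effect seen in the pg dump above
# (hypothetical helper, not Ceph's peering code). A PG goes
# incomplete when fewer than min_size shards in its acting set
# are alive.

def pg_state(acting, min_size, pool_size):
    """Return a rough PG state from the acting set."""
    live = sum(1 for shard in acting if shard is not None)
    if live < min_size:
        return "incomplete"
    if live < pool_size:
        return "active+undersized"
    return "active+clean"

NONE = None
# Acting sets taken from the pg dump above (k=3, m=2, size=5, min_size=4).
assert pg_state([NONE, 2, 0, 3, NONE], min_size=4, pool_size=5) == "incomplete"
assert pg_state([NONE, 2, 1, 3, 5],    min_size=4, pool_size=5) == "active+undersized"
assert pg_state([3, 1, 5, 2, 0],       min_size=4, pool_size=5) == "active+clean"
# Lowering min_size to k = 3 clears the incomplete PGs, as observed.
assert pg_state([NONE, 2, 0, 3, NONE], min_size=3, pool_size=5) == "active+undersized"
```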