Description of problem:

Slow/blocked requests for a specific pool "rbd" which has approx. 66 million objects:

    NAME     ID     USED     %USED     MAX AVAIL     OBJECTS
    rbd      57     252T      7.90          345T    66220069

- Customer is using *filestore merge threshold = 40* and *filestore split multiple = 8* from this bz: https://bugzilla.redhat.com/show_bug.cgi?id=1219974

- They have 846 OSDs in total, but the "rbd" pool which is facing this issue uses ruleset 1 and has 576 OSDs, all of them 4 TB.

- In a single OSD node they have 32 OSDs, of which 24 are the 4 TB OSDs belonging to this *rbd* pool that is facing the slow requests.

- OSD node configuration:
  - 125 GB RAM
  - 24-core CPU
  - Networking mode on OSD nodes: bond, IEEE 802.3ad dynamic link aggregation, 2 x 10 Gbps NICs = 20 Gbps

- We dumped the in-flight ops on two OSDs which were facing slow requests.

osd.397 dump_ops_in_flight:

{
    "ops": [
        {
            "description": "osd_op(client.1862229.0:3035 benchmark_data_rcprsdc1r70-01-ac_2891081_object812 [delete] 57.37f66ef6 ack+ondisk+write+known_if_redirected e213940)",
            "initiated_at": "2016-05-03 16:45:27.170510",
            "age": 24121.264136,
            "duration": 0.000000,
            "type_data": [
                "no flag points reached",
                {
                    "client": "client.1862229",
                    "tid": 3035
                },
                [
                    {
                        "time": "2016-05-03 16:45:27.170510",
                        "event": "initiated"
                    }
                ]
            ]
        }
    ],
    "num_ops": 1
}

- Both were stuck at "no flag points reached".

Version-Release number of selected component (if applicable):
Upstream Hammer: ceph-0.94.3-0.el7.x86_64
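For reference, the dump above was collected through the OSD admin socket; a minimal sketch of the collection, assuming the default admin socket path under /var/run/ceph:

    # Dump in-flight (slow/blocked) ops for osd.397, run on its OSD node:
    ceph daemon osd.397 dump_ops_in_flight

    # Equivalent form, pointing at the admin socket directly:
    ceph --admin-daemon /var/run/ceph/ceph-osd.397.asok dump_ops_in_flight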
After some discussion, I have a theory - they may have just hit the split threshold on many osds at once, resulting in high latency as they were all splitting directories at once (an expensive operation). Increasing the threshold may have stopped the splitting temporarily, but they will run into the same issue once they reach the larger threshold of 9600 files/dir. Continuing to increase the threshold increases the cost of background processes like backfill, scrub, and pg splitting, though we don't have good data on how high the threshold can be before causing issues there. Can we get tree output from say 100 random pgs in the rbd pool to verify that they were near the former split threshold of 5120 files/dir?
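For reference, the split point filestore uses follows from those two settings via the standard filestore formula; the osd.397 path, pool id 57, and the default filestore directory layout below are assumptions for this sketch, not confirmed details of the customer environment:

    # Filestore splits a directory once it holds more than
    #   16 * filestore_split_multiple * abs(filestore_merge_threshold) files.
    # With the former settings (split multiple = 8, merge threshold = 40):
    echo $((16 * 8 * 40))    # -> 5120 files/dir

    # Rough sketch of the requested check: for 100 random pool-57 PGs on
    # one OSD, report the fullest leaf directory and its file count.
    for pg in $(ls -d /var/lib/ceph/osd/ceph-397/current/57.*_head | shuf -n 100); do
        find "$pg" -type d | while read -r d; do
            printf '%6d %s\n' "$(find "$d" -maxdepth 1 -type f | wc -l)" "$d"
        done | sort -rn | head -1
    done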
Added http://tracker.ceph.com/issues/15835 upstream as a possible way to mitigate this, if the theory is correct.
(In reply to Josh Durgin from comment #55)
> After some discussion, I have a theory - they may have just hit the split
> threshold on many osds at once, resulting in high latency as they were all
> splitting directories at once (an expensive operation). Increasing the
> threshold may have stopped the splitting temporarily, but they will run into
> the same issue once they reach the larger threshold of 9600 files/dir.
>
> Continuing to increase the threshold increases the cost of background
> processes like backfill, scrub, and pg splitting, though we don't have good
> data on how high the threshold can be before causing issues there.
>
> Can we get tree output from say 100 random pgs in the rbd pool
> to verify that they were near the former split threshold of 5120 files/dir?

The customer captured this tree output almost a week after implementing the new filestore settings. That should not matter, though, since we only want "to verify that they were near the former split threshold of 5120 files/dir", and with the current output it looks like all of them were near that threshold, which largely confirms our theory. Am I right?

Thanks,
Vikhyat
(In reply to Vikhyat Umrao from comment #59)
> (In reply to Josh Durgin from comment #55)
> > After some discussion, I have a theory - they may have just hit the split
> > threshold on many osds at once, resulting in high latency as they were all
> > splitting directories at once (an expensive operation). Increasing the
> > threshold may have stopped the splitting temporarily, but they will run into
> > the same issue once they reach the larger threshold of 9600 files/dir.
> >
> > Continuing to increase the threshold increases the cost of background
> > processes like backfill, scrub, and pg splitting, though we don't have good
> > data on how high the threshold can be before causing issues there.
> >
> > Can we get tree output from say 100 random pgs in the rbd pool
> > to verify that they were near the former split threshold of 5120 files/dir?
>
> The customer captured this tree output almost a week after implementing the
> new filestore settings. That should not matter, though, since we only want
> "to verify that they were near the former split threshold of 5120 files/dir",
> and with the current output it looks like all of them were near that
> threshold, which largely confirms our theory. Am I right?

Yes, it looks like there's very little variation in the number of files per PG, so they were very likely all just crossing the 5120 threshold when the slow requests started.
*** This bug has been marked as a duplicate of bug 1219974 ***