Bug 1332874 - Slow/blocked requests for a specific pool "rbd" which has approx 66 million objects
Keywords:
Status: CLOSED DUPLICATE of bug 1219974
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 1.3.2
Hardware: x86_64
OS: All
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 1.3.4
Assignee: Josh Durgin
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-05-04 09:12 UTC by Vikhyat Umrao
Modified: 2020-06-11 12:51 UTC
CC: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-26 12:49:47 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1337018 0 urgent CLOSED [RFE] filestore: randomize split threshold 2021-03-11 14:34:24 UTC

Internal Links: 1337018

Description Vikhyat Umrao 2016-05-04 09:12:00 UTC
Description of problem:
Slow/blocked requests for a specific pool "rbd" which has approx 66 million objects 

NAME     ID     USED     %USED     MAX AVAIL     OBJECTS
rbd      57     252T      7.90          345T    66220069

- The customer is using *filestore merge threshold = 40* and *filestore split multiple = 8*, from this bz: https://bugzilla.redhat.com/show_bug.cgi?id=1219974 (see the config sketch at the end of this description).

- They have a total of 846 OSDs, but the "rbd" pool that is facing this issue uses ruleset 1, which has 576 OSDs, all of them 4 TB.

- On a single OSD node they have 32 OSDs, of which 24 are 4 TB OSDs belonging to this *rbd* pool that is facing the slow requests.

- OSD node configuration:
  - 125 GB RAM
  - 24-core CPU
  - Networking: bonded NICs, IEEE 802.3ad dynamic link aggregation, 2 x 10 Gbps = 20 Gbps

- We asked for dump_ops_in_flight output from two OSDs that were facing slow requests (a command sketch follows the output below):

osd.397.dump_ops_in_flight :

{
    "ops": [
        {
            "description": "osd_op(client.1862229.0:3035 benchmark_data_rcprsdc1r70-01-ac_2891081_object812 [delete] 57.37f66ef6 ack+ondisk+write+known_if_redirected e213940)",
            "initiated_at": "2016-05-03 16:45:27.170510",
            "age": 24121.264136,
            "duration": 0.000000,
            "type_data": [
                "no flag points reached",
                {
                    "client": "client.1862229",
                    "tid": 3035
                },
                [
                    {
                        "time": "2016-05-03 16:45:27.170510",
                        "event": "initiated"
                    }
                ]
            ]
        }
    ],
    "num_ops": 1
}

- Both were stuck at "no flag points reached".
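
For reference, this kind of output is typically collected on the OSD node via the admin socket; a minimal sketch, assuming the default socket path and using osd.397 as the example:

  # query the OSD admin socket for in-flight (slow) ops
  ceph daemon osd.397 dump_ops_in_flight
  # equivalent form, addressing the socket file directly
  ceph --admin-daemon /var/run/ceph/ceph-osd.397.asok dump_ops_in_flight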


Version-Release number of selected component (if applicable):
Upstream Hammer : ceph-0.94.3-0.el7.x86_64
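
For reference, a minimal sketch of how the filestore settings mentioned above would typically appear in ceph.conf on the OSD nodes (the values are the ones from this case; whether a given release picks them up via injectargs or only after an OSD restart is an assumption to verify, not something confirmed here):

  [osd]
  # threshold FileStore uses when deciding to merge subdirectories back
  # together (a negative value disables merging)
  filestore merge threshold = 40
  # multiplier that, together with the merge threshold, determines how many
  # files a subdirectory may hold before it is split
  filestore split multiple = 8

They can also be pushed to running OSDs, e.g.:

  ceph tell osd.* injectargs '--filestore-merge-threshold 40 --filestore-split-multiple 8'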

Comment 55 Josh Durgin 2016-05-11 01:02:47 UTC
After some discussion, I have a theory - they may have just hit the split threshold on many osds at once, resulting in high latency as they were all splitting directories at once (an expensive operation). Increasing the threshold may have stopped the splitting temporarily, but they will run into the same issue once they reach the larger threshold of 9600 files/dir.
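
For reference (this is the usual FileStore split rule, stated here as an assumption rather than something verified in this bz): a PG subdirectory is split once it holds more than about

  filestore_split_multiple * abs(filestore_merge_threshold) * 16

files, so the settings above give 8 * 40 * 16 = 5120 files/dir, matching the former split threshold, and 9600 files/dir corresponds to the same formula applied to the raised settings.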

Continuing to increase the threshold increases the cost of background processes like backfill, scrub, and pg splitting, though we don't have good data on how high the threshold can be before causing issues there.

Can we get tree output from say 100 random pgs in the rbd pool to verify that they were near the former split threshold of 5120 files/dir?
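
One way to gather that data is to count on-disk files per directory under the pool's PG directories on a few OSDs; a hypothetical sketch, assuming the default FileStore data path, osd.397 and pool id 57 from above:

  # Print the 20 fullest directories (file count, then path) for pool 57 PGs
  # on osd.397, to compare against the 5120 files/dir split threshold.
  for pg in /var/lib/ceph/osd/ceph-397/current/57.*_head; do
      find "$pg" -type d | while read -r dir; do
          printf '%6d %s\n' "$(find "$dir" -maxdepth 1 -type f | wc -l)" "$dir"
      done
  done | sort -rn | head -20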

Comment 56 Josh Durgin 2016-05-11 01:07:58 UTC
Added http://tracker.ceph.com/issues/15835 upstream as a possible way to mitigate this if this theory is correct.
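
The idea there is to add a per-PG random offset to the split point so that PGs do not all cross it (and split) at the same time. In later FileStore releases this shows up as a tunable along the lines of the following (option name and value are an assumption here, not confirmed in this bz):

  [osd]
  # random per-PG offset added to the split threshold, to spread directory
  # splits out over time instead of having every PG split at once
  filestore split rand factor = 20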

Comment 59 Vikhyat Umrao 2016-05-17 06:21:15 UTC
(In reply to Josh Durgin from comment #55)
> After some discussion, I have a theory - they may have just hit the split
> threshold on many osds at once, resulting in high latency as they were all
> splitting directories at once (an expensive operation). Increasing the
> threshold may have stopped the splitting temporarily, but they will run into
> the same issue once they reach the larger threshold of 9600 files/dir.
> 
> Continuing to increase the threshold increases the cost of background
> processes like backfill, scrub, and pg splitting, though we don't have good
> data on how high the threshold can be before causing issues there.
> 
> to verify that they were near the former split threshold of 5120 files/dir?

The customer captured this tree output almost a week after implementing the new filestore settings, but that should not matter, since we only want "to verify that they were near the former split threshold of 5120 files/dir".

With the current output it looks like all of them were near the threshold, which largely confirms our theory. Am I right?

Thanks,
Vikhyat

Comment 60 Josh Durgin 2016-05-17 07:16:15 UTC
(In reply to Vikhyat Umrao from comment #59)
> (In reply to Josh Durgin from comment #55)
> > After some discussion, I have a theory - they may have just hit the split
> > threshold on many osds at once, resulting in high latency as they were all
> > splitting directories at once (an expensive operation). Increasing the
> > threshold may have stopped the splitting temporarily, but they will run into
> > the same issue once they reach the larger threshold of 9600 files/dir.
> > 
> > Continuing to increase the threshold increases the cost of background
> > processes like backfill, scrub, and pg splitting, though we don't have good
> > data on how high the threshold can be before causing issues there.
> > 
> > to verify that they were near the former split threshold of 5120 files/dir?
> 
> The customer captured this tree output almost a week after implementing the
> new filestore settings, but that should not matter, since we only want "to
> verify that they were near the former split threshold of 5120 files/dir".
> 
> With the current output it looks like all of them were near the threshold,
> which largely confirms our theory. Am I right?

Yes, it looks like there's very little variation in number of files/pg, so they were very likely all just crossing the 5120 threshold when the slow requests started.

Comment 68 Vikhyat Umrao 2016-09-26 12:49:47 UTC

*** This bug has been marked as a duplicate of bug 1219974 ***

