Bug 1795309 - fs/ceph: stop asking for capabilities that go unused
Summary: fs/ceph: stop asking for capabilities that go unused
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel
Version: 8.4
Hardware: All
OS: Linux
urgent
medium
Target Milestone: rc
: 8.0
Assignee: Yan, Zheng
QA Contact: Tejas
John Wilkins
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-01-27 16:34 UTC by Patrick Donnelly
Modified: 2020-07-01 12:40 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Ceph Project Bug Tracker 43748 None None None 2020-01-27 16:34:15 UTC
Red Hat Bugzilla 1795306 None None None 2020-01-27 16:35:22 UTC

Internal Links: 1795306

Description Patrick Donnelly 2020-01-27 16:34:15 UTC
Description of problem:

See: https://tracker.ceph.com/issues/43748


Kernel should stop asking for capabilities (wanted set) that go unused. This causes unnecessary capability traffic with competing clients and makes client application failover (governed by file locks) take longer.

See Ceph tracker issue for reproduction steps. Except: instead of using SIGKILL, use a hard reset of the machine or blocking all traffic with iptables.

Comment 1 Jeff Layton 2020-01-28 17:10:24 UTC
I took a brief look and it at first looked like we could mainly just remove this from ceph_add_cap:

        /*
         * If we are opening the file, include file mode wanted bits
         * in wanted.
         */
        if (fmode >= 0)
                wanted |= ceph_caps_for_mode(fmode);

...but I don't think that's sufficient. A lot of the cap handling code seems to call __ceph_caps_file_wanted to determine base "want" values, and that sets cap bits that correspond to open fd's for the inode.

There's also stuff like this in there:

        file_wanted = __ceph_caps_file_wanted(ci);
        used = __ceph_caps_used(ci);
        issued = __ceph_caps_issued(ci, &implemented);
        revoking = implemented & ~issued;

        want = file_wanted;
        retain = file_wanted | used | CEPH_CAP_PIN;
        if (!mdsc->stopping && inode->i_nlink > 0) {
                if (file_wanted) {
                        retain |= CEPH_CAP_ANY;       /* be greedy */

So, if we have the file open for any reason, we'll "be greedy". As an experiment we could try playing with a patch that just has __ceph_caps_file_wanted return CEPH_CAP_PIN and see whether that causes issues in testing?

Comment 2 Douglas Fuller 2020-01-28 18:36:01 UTC
I think this bug can be made public. @Patrick, any objections?

Comment 3 Jeff Layton 2020-01-28 20:42:59 UTC
Also, I think this points out the need for better instrumentation around the cap handling code, so we can watch the way caps flow between the client and MDS. I had some tracepoints that I had added a while back, but they got shot down on review. We may need to revisit them though.

Comment 4 Jeff Layton 2020-01-29 13:42:49 UTC
I started working on a draft patch to just make the kernel not request these caps on open, but it looks like the MDS assumes a "wanted" mask based on the mode in an OPEN request. Even if we don't request them, it assumes we want them anyway during the open, and we'd have to issue a cap request to override that. That leads me to think that Zheng's idea is probably best, and that we ought to come up with a (well-defined and documented!) mechanism for clearing unused wanted mask bits.

What I'm sort of envisioning is keeping something like a bloom filter and a timestamp per inode. When something wants to get cap references, we'd set a bit in the filter mask. After a certain amount of time has passed (a few seconds?), we'd consider any bits not in that mask subject to being cleared, reset the timestamp and clear out the filter and do it over again.

Comment 5 Patrick Donnelly 2020-01-30 11:27:51 UTC
(In reply to Jeff Layton from comment #4)
> I started working on a draft patch to just make the kernel not request these
> caps on open, but it looks like the MDS assumes a "wanted" mask based on the
> mode in an OPEN request. Even if we don't request them, it assumes we want
> them anyway during the open, and we'd have to issue a cap request to
> override that. That leads me to think that Zheng's idea is probably best,
> and that we ought to come up with a (well-defined and documented!) mechanism
> for clearing unused wanted mask bits.

Jeff and I talked about this yesterday a bit. I think we can update the wanted field whenever we get a cap revoke from the MDS. Then the kernel can evaluate whether it has been using the caps it was given. It's not really a problem that the MDS gives out caps gratuitously until we start seeing clients fight over caps.


Note You need to log in before you can comment on or make changes to this bug.