Bug 1394871 - [RFE] A method to dynamically turn FOPEN_DIRECT_IO on/off for all files that operate on given inode.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: fuse
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Miklos Szeredi
QA Contact: Zorro Lang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-14 15:51 UTC by Brett Niver
Modified: 2021-01-15 07:28 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-15 07:28:35 UTC
Target Upstream Version:



Description Brett Niver 2016-11-14 15:51:43 UTC
A method to dynamically turn FOPEN_DIRECT_IO on/off for all files that operate on a given inode.  When there are multiple clients that read/write a file, ceph-fuse needs to disable page cache IO.  When all clients are reading the file, or a single client is reading/writing the file, ceph-fuse can enable page cache IO.

Comment 2 Miklos Szeredi 2018-09-14 14:44:33 UTC
So when in DIRECT_IO mode what do we do with shared mmaps?  We can't translate each and every memory access to a synchronous I/O request.

Currently FOPEN_DIRECT_IO simply disallows MAP_SHARED mmap to begin with, which makes me wonder... was this mode ever used in real-life filesystems?  If so, is MAP_SHARED so rare that this doesn't matter?

What is the expected behavior of MAP_SHARED memory maps with distributed filesystems that are expected to provide strict coherency?

Comment 3 Steve Whitehouse 2018-09-14 14:55:51 UTC
The expected behaviour follows that of buffered I/O. We do expect and get full coherency with GFS2, for example, so you can do a shared mmap of a file on multiple nodes and use it like DSM. It will work, but it will be very slow.

There have been some issues wrt mmap and direct I/O in the past. The normal way it works from a DIO read/write perspective is:

1. unmap and flush the pages in the appropriate file range
2. perform the direct I/O operation
3. If the DIO operation was a write, invalidate any pages in the page cache that were read in during #2 (since #1 will have emptied the page cache in the relevant range, any pages in that range must have appeared during step #2)

There have been discussions about actually blocking buffered operations on the relevant file range during the DIO operation so that step #3 is not required, but I don't think there has been a proposal that everybody has been happy with yet.

You can't combine mmap and direct i/o on the same fd, but as far as I know there is no reason why you can't combine them via separate fds.

I'm not sure if that answers the question though... ?

Comment 4 Miklos Szeredi 2018-09-14 15:03:18 UTC
GFS2 does this by marking the PTE read-only or removing the mapping altogether, right?  And it does this at page granularity?

Comment 5 Steve Whitehouse 2018-09-14 15:07:24 UTC
It unmaps the page first, so any further mmap access will result in a page fault. That fault will then get stopped in the glock code while the direct I/O operation is running. After the DIO completes, the page fault will also complete. The glock is a per-inode thing, so even though we zap individual pages, the granularity of the coherency is really per inode.

Comment 6 Miklos Szeredi 2018-09-17 07:04:08 UTC
Thanks, Steve, now I understand what GFS2 does and what's generally possible.

However, this bug is about ceph-fuse, so what I'd /really/ be interested in is what ceph-kernel does and what level of coherency is required of MAP_SHARED mappings for ceph-fuse.

Comment 10 RHEL Program Management 2021-01-15 07:28:35 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

