Bug 1450745
| Summary: | [GSS]Untar of Tarball taking too much time | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Abhishek Kumar <abhishku> |
| Component: | gluster-nfs | Assignee: | Niels de Vos <ndevos> |
| Status: | CLOSED DEFERRED | QA Contact: | Manisha Saini <msaini> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | rhgs-3.2 | CC: | abhishku, amukherj, bkunal, nbalacha, ndevos, pgurusid, pkarampu, pparsons, rgowdapp, rhs-bugs, sankarshan, srangana, storage-qa-internal |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-04-05 12:36:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1474007 | | |
Description
Abhishek Kumar
2017-05-15 06:23:02 UTC
Some more details about the *_SYNC options that NFS offers for WRITE procedures [from https://tools.ietf.org/html/rfc1813#section-3.3.7]:

    3.3.7 Procedure 7: WRITE - Write to file

       SYNOPSIS

          WRITE3res NFSPROC3_WRITE(WRITE3args) = 7;

          enum stable_how {
               UNSTABLE  = 0,
               DATA_SYNC = 1,
               FILE_SYNC = 2
          };

       ...

       stable
          If stable is FILE_SYNC, the server must commit the data
          written plus all file system metadata to stable storage
          before returning results. This corresponds to the NFS
          version 2 protocol semantics. Any other behavior
          constitutes a protocol violation. If stable is DATA_SYNC,
          then the server must commit all of the data to stable
          storage and enough of the metadata to retrieve the data
          before returning. The server implementor is free to
          implement DATA_SYNC in the same fashion as FILE_SYNC, but
          with a possible performance drop. If stable is UNSTABLE,
          the server is free to commit any part of the data and the
          metadata to stable storage, including all or none, before
          returning a reply to the client. There is no guarantee
          whether or when any uncommitted data will subsequently be
          committed to stable storage. The only guarantees made by
          the server are that it will not destroy any data without
          changing the value of verf and that it will not commit the
          data and metadata at a level less than that requested by
          the client. See the discussion on COMMIT on page 92 for
          more information on if and when data is committed to
          stable storage.

There are (volume) options for Gluster/NFS to fake syncing and synced writes:

- nfs.trusted-sync
- nfs.trusted-write

From xlators/nfs/server/src/nfs.c:

    { .key           = {"nfs3.*.trusted-write"},
      .type          = GF_OPTION_TYPE_BOOL,
      .default_value = "off",
      .description   = "On an UNSTABLE write from client, return STABLE flag"
                       " to force client to not send a COMMIT request. In "
                       "some environments, combined with a replicated "
                       "GlusterFS setup, this option can improve write "
                       "performance. This flag allows user to trust Gluster"
                       " replication logic to sync data to the disks and "
                       "recover when required. COMMIT requests if received "
                       "will be handled in a default manner by fsyncing."
                       " STABLE writes are still handled in a sync manner. "
                       "Off by default."
    },
    { .key           = {"nfs3.*.trusted-sync"},
      .type          = GF_OPTION_TYPE_BOOL,
      .default_value = "off",
      .description   = "All writes and COMMIT requests are treated as async."
                       " This implies that no write requests are guaranteed"
                       " to be on server disks when the write reply is "
                       "received at the NFS client. Trusted sync includes "
                       " trusted-write behaviour. Off by default."
    },

The writes in the tcpdump show that FILE_SYNC is used.
(Wireshark display filter: "rpc.msgtyp == CALL && nfs.procedure_v3 == WRITE")
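To list the stable_how value of each WRITE call from a capture, something like the following tshark invocation can be used. This is a sketch: capture.pcap is a placeholder, and the nfs.write.stable field name is an assumption that may differ between Wireshark versions (0 = UNSTABLE, 1 = DATA_SYNC, 2 = FILE_SYNC):

    tshark -r capture.pcap \
        -Y 'rpc.msgtyp == 0 && nfs.procedure_v3 == 7' \
        -T fields -e nfs.write.stable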
This means that the following applies:

    If stable is FILE_SYNC, the server must commit the data
    written plus all file system metadata to stable storage
    before returning results. [....] Any other behavior
    constitutes a protocol violation.
So, enabling the nfs.trusted-sync option "constitutes a protocol violation".
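For reference, these options are toggled through the normal volume-set interface; a sketch, with VOLNAME as a placeholder (and with the caveat above that trusted-sync trades protocol correctness for speed):

    # return STABLE for UNSTABLE writes, trusting Gluster replication
    gluster volume set VOLNAME nfs.trusted-write on
    # treat all writes and COMMITs as async (includes trusted-write behaviour)
    gluster volume set VOLNAME nfs.trusted-sync on
    # revert to the default (off)
    gluster volume reset VOLNAME nfs.trusted-sync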
Because this bug is about small files getting extracted from a tarball (new files, single write), the writes will be flushed on the close() syscall. Applications (here 'tar') can then check whether the close() was successful or writing the data failed somewhere. This is referred to as "close-to-open", indicating that files have been completely written by the time another process/user reads the newly created file.
I suspect that disabling the close-to-open semantics will improve the performance for this use-case. However, it comes at the cost of potential inconsistency after calling close(): data that was expected to be written may still be buffered or in transit.
For tarball extraction, one could consider using the "nocto" mount option on the NFS-client side. Additional 'sync' calls, or unmounting/remounting of the NFS-export, will then be needed to guarantee syncing of the contents to the NFS-server; otherwise the NFS-client may cache data locally.
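A sketch of that workflow; server:/export, /mnt/untar and archive.tar are placeholders:

    mount -t nfs -o vers=3,nocto server:/export /mnt/untar
    tar -C /mnt/untar -xf archive.tar
    sync                  # flush client-side cached data to the server
    umount /mnt/untar     # alternatively, unmounting also flushes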
Abhishek, I assume that this gives you sufficient insight into what can be done with Gluster/NFS and the Linux kernel NFS-client. I am not sure there really is a bug that needs fixing in Gluster here. If you agree, please close this as NOTABUG or similar. Thanks!
Please see 'man 5 nfs' for further details on using the "nocto" mount option:
cto / nocto Selects whether to use close-to-open cache coherence
semantics. If neither option is specified (or if cto is
specified), the client uses close-to-open cache coherence
semantics. If the nocto option is specified, the
client uses a non-standard heuristic to determine when
files on the server have changed.
Using the nocto option may improve performance for read-
only mounts, but should be used only if the data on the
server changes only occasionally. The DATA AND METADATA
COHERENCE section discusses the behavior of this option
in more detail.
...
DATA AND METADATA COHERENCE
Some modern cluster file systems provide perfect cache coherence among
their clients. Perfect cache coherence among disparate NFS clients is
expensive to achieve, especially on wide area networks. As such, NFS
settles for weaker cache coherence that satisfies the requirements of
most file sharing types.
Close-to-open cache consistency
Typically file sharing is completely sequential. First client A opens
a file, writes something to it, then closes it. Then client B opens
the same file, and reads the changes.
When an application opens a file stored on an NFS version 3 server, the
NFS client checks that the file exists on the server and is permitted
to the opener by sending a GETATTR or ACCESS request. The NFS client
sends these requests regardless of the freshness of the file's cached
attributes.
When the application closes the file, the NFS client writes back any
pending changes to the file so that the next opener can view the
changes. This also gives the NFS client an opportunity to report write
errors to the application via the return code from close(2).
The behavior of checking at open time and flushing at close time is
referred to as close-to-open cache consistency, or CTO. It can be
disabled for an entire mount point using the nocto mount option.
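As a side note on that last paragraph: because pending writes are flushed at close time, applications writing to NFS should check the return value of close(2). A minimal sketch in C, with a hypothetical path on an NFS mount:

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* hypothetical file on an NFS mount */
        int fd = open("/mnt/untar/file.txt",
                      O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        const char buf[] = "hello\n";
        if (write(fd, buf, sizeof(buf) - 1) < 0)
            perror("write");

        /* With close-to-open semantics the NFS client flushes pending
         * writes here; a server-side write error can surface as a
         * failing close(), not necessarily as a failing write(). */
        if (close(fd) != 0) {
            fprintf(stderr, "close: %s\n", strerror(errno));
            return 1;
        }
        return 0;
    }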