Bug 1793042 - Client receives no error when triggering autocommit with WRITE FOP
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: fuse
Version: 5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Mohammed Rafi KC
 
Reported: 2020-01-20 14:47 UTC by david.spisla
Modified: 2020-03-12 12:58 UTC
CC List: 3 users

Last Closed: 2020-03-12 12:58:52 UTC
Regression: ---
Mount Type: ---
Documentation: ---


Attachments
Logs for Gluster FUSE and SMB Client access (431.73 KB, application/gzip)
2020-01-20 14:47 UTC, david.spisla

Description david.spisla 2020-01-20 14:47:36 UTC
Created attachment 1653961
Logs for Gluster FUSE and SMB Client access

Description of problem:
Client receives no error when triggering autocommit with a WRITE FOP. This is reproducible with the FUSE client and also with an SMB client served through the Samba vfs_glusterfs module.
In my opinion this is critical: if a client application triggers autocommit with a WRITE, it assumes the WRITE succeeded and may delete the data from its cache (client-side data loss). The backend itself, of course, loses no data.

Version-Release number of selected component (if applicable):
5.10

How reproducible:
Steps to Reproduce (with FUSE client):
1. Create a Gluster volume and enable features.worm-file-level (the default autocommit period is 180 s).
2. Mount the volume with the native FUSE client.
3. In the mount path, run:
$ echo test >> file1.txt && sleep 185 && echo test >> file1.txt
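A minimal sketch of step 3 that reports the exit status of each append explicitly, so the missing error is visible even when bash prints nothing. The helper name `append_and_report` and the mount path in the usage comment are illustrative assumptions, not part of the original report.

```shell
#!/bin/sh
# Hypothetical helper: append one line to FILE and report whether the
# append succeeded according to the shell's exit status.
append_and_report() {
    if printf 'test\n' >> "$1"; then
        echo "OK: append to $1 succeeded"
    else
        echo "ERROR: append to $1 failed"
        return 1
    fi
}

# Usage on a WORM-enabled FUSE mount (mount path is an assumption):
#   cd /mnt/repo2
#   append_and_report file1.txt   # first WRITE, expected to succeed
#   sleep 185                     # exceed the 180 s autocommit period
#   append_and_report file1.txt   # second WRITE, an ERROR line is expected
```

With the behaviour described in this bug, the second call prints an `OK` line as well.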

Actual results:
Bash reports *no* error such as "Permission denied" or "Read-only file system" when the second WRITE FOP triggers autocommit.

Expected results:
There should be an error message.

Additional info:
If autocommit is triggered with RENAME or TRUNCATE instead, an error message is shown.
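To compare WRITE, RENAME and TRUNCATE side by side, a small wrapper can print each command's exit status. This is a sketch; the helper `try_op` and the file names in the usage comment are hypothetical, and each operation should target a different file whose autocommit period has already elapsed.

```shell
#!/bin/sh
# Hypothetical helper: run a command, then print a label followed by "ok"
# or the non-zero exit status, so different FOPs can be compared.
try_op() {
    label=$1
    shift
    if "$@"; then
        echo "$label: ok"
    else
        rc=$?    # exit status of the command run in the if condition
        echo "$label: failed (exit $rc)"
    fi
}

# Usage (files f1..f3 created more than 180 s earlier, names assumed):
#   try_op WRITE    sh -c 'printf "test\n" >> f1.txt'
#   try_op RENAME   mv f2.txt f2-renamed.txt
#   try_op TRUNCATE truncate -s 0 f3.txt
```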

The attachment contains Gluster trace logs (client and brick) for the steps above.

It also contains SMB trace logs that show what happens with an SMB client (mounted with mount.cifs from bash).

Comment 1 david.spisla 2020-01-20 15:01:10 UTC
Additional information from the attached smb client log:

The initial WRITE correctly receives no error:
[2020/01/20 09:43:18.375622, 10, pid=28300, effective(1101109, 1100513), real(1101109, 0)] ../source3/smbd/aio.c:935(aio_pwrite_smb2_done)
  pwrite_recv returned 5, err = no error

However, the WRITE that triggers autocommit also receives no error, which is wrong because the backend blocked the WRITE FOP:
[2020/01/20 09:46:23.647131, 10, pid=28300, effective(1101109, 1100513), real(1101109, 0)] ../source3/smbd/aio.c:935(aio_pwrite_smb2_done)
  pwrite_recv returned 5, err = no error
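The two relevant entries can be pulled out of a full trace-level smbd log with a short awk filter. This is a sketch that assumes the two-line layout shown above (header line, then an indented `pwrite_recv` line); the helper name `pwrite_results` and the log path in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for each aio_pwrite_smb2_done entry in a Samba log
# (log level 10), print the entry's timestamp and the pwrite_recv result
# logged on the following line.
pwrite_results() {
    awk '
        /aio_pwrite_smb2_done/ {
            ts = substr($1, 2) " " $2    # fields "[2020/01/20" and "09:43:18.375622,"
            sub(/,$/, "", ts)            # drop the trailing comma
        }
        /pwrite_recv returned/ {
            sub(/^[ \t]+/, "")           # strip the leading indentation
            print ts " -> " $0
        }
    ' "$1"
}
```

For example, `pwrite_results /var/log/samba/log.smbd` prints one line per completed SMB2 write, which makes it easy to spot that the autocommit-triggering WRITE also ends with `err = no error`.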

Comment 2 Amar Tumballi 2020-01-20 15:18:31 UTC
Can you try the same experiment with 'gluster volume set <> write-behind off' and see whether it works as expected?
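For reference, a sketch of the suggested experiment using the full option key. The volume name `repo2` is taken from the volume info posted later in this thread and is an assumption for any other setup.

```shell
#!/bin/sh
# Volume name assumed from the options posted in this report; adjust as needed.
VOL=repo2

# Disable client-side write-behind for the volume ...
gluster volume set "$VOL" performance.write-behind off

# ... and confirm the effective value before repeating the reproduction steps.
gluster volume get "$VOL" performance.write-behind
```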

Comment 3 david.spisla 2020-01-20 15:45:34 UTC
I forgot to give you our volume options:

Volume Name: repo2
Type: Replicate
Volume ID: 47b9d9e4-be80-4138-8a4c-d3fb77ba2db0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: fs-davids-c1-n1:/gluster/brick3/glusterbrick
Brick2: fs-davids-c1-n2:/gluster/brick3/glusterbrick
Brick3: fs-davids-c1-n3:/gluster/arbiter3/glusterbrick (arbiter)
Options Reconfigured:
diagnostics.client-log-level: INFO
diagnostics.brick-log-level: INFO
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
user.smb: disable
features.read-only: off
features.worm: off
features.worm-file-level: on
features.retention-mode: enterprise
features.default-retention-period: 120
network.ping-timeout: 10
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.nl-cache: on
performance.nl-cache-timeout: 600
client.event-threads: 32
server.event-threads: 32
cluster.lookup-optimize: on
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
performance.cache-samba-metadata: on
performance.cache-ima-xattrs: on
performance.io-thread-count: 64
cluster.use-compound-fops: on
performance.cache-size: 512MB
performance.cache-refresh-timeout: 10
performance.read-ahead: off
performance.write-behind-window-size: 4MB
performance.write-behind: off
storage.build-pgfid: on
features.utime: on
storage.ctime: on
cluster.quorum-type: auto
features.bitrot: on
features.scrub: Active
features.scrub-freq: daily

Comment 4 david.spisla 2020-01-20 15:50:24 UTC
@Amar You are right: with write-behind disabled, both the FUSE and the SMB client receive an error.

Do you think there is a way to get error messages with write-behind enabled? In my opinion a client should receive an error message when a WRITE FOP fails for a WORMed file.
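Independently of any change to write-behind, an application can explicitly ask for deferred write errors by calling fsync() before it drops its cached copy. The sketch below does this from the shell with GNU dd's `conv=fsync`. Whether write-behind in Gluster 5.10 reports the WORM rejection on that fsync is an assumption this report does not verify, and the helper name `append_synced` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append stdin to FILE and fsync it before dd exits,
# so an error reported only at fsync time becomes a non-zero exit status.
# Requires GNU dd (oflag=append, conv=notrunc,fsync, status=none).
append_synced() {
    dd of="$1" oflag=append conv=notrunc,fsync status=none
}

# Usage in the mount path (assumption, not verified against WORM):
#   printf 'test\n' | append_synced file1.txt || echo "WRITE rejected"
```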

Comment 5 Amar Tumballi 2020-01-20 17:55:29 UTC
One option is to disable write-behind for files that are WORMed; write-behind can detect this in the open()/lookup() call itself. This looks like a good one to have for release 8. Can this be moved to GitHub as an issue, so we can track it for release 8 at least?

Comment 6 david.spisla 2020-01-21 11:09:13 UTC
Yes, I can move it to github as an issue and track it for release-8 at least. There we can discuss the details for the fix.

Comment 7 david.spisla 2020-01-23 10:19:00 UTC
Link to the github issue: https://github.com/gluster/glusterfs/issues/812

Comment 8 Worker Ant 2020-03-12 12:58:52 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/979 and will be tracked there from now on. Visit the GitHub issue URL for further details.

