Bug 1793042

Summary: Client receives no error when triggering autocommit with WRITE FOP
Product: GlusterFS (Community)
Reporter: david.spisla
Component: fuse
Assignee: Mohammed Rafi KC <rkavunga>
Status: CLOSED UPSTREAM
Severity: unspecified
Priority: unspecified
Version: 5
CC: amarts, bugs, rkavunga
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-03-12 12:58:52 UTC
Type: Bug

Attachments:
Logs for Gluster FUSE and SMB Client access

Description david.spisla 2020-01-20 14:47:36 UTC
Created attachment 1653961 [details]
Logs for Gluster FUSE and SMB Client access

Description of problem:
The client receives no error when triggering autocommit with a WRITE FOP. This is reproducible via the FUSE client and also via the SMB client using the glusterfs_vfs plugin.
In my opinion this is critical: if a client application triggers autocommit via a WRITE, it assumes the WRITE succeeded and deletes the file from its cache (client-side data loss). In the backend, of course, there is no data loss.
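Note: since write-behind acknowledges WRITEs before they reach the brick, a deferred error (if it is reported at all) may only surface on a later flush. One way to check this from the shell, assuming a coreutils `sync` that accepts a file argument (which calls fsync on it):

```shell
# Append, then fsync the file explicitly; with deferred write errors,
# the failure (if reported at all) should show up in sync's exit status.
echo test >> file1.txt
if sync file1.txt; then
    echo "fsync ok"
else
    echo "fsync reported an error"
fi
```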

Version-Release number of selected component (if applicable):
5.10

How reproducible:
Steps to Reproduce (with FUSE client):
1. Create a gluster volume and enable worm-file-level (180 s is the default autocommit period)
2. Mount volume via Native FUSE Client
3. In the mount path do:
$ echo test >> file1.txt && sleep 185 && echo test >> file1.txt
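For reference, the setup in steps 1 and 2 can be sketched as follows. Volume name, host, brick path, and mount point are placeholders, and the autocommit period is left at its 180 s default:

```shell
# Hypothetical names; adjust host and brick paths to your setup.
gluster volume create testvol host1:/gluster/brick1/testbrick
gluster volume start testvol
gluster volume set testvol features.worm-file-level on
# The autocommit period defaults to 180 s; it could be set explicitly:
# gluster volume set testvol features.auto-commit-period 180
mount -t glusterfs host1:/testvol /mnt/testvol
```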

Actual results:
Bash reports *no* error such as "Permission denied" or "Read-only file system" after the second WRITE FOP triggers autocommit.

Expected results:
There should be an error message.

Additional info:
Triggering the autocommit with RENAME or TRUNCATE does produce an error message.
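For comparison, once the autocommit period has elapsed on the file, these FOPs fail visibly on the mount (error wording may vary):

```shell
# On the FUSE mount, after the autocommit period has passed:
mv file1.txt file2.txt        # fails, e.g. "Read-only file system"
truncate -s 0 file1.txt       # fails as well
```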

The attachment contains Gluster trace logs (client and brick) for the reproduction steps above.

It also contains SMB logs at trace level, as an example of what happens on an SMB client (mounted via mount.cifs in bash).

Comment 1 david.spisla 2020-01-20 15:01:10 UTC
Additional information from the attached smb client log:

The initial WRITE correctly receives no error:
[2020/01/20 09:43:18.375622, 10, pid=28300, effective(1101109, 1100513), real(1101109, 0)] ../source3/smbd/aio.c:935(aio_pwrite_smb2_done)
  pwrite_recv returned 5, err = no error

But the WRITE that triggers autocommit also receives no error, which is wrong because that WRITE FOP was blocked in the backend:
[2020/01/20 09:46:23.647131, 10, pid=28300, effective(1101109, 1100513), real(1101109, 0)] ../source3/smbd/aio.c:935(aio_pwrite_smb2_done)
  pwrite_recv returned 5, err = no error

Comment 2 Amar Tumballi 2020-01-20 15:18:31 UTC
Can you try the same experiment with 'gluster volume set <volname> performance.write-behind off' and see if it works fine?
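Concretely, for the repo2 volume described in the next comment, the suggested experiment would be:

```shell
# Disable write-behind on the volume, then rerun the reproduction steps.
gluster volume set repo2 performance.write-behind off
```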

Comment 3 david.spisla 2020-01-20 15:45:34 UTC
I forgot to give you our volume options:

Volume Name: repo2
Type: Replicate
Volume ID: 47b9d9e4-be80-4138-8a4c-d3fb77ba2db0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: fs-davids-c1-n1:/gluster/brick3/glusterbrick
Brick2: fs-davids-c1-n2:/gluster/brick3/glusterbrick
Brick3: fs-davids-c1-n3:/gluster/arbiter3/glusterbrick (arbiter)
Options Reconfigured:
diagnostics.client-log-level: INFO
diagnostics.brick-log-level: INFO
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
user.smb: disable
features.read-only: off
features.worm: off
features.worm-file-level: on
features.retention-mode: enterprise
features.default-retention-period: 120
network.ping-timeout: 10
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.nl-cache: on
performance.nl-cache-timeout: 600
client.event-threads: 32
server.event-threads: 32
cluster.lookup-optimize: on
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
performance.cache-samba-metadata: on
performance.cache-ima-xattrs: on
performance.io-thread-count: 64
cluster.use-compound-fops: on
performance.cache-size: 512MB
performance.cache-refresh-timeout: 10
performance.read-ahead: off
performance.write-behind-window-size: 4MB
performance.write-behind: off
storage.build-pgfid: on
features.utime: on
storage.ctime: on
cluster.quorum-type: auto
features.bitrot: on
features.scrub: Active
features.scrub-freq: daily

Comment 4 david.spisla 2020-01-20 15:50:24 UTC
@Amar You are right, disabling the write-behind feature leads to error messages for both the FUSE and SMB clients.

Do you think there is a way to get error messages with write-behind enabled? In my opinion a client should receive an error message when a WRITE FOP fails on a WORMed file.

Comment 5 Amar Tumballi 2020-01-20 17:55:29 UTC
One way is to disable write-behind for files that are WORMed, which write-behind can detect in the open()/lookup() call itself. Looks like a good one to have for Release-8. Can this be moved to GitHub as an issue, so we can track it for release-8 at least?

Comment 6 david.spisla 2020-01-21 11:09:13 UTC
Yes, I can move it to GitHub as an issue and track it for release-8 at least. We can discuss the details of the fix there.

Comment 7 david.spisla 2020-01-23 10:19:00 UTC
Link to the github issue: https://github.com/gluster/glusterfs/issues/812

Comment 8 Worker Ant 2020-03-12 12:58:52 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/979 and will be tracked there from now on. Visit the GitHub issue URL for further details.