Bug 1793042

Summary: Client receives no error when triggering autocommit with WRITE FOP
Product: GlusterFS (Community)
Reporter: david.spisla
Component: fuse
Assignee: Mohammed Rafi KC <rkavunga>
Status: CLOSED UPSTREAM
Severity: unspecified
Priority: unspecified
Version: 5
CC: amarts, bugs, rkavunga
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-03-12 12:58:52 UTC
Type: Bug

Attachments:
Logs for Gluster FUSE and SMB Client access

Description david.spisla 2020-01-20 14:47:36 UTC
Created attachment 1653961 [details]
Logs for Gluster FUSE and SMB Client access

Description of problem:
The client receives no error when triggering autocommit with a WRITE FOP. This is reproducible via the FUSE client and also via the SMB client using the glusterfs_vfs plugin.
In my opinion this is critical: if a client application triggers autocommit via a WRITE, it assumes the WRITE succeeded and deletes the file from its cache (client-side data loss). In the backend, of course, there is no data loss.
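Note: since write-behind acknowledges WRITEs before they reach the brick, a deferred error (if it is reported at all) may only surface on a later flush. One way to check this from the shell, assuming a coreutils `sync` that accepts a file argument (which calls fsync on it):

```shell
# Append, then fsync the file explicitly; with deferred write errors,
# the failure (if reported at all) should show up in sync's exit status.
echo test >> file1.txt
if sync file1.txt; then
    echo "fsync ok"
else
    echo "fsync reported an error"
fi
```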

Version-Release number of selected component (if applicable):
5.10

How reproducible:
Steps to Reproduce (with FUSE client):
1. Create a gluster volume and enable worm-file-level (180 s is the default autocommit period)
2. Mount volume via Native FUSE Client
3. In the mount path do:
$ echo test >> file1.txt && sleep 185 && echo test >> file1.txt
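For reference, the setup in steps 1 and 2 can be sketched as follows. Volume name, host, brick path, and mount point are placeholders, and the autocommit period is left at its 180 s default:

```shell
# Hypothetical names; adjust host and brick paths to your setup.
gluster volume create testvol host1:/gluster/brick1/testbrick
gluster volume start testvol
gluster volume set testvol features.worm-file-level on
# The autocommit period defaults to 180 s; it could be set explicitly:
# gluster volume set testvol features.auto-commit-period 180
mount -t glusterfs host1:/testvol /mnt/testvol
```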

Actual results:
Bash reports *no* error such as "Permission denied" or "Read-only file system" after the second WRITE FOP triggers autocommit.

Expected results:
There should be an error message.

Additional info:
Triggering the autocommit with RENAME or TRUNCATE does produce an error message.
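For comparison, once the autocommit period has elapsed on the file, these FOPs fail visibly on the mount (error wording may vary):

```shell
# On the FUSE mount, after the autocommit period has passed:
mv file1.txt file2.txt        # fails, e.g. "Read-only file system"
truncate -s 0 file1.txt       # fails as well
```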

The attachment contains Gluster trace logs (client and brick) for the reproduction steps above.

It also contains SMB logs at trace level, as an example of what happens on an SMB client (mounted via mount.cifs in bash).

Comment 1 david.spisla 2020-01-20 15:01:10 UTC
Additional information from the attached smb client log:

The initial WRITE correctly receives no error:
[2020/01/20 09:43:18.375622, 10, pid=28300, effective(1101109, 1100513), real(1101109, 0)] ../source3/smbd/aio.c:935(aio_pwrite_smb2_done)
  pwrite_recv returned 5, err = no error

But the WRITE that triggers autocommit also receives no error, which is wrong because that WRITE FOP was blocked in the backend:
[2020/01/20 09:46:23.647131, 10, pid=28300, effective(1101109, 1100513), real(1101109, 0)] ../source3/smbd/aio.c:935(aio_pwrite_smb2_done)
  pwrite_recv returned 5, err = no error

Comment 2 Amar Tumballi 2020-01-20 15:18:31 UTC
Can you try the same experiment with 'gluster volume set <volname> performance.write-behind off' and see if it works fine?
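Concretely, for the repo2 volume described in the next comment, the suggested experiment would be:

```shell
# Disable write-behind on the volume, then rerun the reproduction steps.
gluster volume set repo2 performance.write-behind off
```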

Comment 3 david.spisla 2020-01-20 15:45:34 UTC
I forgot to give you our volume options:

Volume Name: repo2
Type: Replicate
Volume ID: 47b9d9e4-be80-4138-8a4c-d3fb77ba2db0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: fs-davids-c1-n1:/gluster/brick3/glusterbrick
Brick2: fs-davids-c1-n2:/gluster/brick3/glusterbrick
Brick3: fs-davids-c1-n3:/gluster/arbiter3/glusterbrick (arbiter)
Options Reconfigured:
diagnostics.client-log-level: INFO
diagnostics.brick-log-level: INFO
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
user.smb: disable
features.read-only: off
features.worm: off
features.worm-file-level: on
features.retention-mode: enterprise
features.default-retention-period: 120
network.ping-timeout: 10
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.nl-cache: on
performance.nl-cache-timeout: 600
client.event-threads: 32
server.event-threads: 32
cluster.lookup-optimize: on
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
performance.cache-samba-metadata: on
performance.cache-ima-xattrs: on
performance.io-thread-count: 64
cluster.use-compound-fops: on
performance.cache-size: 512MB
performance.cache-refresh-timeout: 10
performance.read-ahead: off
performance.write-behind-window-size: 4MB
performance.write-behind: off
storage.build-pgfid: on
features.utime: on
storage.ctime: on
cluster.quorum-type: auto
features.bitrot: on
features.scrub: Active
features.scrub-freq: daily

Comment 4 david.spisla 2020-01-20 15:50:24 UTC
@Amar You are right, disabling the write-behind feature leads to error messages for both the FUSE and SMB clients.

Do you think there is a way to get error messages with write-behind enabled? In my opinion a client should receive an error message when a WRITE FOP fails on a WORMed file.

Comment 5 Amar Tumballi 2020-01-20 17:55:29 UTC
One way is to disable write-behind for files that are WORMed, which write-behind can detect in the open()/lookup() call itself. Looks like a good one to have for Release-8. Can this be moved to GitHub as an issue, so we can track it for release-8 at least?

Comment 6 david.spisla 2020-01-21 11:09:13 UTC
Yes, I can move it to GitHub as an issue and track it for release-8 at least. We can discuss the details of the fix there.

Comment 7 david.spisla 2020-01-23 10:19:00 UTC
Link to the github issue: https://github.com/gluster/glusterfs/issues/812

Comment 8 Worker Ant 2020-03-12 12:58:52 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/979 and will be tracked there from now on. Visit the GitHub issue URL for further details.