Bug 1548517

Summary: write failed with EINVAL due to O_DIRECT write buffer with unaligned size
Product: [Community] GlusterFS
Component: posix
Version: mainline
Status: CLOSED UPSTREAM
Severity: medium
Priority: medium
Reporter: Vitaly Lipatov <lav>
Assignee: Raghavendra Bhat <rabhat>
CC: amukherj, bugs, khiremat, olaf.buitelaar
Last Closed: 2020-03-12 12:46:46 UTC
Type: Bug
Attachments:
test for aligned and unaligned write with O_DIRECT

Description Vitaly Lipatov 2018-02-23 17:56:54 UTC
Created attachment 1399948 [details]
test for aligned and unaligned write with O_DIRECT

Description of problem:

I caught billions of "Invalid argument" errors during writes in a brick log:

[2018-02-23 14:57:37.624075] E [MSGID: 113072] [posix.c:3631:posix_writev] 0-ftp-pub-posix: write failed: offset 131072, [Invalid argument]
[2018-02-23 14:57:37.624260] E [MSGID: 115067] [server-rpc-fops.c:1407:server_writev_cbk] 0-ftp-pub-server: 18548605: WRITEV 2 (cda02ff8-011e-4ecc-9e22-86741aa9fee5), client: multi.office.etersoft.ru-31148-2018/02/22-14:44:24:479443-ftp-pub-client-2-0-0, error-xlator: ftp-pub-posix [Invalid argument]

strace -y -f -p on the glusterfsd process shows the failing call (note the 32-byte count, which is not a multiple of 512):
[pid 31198] pwrite64(28</var/local/eterglust/pub/.glusterfs/c1/a6/c1a6f57f-2082-466a-8f25-5430e281da58>, "libgl1-mesa-glx\nlibwine-vanilla\n", 32, 0) = -1 EINVAL (Invalid argument)

The line in xlators/storage/posix/src/posix.c where we get the error carries this comment:

/* not sure whether writev works on O_DIRECT'd fd */
retval = sys_pwrite (fd, buf, vector[idx].iov_len, internal_off);
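[Editorial note: on Linux, O_DIRECT typically requires the buffer address, transfer size, and file offset to all be aligned to the logical block size (see the NOTES section of open(2)), which is why this pwrite can fail with EINVAL. Below is a minimal sketch of one conventional mitigation, falling back to buffered I/O for unaligned requests. The helper is hypothetical, not GlusterFS's actual fix, and the 512-byte alignment is an assumption.]

#define _GNU_SOURCE /* O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: if address, size, or offset is unaligned,
 * temporarily drop O_DIRECT for this one write instead of letting
 * the kernel fail it with EINVAL. 512 is an assumed block size. */
static ssize_t pwrite_odirect_tolerant(int fd, const void *buf,
                                       size_t len, off_t off)
{
    const size_t align = 512;

    if (((uintptr_t)buf % align) == 0 && (len % align) == 0 &&
        ((uintptr_t)off % align) == 0)
        return pwrite(fd, buf, len, off);

    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags & ~O_DIRECT) == -1)
        return -1;

    ssize_t ret = pwrite(fd, buf, len, off); /* buffered fallback */
    int saved = errno;
    (void)fcntl(fd, F_SETFL, flags);         /* restore O_DIRECT */
    errno = saved;
    return ret;
}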

I wrote a little test program (attached) and discovered that the error occurs with newer kernels (4.4.*) and that there are no problems with the 2.6.32 kernel.

As far as I can see, both the buffer address and the buffer size must use 512-byte-aligned values only.
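[Editorial note: a minimal sketch approximating the attached test, exercising an O_DIRECT fd with aligned and unaligned buffer addresses and sizes. The file name is an example.]

#define _GNU_SOURCE /* O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void try_write(int fd, const void *buf, size_t len, const char *label)
{
    ssize_t n = pwrite(fd, buf, len, 0);
    printf("%s: %s\n", label, n == (ssize_t)len ? "SUCCESSFUL" : "FAILED");
}

int main(void)
{
    /* "testfile" is an example path on the filesystem under test */
    int fd = open("testfile", O_CREAT | O_WRONLY | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    char *buf;
    if (posix_memalign((void **)&buf, 512, 1024) != 0) return 1;
    memset(buf, 'x', 1024);

    try_write(fd, buf + 1, 32,  "UNALIGNED address write");
    try_write(fd, buf, 32,      "ALIGNED address write");  /* unaligned size */
    try_write(fd, buf + 1, 512, "UNALIGNED address with aligned size write");
    try_write(fd, buf, 512,     "ALIGNED address and size write");

    free(buf);
    close(fd);
    return 0;
}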

On both 32- and 64-bit systems:
glusterfs 3.12.5
kernel 2.6.32, 4.4.105

test result:
UNALIGNED address write: FAILED
ALIGNED address write: FAILED
UNALIGNED address with aligned size write: FAILED
ALIGNED address and size write: SUCCESSFUL

OpenVZ container result:
UNALIGNED address write: SUCCESSFUL
ALIGNED address write: SUCCESSFUL
UNALIGNED address with aligned size write: SUCCESSFUL
ALIGNED address and size write: SUCCESSFUL
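[Editorial note: the alignment O_DIRECT demands is generally tied to the logical sector size of the backing device, which can be queried with the BLKSSZGET ioctl. A short sketch; the device path is an example.]

#include <fcntl.h>
#include <linux/fs.h>   /* BLKSSZGET */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/sda", O_RDONLY);  /* example device */
    if (fd < 0) { perror("open"); return 1; }

    int ssz = 0;
    if (ioctl(fd, BLKSSZGET, &ssz) == -1) {
        perror("BLKSSZGET");
        close(fd);
        return 1;
    }
    printf("logical sector size: %d bytes\n", ssz);

    close(fd);
    return 0;
}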

Comment 1 Shyamsundar 2018-10-23 14:53:59 UTC
Release 3.12 has been EOLed and this bug was still found to be in the NEW state; moving the version to mainline so the issue can be triaged and appropriate action taken.

Comment 2 Amar Tumballi 2019-06-14 09:42:27 UTC
We have not noticed the problem on later kernels (Fedora 29/30, etc.). This needs to be tested again.

Comment 3 Olaf Buitelaar 2019-11-19 15:04:19 UTC
I'm seeing a similar issue on Gluster 6.6 with CentOS 7 (kernel 3.10.0-1062.4.3.el7.x86_64):

[2019-11-19 14:56:04.017381] E [MSGID: 113072] [posix-inode-fd-ops.c:1886:posix_writev] 0-ovirt-data-posix: write failed: offset 0, [Invalid argument]
[2019-11-19 14:56:04.017462] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-ovirt-data-server: 221969: WRITEV 0 (309c077f-8882-43f7-a95b-ca2c4d27d2b5), client: CTX_ID:b3c80b69-0651-4e87-96d1-ee767cb7e425-GRAPH_ID:10-PID:19184-HOST:lease-16.dc01.adsolutions-PC_NAME:ovirt-data-client-1-RECON_NO:-0, error-xlator: ovirt-data-posix [Invalid argument]
[2019-11-19 14:56:12.430962] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-ovirt-data-server: 219748: WRITEV 0 (921dfa09-b252-4087-9c7c-47eda2a6266d), client: CTX_ID:05f7b92c-8dd6-434b-b835-7254dae1d1bc-GRAPH_ID:4-PID:93937-HOST:lease-23.dc01.adsolutions-PC_NAME:ovirt-data-client-1-RECON_NO:-0, error-xlator: ovirt-data-posix [Invalid argument]
[2019-11-19 14:56:27.345631] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-ovirt-data-server: 203815: WRITEV 4 (981676ff-6dbe-4a4c-8478-6e4f991a04f4), client: CTX_ID:366e668d-91ba-4373-960e-82e56f1ed7af-GRAPH_ID:0-PID:22624-HOST:lease-08.dc01.adsolutions-PC_NAME:ovirt-data-client-1-RECON_NO:-0, error-xlator: ovirt-data-posix [Invalid argument]
[2019-11-19 14:56:45.491788] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-ovirt-data-server: 210249: WRITEV 2 (a27a81c0-de78-40ee-9855-a62b6be01ffe), client: CTX_ID:4472864a-0fec-4e2c-ad3f-b9684b0808f6-GRAPH_ID:0-PID:30323-HOST:lease-21.dc01.adsolutions-PC_NAME:ovirt-data-client-1-RECON_NO:-0, error-xlator: ovirt-data-posix [Invalid argument]

I also notice that CPU usage is very high when this error occurs.

The volume is configured with O_DIRECT (note performance.strict-o-direct and network.remote-dio in the options below; a client-side reproducer sketch follows the volume info):
Volume Name: ovirt-data
Type: Distributed-Replicate
Volume ID: 2775dc10-c197-446e-a73f-275853d38666
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: 10.201.0.5:/data5/gfs/bricks/brick1/ovirt-data
Brick2: 10.201.0.1:/data5/gfs/bricks/brick1/ovirt-data
Brick3: 10.201.0.9:/data0/gfs/bricks/bricka/ovirt-data (arbiter)
Brick4: 10.201.0.7:/data5/gfs/bricks/brick1/ovirt-data
Brick5: 10.201.0.9:/data5/gfs/bricks/brick1/ovirt-data
Brick6: 10.201.0.11:/data0/gfs/bricks/bricka/ovirt-data (arbiter)
Brick7: 10.201.0.6:/data5/gfs/bricks/brick1/ovirt-data
Brick8: 10.201.0.8:/data5/gfs/bricks/brick1/ovirt-data
Brick9: 10.201.0.12:/data0/gfs/bricks/bricka/ovirt-data (arbiter)
Brick10: 10.201.0.12:/data5/gfs/bricks/brick1/ovirt-data
Brick11: 10.201.0.11:/data5/gfs/bricks/brick1/ovirt-data
Brick12: 10.201.0.10:/data0/gfs/bricks/bricka/ovirt-data (arbiter)
Options Reconfigured:
performance.strict-o-direct: on
server.event-threads: 6
performance.cache-size: 384MB
performance.write-behind-window-size: 512MB
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
storage.owner-uid: 36
storage.owner-gid: 36
server.outstanding-rpc-limit: 1024
cluster.choose-local: off
cluster.brick-multiplex: on
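[Editorial note: with performance.strict-o-direct on and network.remote-dio off, an application's O_DIRECT flag should be honored end to end rather than filtered at the client, so an unaligned write can surface this brick-side EINVAL; that reading of the two options is an assumption. A sketch of a client-side check against the FUSE mount; the mount path is an example.]

#define _GNU_SOURCE /* O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* example path on a FUSE mount of the ovirt-data volume */
    int fd = open("/mnt/ovirt-data/align-test",
                  O_CREAT | O_WRONLY | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    char *buf;
    if (posix_memalign((void **)&buf, 512, 512) != 0) return 1;
    memset(buf, 'x', 512);

    /* 32 bytes: aligned address, unaligned size - mirrors the strace above */
    if (pwrite(fd, buf, 32, 0) == -1)
        perror("unaligned-size pwrite"); /* expect EINVAL if passed through */
    else
        puts("unaligned-size pwrite succeeded");

    free(buf);
    close(fd);
    return 0;
}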

Comment 4 Worker Ant 2020-03-12 12:46:46 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/946 and will be tracked there from now on. Visit the GitHub issue URL for further details.