Bug 1545523 - [GSS] AIX client failed to write to a temporarily file to gluster volume by gNFS.
Summary: [GSS] AIX client failed to write to a temporarily file to gluster volume by g...
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: gluster-nfs
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Target Milestone: ---
: RHGS 3.4.0
Assignee: Jiffin
QA Contact: Manisha Saini
Depends On:
Blocks: 1503138
TreeView+ depends on / blocked
Reported: 2018-02-15 07:50 UTC by WenhanShi
Modified: 2018-09-06 08:30 UTC (History)
14 users (show)

Fixed In Version: glusterfs-3.12.2-1
Doc Type: Bug Fix
Doc Text:
Previously, there was possibility that gfid passed a create call with NULL as a result. Therefore, when setattr post create used NULL as an input for gfid, it lead to ENVLAID error. With this fix, you can use the gfid from the linked node and create with SET_ATTR_MODE is successful.
Clone Of:
Last Closed: 2018-09-04 06:42:41 UTC
Target Upstream Version:

Attachments (Terms of Use)

System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2607 None None None 2018-09-04 06:43:59 UTC

Description WenhanShi 2018-02-15 07:50:29 UTC
Description of problem:

AIX client is mounting gluster volume via NFSv3. It use mv command to rename a temporarily file every 20 minutes. It got a error around [02/14 05:30] (UTC). and the error message on the AIX client is like this:

  mv: ./aa/TempFileA.TMP: A system call received a parameter that is not valid.

From nfs.log in gluster server, I saw some errors happened at that timestamp.

[2018-02-14 05:30:05.663122] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc7a0) [0x7f3bb58f27a0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28ad0) [0x7f3bb590ead0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3ab16) [0x7f3bb5920b16] ) 0-: Assertion failed: 0
[2018-02-14 05:30:05.663170] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc7a0) [0x7f3bb58f27a0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28ad0) [0x7f3bb590ead0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3ab16) [0x7f3bb5920b16] ) 0-: Assertion failed: 0
[2018-02-14 05:30:05.663192] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc7a0) [0x7f3bb58f27a0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28ad0) [0x7f3bb590ead0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3ab16) [0x7f3bb5920b16] ) 0-: Assertion failed: 0
[2018-02-14 05:30:05.663207] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc7a0) [0x7f3bb58f27a0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28ad0) [0x7f3bb590ead0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3ab16) [0x7f3bb5920b16] ) 0-: Assertion failed: 0
[2018-02-14 05:30:05.663216] W [MSGID: 108019] [afr-lk-common.c:1090:is_blocking_locks_count_sufficient] 0-aseries-vol-replicate-0: Unable to obtain blocking inode lock on even one child for gfid:00000000-0000-0000-0000-000000000000.
[2018-02-14 05:30:05.663225] I [MSGID: 108019] [afr-transaction.c:1829:afr_post_blocking_inodelk_cbk] 0-aseries-vol-replicate-0: Blocking inodelks failed.
[2018-02-14 05:30:05.667858] W [MSGID: 112199] [nfs3-helpers.c:3515:nfs3_log_newfh_res] 0-nfs-nfsv3: /XXXXXXX/TempFileA.TMP => (XID: XXXXXX, CREATE: NFS: 22(Invalid argument for operation), POSIX: 22(Invalid argument)), FH: exportid 1082083a-2329-4f38-9a7d-c2a2d6ad588e, gfid c1197be3-d217-4f9e-97fb-d7e305c5cb60, mountid 70c5a5b7-0000-0000-0000-000000000000

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-18.6.el7rhgs.x86_64                   Fri Sep 22 08:19:41 2017
glusterfs-api-3.8.4-18.6.el7rhgs.x86_64               Fri Sep 22 08:19:41 2017
glusterfs-cli-3.8.4-18.6.el7rhgs.x86_64               Fri Sep 22 08:19:41 2017
glusterfs-client-xlators-3.8.4-18.6.el7rhgs.x86_64    Fri Sep 22 08:19:41 2017
glusterfs-fuse-3.8.4-18.6.el7rhgs.x86_64              Fri Sep 22 08:19:41 2017
glusterfs-geo-replication-3.8.4-18.6.el7rhgs.x86_64   Fri Sep 22 08:19:42 2017
glusterfs-libs-3.8.4-18.6.el7rhgs.x86_64              Fri Sep 22 08:19:41 2017
glusterfs-server-3.8.4-18.6.el7rhgs.x86_64            Fri Sep 22 08:19:41 2017
samba-vfs-glusterfs-4.4.5-3.el7rhgs.x86_64            Wed Aug 23 14:18:11 2017

How reproducible:
- seems only one time

Steps to Reproduce:
1. N/A

Actual results:
Issue happened when mv a file.

Expected results:
No issue should happen when mv a file.

Additional info:
- This issue seems only happened once.
- Confirmed client is using NFSv3 and TCP.

Comment 4 Niels de Vos 2018-02-15 09:54:58 UTC
The error message that the nfs-client received is not correct behaviour:

  mv: ./aa/TempFileA.TMP: A system call received a parameter that is not valid.

This is the error that comes from a Gluster internal function, and should not have been relayed to the client like this.

  [nfs3-helpers.c:3515:nfs3_log_newfh_res] 0-nfs-nfsv3: /XXXXXXX/TempFileA.TMP => (XID: XXXXXX, CREATE: NFS: 22(Invalid argument for operation), POSIX: 22(Invalid argument))

How it came to this is not immediately clear. For some reason it was impossible to obtain an inode-lock on the bricks. 

  [afr-lk-common.c:1090:is_blocking_locks_count_sufficient] 0-aseries-vol-replicate-0: Unable to obtain blocking inode lock on even one child for gfid:00000000-0000-0000-0000-000000000000

But, a zero'd out gfid is not valid, so requesting a lock on it is expected to fail.

Nothing immediately stands out in the logs to me, maybe Jiffin or Kaleb notice something more.

Comment 37 Manisha Saini 2018-03-19 09:23:48 UTC

While testing hotfix build-

Ran bonnie from 1 client and lookup from 3 clients having GNFS volume mount,I am observing lots of warning related to split brain -

[2018-03-19 07:59:13.454730] W [MSGID: 114031] [client-rpc-fops.c:1495:client3_3_fstat_cbk] 0-gnfsvol-client-0: remote operation failed [Operation not permitted]
[2018-03-19 07:59:13.454765] W [MSGID: 114031] [client-rpc-fops.c:1495:client3_3_fstat_cbk] 0-gnfsvol-client-2: remote operation failed [No such file or directory]
[2018-03-19 07:59:13.454810] W [MSGID: 114031] [client-rpc-fops.c:1495:client3_3_fstat_cbk] 0-gnfsvol-client-1: remote operation failed [No such file or directory]
[2018-03-19 07:59:13.454863] W [MSGID: 108008] [afr-read-txn.c:229:afr_read_txn] 0-gnfsvol-replicate-1: Unreadable subvolume -1 found with event generation 3 for gfid 270b4808-eef2-4c74-9348-5e4306d0eb51. (Possible split-brain)

cat /var/log/glusterfs/nfs.log | grep "dentry not found"
[2018-03-19 07:58:44.269844] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 270b4808-eef2-4c74-9348-5e4306d0eb51/gKJOZJB3E00000019b8: dentry not found in 44cc0675-725c-4330-9c34-37eb3ee43f66
[2018-03-19 07:58:51.593171] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 270b4808-eef2-4c74-9348-5e4306d0eb51/xTFPl0VNsix70000002949: dentry not found in a52105b2-db2f-4c4d-b8e2-9e3eb5b90164
[2018-03-19 07:59:09.916474] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 270b4808-eef2-4c74-9348-5e4306d0eb51/uHDk6BmU8HU0000003c6a: dentry not found in 627b3c9f-b910-44d1-8861-b488b9ade671
[2018-03-19 07:59:12.071630] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 270b4808-eef2-4c74-9348-5e4306d0eb51/sf8SEFpZRr0000002235: dentry not found in f858752a-ab32-4c15-b252-c0eabb42454a
[2018-03-19 07:59:13.001616] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 270b4808-eef2-4c74-9348-5e4306d0eb51/PtGDlLNxhp0000003620: dentry not found in 011a4c3e-3e8c-479e-839c-efd076c12c51

Are these messages expected?
There is no functionality Impact on application side.Bonnie completed successfully.

Comment 39 Bipin Kunal 2018-04-04 12:32:22 UTC
As per discussion over storage release mailing list, I have created RCM jira for hotfix signing.

Refer : https://projects.engineering.redhat.com/browse/RCM-31332

Comment 45 Manisha Saini 2018-08-13 14:38:45 UTC
Verified this with

# rpm -qa | grep gluster

Verified with usecase reported in BZ 1413584.Ran bonnie from 1 client along with parallel lookup from another 3 client on dist-replicate volume.

Bonnie completed successfully.Moving this BZ to verified state,

Comment 48 errata-xmlrpc 2018-09-04 06:42:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.