Bug 1545523

Summary: [GSS] AIX client failed to write to a temporarily file to gluster volume by gNFS.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: WenhanShi <wenshi>
Component: gluster-nfsAssignee: Jiffin <jthottan>
Status: CLOSED ERRATA QA Contact: Manisha Saini <msaini>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.2CC: amukherj, bkunal, jthottan, kkeithle, msaini, ndevos, olim, rcyriac, rhinduja, rhs-bugs, sankarshan, srmukher, ssaha, storage-qa-internal
Target Milestone: ---   
Target Release: RHGS 3.4.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.12.2-1 Doc Type: Bug Fix
Doc Text:
Previously, there was possibility that gfid passed a create call with NULL as a result. Therefore, when setattr post create used NULL as an input for gfid, it lead to ENVLAID error. With this fix, you can use the gfid from the linked node and create with SET_ATTR_MODE is successful.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-09-04 06:42:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1503138    

Description WenhanShi 2018-02-15 07:50:29 UTC
Description of problem:

AIX client is mounting gluster volume via NFSv3. It use mv command to rename a temporarily file every 20 minutes. It got a error around [02/14 05:30] (UTC). and the error message on the AIX client is like this:

  mv: ./aa/TempFileA.TMP: A system call received a parameter that is not valid.

From nfs.log in gluster server, I saw some errors happened at that timestamp.

[2018-02-14 05:30:05.663122] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc7a0) [0x7f3bb58f27a0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28ad0) [0x7f3bb590ead0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3ab16) [0x7f3bb5920b16] ) 0-: Assertion failed: 0
[2018-02-14 05:30:05.663170] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc7a0) [0x7f3bb58f27a0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28ad0) [0x7f3bb590ead0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3ab16) [0x7f3bb5920b16] ) 0-: Assertion failed: 0
[2018-02-14 05:30:05.663192] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc7a0) [0x7f3bb58f27a0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28ad0) [0x7f3bb590ead0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3ab16) [0x7f3bb5920b16] ) 0-: Assertion failed: 0
[2018-02-14 05:30:05.663207] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc7a0) [0x7f3bb58f27a0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28ad0) [0x7f3bb590ead0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3ab16) [0x7f3bb5920b16] ) 0-: Assertion failed: 0
[2018-02-14 05:30:05.663216] W [MSGID: 108019] [afr-lk-common.c:1090:is_blocking_locks_count_sufficient] 0-aseries-vol-replicate-0: Unable to obtain blocking inode lock on even one child for gfid:00000000-0000-0000-0000-000000000000.
[2018-02-14 05:30:05.663225] I [MSGID: 108019] [afr-transaction.c:1829:afr_post_blocking_inodelk_cbk] 0-aseries-vol-replicate-0: Blocking inodelks failed.
[2018-02-14 05:30:05.667858] W [MSGID: 112199] [nfs3-helpers.c:3515:nfs3_log_newfh_res] 0-nfs-nfsv3: /XXXXXXX/TempFileA.TMP => (XID: XXXXXX, CREATE: NFS: 22(Invalid argument for operation), POSIX: 22(Invalid argument)), FH: exportid 1082083a-2329-4f38-9a7d-c2a2d6ad588e, gfid c1197be3-d217-4f9e-97fb-d7e305c5cb60, mountid 70c5a5b7-0000-0000-0000-000000000000


Version-Release number of selected component (if applicable):
glusterfs-3.8.4-18.6.el7rhgs.x86_64                   Fri Sep 22 08:19:41 2017
glusterfs-api-3.8.4-18.6.el7rhgs.x86_64               Fri Sep 22 08:19:41 2017
glusterfs-cli-3.8.4-18.6.el7rhgs.x86_64               Fri Sep 22 08:19:41 2017
glusterfs-client-xlators-3.8.4-18.6.el7rhgs.x86_64    Fri Sep 22 08:19:41 2017
glusterfs-fuse-3.8.4-18.6.el7rhgs.x86_64              Fri Sep 22 08:19:41 2017
glusterfs-geo-replication-3.8.4-18.6.el7rhgs.x86_64   Fri Sep 22 08:19:42 2017
glusterfs-libs-3.8.4-18.6.el7rhgs.x86_64              Fri Sep 22 08:19:41 2017
glusterfs-server-3.8.4-18.6.el7rhgs.x86_64            Fri Sep 22 08:19:41 2017
samba-vfs-glusterfs-4.4.5-3.el7rhgs.x86_64            Wed Aug 23 14:18:11 2017


How reproducible:
- seems only one time

Steps to Reproduce:
1. N/A
2.
3.

Actual results:
Issue happened when mv a file.

Expected results:
No issue should happen when mv a file.

Additional info:
- This issue seems only happened once.
- Confirmed client is using NFSv3 and TCP.

Comment 4 Niels de Vos 2018-02-15 09:54:58 UTC
The error message that the nfs-client received is not correct behaviour:

  mv: ./aa/TempFileA.TMP: A system call received a parameter that is not valid.

This is the error that comes from a Gluster internal function, and should not have been relayed to the client like this.

  [nfs3-helpers.c:3515:nfs3_log_newfh_res] 0-nfs-nfsv3: /XXXXXXX/TempFileA.TMP => (XID: XXXXXX, CREATE: NFS: 22(Invalid argument for operation), POSIX: 22(Invalid argument))


How it came to this is not immediately clear. For some reason it was impossible to obtain an inode-lock on the bricks. 

  [afr-lk-common.c:1090:is_blocking_locks_count_sufficient] 0-aseries-vol-replicate-0: Unable to obtain blocking inode lock on even one child for gfid:00000000-0000-0000-0000-000000000000

But, a zero'd out gfid is not valid, so requesting a lock on it is expected to fail.

Nothing immediately stands out in the logs to me, maybe Jiffin or Kaleb notice something more.

Comment 37 Manisha Saini 2018-03-19 09:23:48 UTC
Jiffin,

While testing hotfix build-

Ran bonnie from 1 client and lookup from 3 clients having GNFS volume mount,I am observing lots of warning related to split brain -

[2018-03-19 07:59:13.454730] W [MSGID: 114031] [client-rpc-fops.c:1495:client3_3_fstat_cbk] 0-gnfsvol-client-0: remote operation failed [Operation not permitted]
[2018-03-19 07:59:13.454765] W [MSGID: 114031] [client-rpc-fops.c:1495:client3_3_fstat_cbk] 0-gnfsvol-client-2: remote operation failed [No such file or directory]
[2018-03-19 07:59:13.454810] W [MSGID: 114031] [client-rpc-fops.c:1495:client3_3_fstat_cbk] 0-gnfsvol-client-1: remote operation failed [No such file or directory]
[2018-03-19 07:59:13.454863] W [MSGID: 108008] [afr-read-txn.c:229:afr_read_txn] 0-gnfsvol-replicate-1: Unreadable subvolume -1 found with event generation 3 for gfid 270b4808-eef2-4c74-9348-5e4306d0eb51. (Possible split-brain)

*********************************
cat /var/log/glusterfs/nfs.log | grep "dentry not found"
[2018-03-19 07:58:44.269844] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 270b4808-eef2-4c74-9348-5e4306d0eb51/gKJOZJB3E00000019b8: dentry not found in 44cc0675-725c-4330-9c34-37eb3ee43f66
[2018-03-19 07:58:51.593171] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 270b4808-eef2-4c74-9348-5e4306d0eb51/xTFPl0VNsix70000002949: dentry not found in a52105b2-db2f-4c4d-b8e2-9e3eb5b90164
[2018-03-19 07:59:09.916474] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 270b4808-eef2-4c74-9348-5e4306d0eb51/uHDk6BmU8HU0000003c6a: dentry not found in 627b3c9f-b910-44d1-8861-b488b9ade671
[2018-03-19 07:59:12.071630] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 270b4808-eef2-4c74-9348-5e4306d0eb51/sf8SEFpZRr0000002235: dentry not found in f858752a-ab32-4c15-b252-c0eabb42454a
[2018-03-19 07:59:13.001616] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 270b4808-eef2-4c74-9348-5e4306d0eb51/PtGDlLNxhp0000003620: dentry not found in 011a4c3e-3e8c-479e-839c-efd076c12c51
**********************************

Are these messages expected?
There is no functionality Impact on application side.Bonnie completed successfully.

Comment 39 Bipin Kunal 2018-04-04 12:32:22 UTC
As per discussion over storage release mailing list, I have created RCM jira for hotfix signing.

Refer : https://projects.engineering.redhat.com/browse/RCM-31332

Comment 45 Manisha Saini 2018-08-13 14:38:45 UTC
Verified this with

# rpm -qa | grep gluster
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-server-3.12.2-15.el7rhgs.x86_64
glusterfs-3.12.2-15.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-ganesha-3.12.2-15.el7rhgs.x86_64
glusterfs-fuse-3.12.2-15.el7rhgs.x86_64
python2-gluster-3.12.2-15.el7rhgs.x86_64
glusterfs-api-3.12.2-15.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-15.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
nfs-ganesha-gluster-2.5.5-9.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-15.el7rhgs.x86_64
glusterfs-cli-3.12.2-15.el7rhgs.x86_64
glusterfs-debuginfo-3.12.2-14.el7rhgs.x86_64
glusterfs-libs-3.12.2-15.el7rhgs.x86_64
glusterfs-rdma-3.12.2-15.el7rhgs.x86_64


Verified with usecase reported in BZ 1413584.Ran bonnie from 1 client along with parallel lookup from another 3 client on dist-replicate volume.

Bonnie completed successfully.Moving this BZ to verified state,

Comment 48 errata-xmlrpc 2018-09-04 06:42:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607