Description of problem: AIX client is mounting gluster volume via NFSv3. It use mv command to rename a temporarily file every 20 minutes. It got a error around [02/14 05:30] (UTC). and the error message on the AIX client is like this: mv: ./aa/TempFileA.TMP: A system call received a parameter that is not valid. From nfs.log in gluster server, I saw some errors happened at that timestamp. [2018-02-14 05:30:05.663122] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc7a0) [0x7f3bb58f27a0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28ad0) [0x7f3bb590ead0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3ab16) [0x7f3bb5920b16] ) 0-: Assertion failed: 0 [2018-02-14 05:30:05.663170] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc7a0) [0x7f3bb58f27a0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28ad0) [0x7f3bb590ead0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3ab16) [0x7f3bb5920b16] ) 0-: Assertion failed: 0 [2018-02-14 05:30:05.663192] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc7a0) [0x7f3bb58f27a0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28ad0) [0x7f3bb590ead0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3ab16) [0x7f3bb5920b16] ) 0-: Assertion failed: 0 [2018-02-14 05:30:05.663207] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc7a0) [0x7f3bb58f27a0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28ad0) [0x7f3bb590ead0] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3ab16) [0x7f3bb5920b16] ) 0-: Assertion failed: 0 [2018-02-14 05:30:05.663216] W [MSGID: 108019] [afr-lk-common.c:1090:is_blocking_locks_count_sufficient] 0-aseries-vol-replicate-0: Unable to obtain blocking inode lock on even one child for gfid:00000000-0000-0000-0000-000000000000. [2018-02-14 05:30:05.663225] I [MSGID: 108019] [afr-transaction.c:1829:afr_post_blocking_inodelk_cbk] 0-aseries-vol-replicate-0: Blocking inodelks failed. [2018-02-14 05:30:05.667858] W [MSGID: 112199] [nfs3-helpers.c:3515:nfs3_log_newfh_res] 0-nfs-nfsv3: /XXXXXXX/TempFileA.TMP => (XID: XXXXXX, CREATE: NFS: 22(Invalid argument for operation), POSIX: 22(Invalid argument)), FH: exportid 1082083a-2329-4f38-9a7d-c2a2d6ad588e, gfid c1197be3-d217-4f9e-97fb-d7e305c5cb60, mountid 70c5a5b7-0000-0000-0000-000000000000 Version-Release number of selected component (if applicable): glusterfs-3.8.4-18.6.el7rhgs.x86_64 Fri Sep 22 08:19:41 2017 glusterfs-api-3.8.4-18.6.el7rhgs.x86_64 Fri Sep 22 08:19:41 2017 glusterfs-cli-3.8.4-18.6.el7rhgs.x86_64 Fri Sep 22 08:19:41 2017 glusterfs-client-xlators-3.8.4-18.6.el7rhgs.x86_64 Fri Sep 22 08:19:41 2017 glusterfs-fuse-3.8.4-18.6.el7rhgs.x86_64 Fri Sep 22 08:19:41 2017 glusterfs-geo-replication-3.8.4-18.6.el7rhgs.x86_64 Fri Sep 22 08:19:42 2017 glusterfs-libs-3.8.4-18.6.el7rhgs.x86_64 Fri Sep 22 08:19:41 2017 glusterfs-server-3.8.4-18.6.el7rhgs.x86_64 Fri Sep 22 08:19:41 2017 samba-vfs-glusterfs-4.4.5-3.el7rhgs.x86_64 Wed Aug 23 14:18:11 2017 How reproducible: - seems only one time Steps to Reproduce: 1. N/A 2. 3. Actual results: Issue happened when mv a file. Expected results: No issue should happen when mv a file. Additional info: - This issue seems only happened once. - Confirmed client is using NFSv3 and TCP.
The error message that the nfs-client received is not correct behaviour: mv: ./aa/TempFileA.TMP: A system call received a parameter that is not valid. This is the error that comes from a Gluster internal function, and should not have been relayed to the client like this. [nfs3-helpers.c:3515:nfs3_log_newfh_res] 0-nfs-nfsv3: /XXXXXXX/TempFileA.TMP => (XID: XXXXXX, CREATE: NFS: 22(Invalid argument for operation), POSIX: 22(Invalid argument)) How it came to this is not immediately clear. For some reason it was impossible to obtain an inode-lock on the bricks. [afr-lk-common.c:1090:is_blocking_locks_count_sufficient] 0-aseries-vol-replicate-0: Unable to obtain blocking inode lock on even one child for gfid:00000000-0000-0000-0000-000000000000 But, a zero'd out gfid is not valid, so requesting a lock on it is expected to fail. Nothing immediately stands out in the logs to me, maybe Jiffin or Kaleb notice something more.
Jiffin, While testing hotfix build- Ran bonnie from 1 client and lookup from 3 clients having GNFS volume mount,I am observing lots of warning related to split brain - [2018-03-19 07:59:13.454730] W [MSGID: 114031] [client-rpc-fops.c:1495:client3_3_fstat_cbk] 0-gnfsvol-client-0: remote operation failed [Operation not permitted] [2018-03-19 07:59:13.454765] W [MSGID: 114031] [client-rpc-fops.c:1495:client3_3_fstat_cbk] 0-gnfsvol-client-2: remote operation failed [No such file or directory] [2018-03-19 07:59:13.454810] W [MSGID: 114031] [client-rpc-fops.c:1495:client3_3_fstat_cbk] 0-gnfsvol-client-1: remote operation failed [No such file or directory] [2018-03-19 07:59:13.454863] W [MSGID: 108008] [afr-read-txn.c:229:afr_read_txn] 0-gnfsvol-replicate-1: Unreadable subvolume -1 found with event generation 3 for gfid 270b4808-eef2-4c74-9348-5e4306d0eb51. (Possible split-brain) ********************************* cat /var/log/glusterfs/nfs.log | grep "dentry not found" [2018-03-19 07:58:44.269844] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 270b4808-eef2-4c74-9348-5e4306d0eb51/gKJOZJB3E00000019b8: dentry not found in 44cc0675-725c-4330-9c34-37eb3ee43f66 [2018-03-19 07:58:51.593171] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 270b4808-eef2-4c74-9348-5e4306d0eb51/xTFPl0VNsix70000002949: dentry not found in a52105b2-db2f-4c4d-b8e2-9e3eb5b90164 [2018-03-19 07:59:09.916474] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 270b4808-eef2-4c74-9348-5e4306d0eb51/uHDk6BmU8HU0000003c6a: dentry not found in 627b3c9f-b910-44d1-8861-b488b9ade671 [2018-03-19 07:59:12.071630] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 270b4808-eef2-4c74-9348-5e4306d0eb51/sf8SEFpZRr0000002235: dentry not found in f858752a-ab32-4c15-b252-c0eabb42454a [2018-03-19 07:59:13.001616] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 270b4808-eef2-4c74-9348-5e4306d0eb51/PtGDlLNxhp0000003620: dentry not found in 011a4c3e-3e8c-479e-839c-efd076c12c51 ********************************** Are these messages expected? There is no functionality Impact on application side.Bonnie completed successfully.
As per discussion over storage release mailing list, I have created RCM jira for hotfix signing. Refer : https://projects.engineering.redhat.com/browse/RCM-31332
Verified this with # rpm -qa | grep gluster gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64 glusterfs-server-3.12.2-15.el7rhgs.x86_64 glusterfs-3.12.2-15.el7rhgs.x86_64 gluster-nagios-common-0.2.4-1.el7rhgs.noarch glusterfs-ganesha-3.12.2-15.el7rhgs.x86_64 glusterfs-fuse-3.12.2-15.el7rhgs.x86_64 python2-gluster-3.12.2-15.el7rhgs.x86_64 glusterfs-api-3.12.2-15.el7rhgs.x86_64 glusterfs-geo-replication-3.12.2-15.el7rhgs.x86_64 vdsm-gluster-4.19.43-2.3.el7rhgs.noarch nfs-ganesha-gluster-2.5.5-9.el7rhgs.x86_64 glusterfs-client-xlators-3.12.2-15.el7rhgs.x86_64 glusterfs-cli-3.12.2-15.el7rhgs.x86_64 glusterfs-debuginfo-3.12.2-14.el7rhgs.x86_64 glusterfs-libs-3.12.2-15.el7rhgs.x86_64 glusterfs-rdma-3.12.2-15.el7rhgs.x86_64 Verified with usecase reported in BZ 1413584.Ran bonnie from 1 client along with parallel lookup from another 3 client on dist-replicate volume. Bonnie completed successfully.Moving this BZ to verified state,
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607