Bug 1413971

Summary: Bonnie test suite failed with "Can't open file" error
Product: [Community] GlusterFS Reporter: Jiffin <jthottan>
Component: nfsAssignee: Jiffin <jthottan>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: mainlineCC: bugs, jthottan, ndevos, sbhaloth, tdesala
Target Milestone: ---Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.11.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1413584 Environment:
Last Closed: 2017-05-30 18:38:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1413584    

Description Jiffin 2017-01-17 13:16:50 UTC
+++ This bug was initially created as a clone of Bug #1413584 +++

Description of problem:
=======================
On a distributed-replicate volume, Bonnie test suite failed with "Can't open file" error.

===========================TESTS RUNNING===========================
Changing to the specified mountpoint
/mnt/nfs/run15370
executing bonnie
Using uid:0, gid:0.
Writing with putc()...
done
Can't open file ./Bonnie.15391.005      <----------------------------

real	5m22.818s
user	4m22.679s
sys	0m47.933s
bonnie failed
0
Total 0 tests were successful
Switching over to the previous working directory
Removing /mnt/nfs//run15370/
rmdir: failed to remove ‘/mnt/nfs//run15370/’: Directory not empty
rmdir failed:Directory not empty

Version-Release number of selected component (if applicable):
3.8.4-11.el7rhgs.x86_64

How reproducible:
================
1/1

Steps to Reproduce:
===================
1) Create a distributed-replicate volume and start it (Please see the vol info output for enabled volume settings).
2) Mount it on multiples clients via gNFS.
3) Start bonnie test suite from one client and from the remaining clients start a infinite loop of lookups.

Bonnie test suite failed with "Can't open file" error and I can see the below errors in nfs logs.

[2017-01-16 10:52:08.042767] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 09388be3-eb94-4cb1-94b0-0518e00d3c90/Bonnie.15391.001: dentry not found in a32137f8-6941-4db4-b723-10019fd4f359
[2017-01-16 10:52:08.386420] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 09388be3-eb94-4cb1-94b0-0518e00d3c90/Bonnie.15391.003: dentry not found in 3c05ba77-5bad-4da8-8d5c-135fa9366d5f
[2017-01-16 10:52:08.956338] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 09388be3-eb94-4cb1-94b0-0518e00d3c90/Bonnie.15391.004: dentry not found in 670abac1-f5d3-42b2-b0b2-4900c1915a9a
[2017-01-16 10:52:09.067140] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 09388be3-eb94-4cb1-94b0-0518e00d3c90/Bonnie.15391.005: dentry not found in 3f678a33-317b-4cde-a959-0e038cdc04ca
[2017-01-16 10:52:09.125807] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc740) [0x7f866b734740] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28a70) [0x7f866b750a70] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3aa66) [0x7f866b762a66] ) 0-: Assertion failed: 0
[2017-01-16 10:52:09.126007] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc740) [0x7f866b734740] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28a70) [0x7f866b750a70] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3aa66) [0x7f866b762a66] ) 0-: Assertion failed: 0
[2017-01-16 10:52:09.126267] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc740) [0x7f866b734740] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28a70) [0x7f866b750a70] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3aa66) [0x7f866b762a66] ) 0-: Assertion failed: 0
[2017-01-16 10:52:09.126465] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc740) [0x7f866b734740] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28a70) [0x7f866b750a70] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3aa66) [0x7f866b762a66] ) 0-: Assertion failed: 0
[2017-01-16 10:52:09.126533] W [MSGID: 108019] [afr-lk-common.c:1090:is_blocking_locks_count_sufficient] 0-newdr-replicate-4: Unable to obtain blocking inode lock on even one child for gfid:00000000-0000-0000-0000-000000000000.
[2017-01-16 10:52:09.126614] I [MSGID: 108019] [afr-transaction.c:1829:afr_post_blocking_inodelk_cbk] 0-newdr-replicate-4: Blocking inodelks failed.
[2017-01-16 10:52:09.126758] W [MSGID: 112199] [nfs3-helpers.c:3515:nfs3_log_newfh_res] 0-nfs-nfsv3: /run15370/Bonnie.15391.005 => (XID: 8f57191d, CREATE: NFS: 22(Invalid argument for operation), POSIX: 22(Invalid argument)), FH: exportid 9e098339-67c4-4975-bdfa-7d5278a3aae8, gfid 61b4583d-b71a-4dfc-92ad-cb4dd7d0fe53, mountid 0e315778-0000-0000-0000-000000000000
[2017-01-16 10:52:11.821905] W [MSGID: 112032] [nfs3.c:3600:nfs3svc_rmdir_cbk] 0-nfs: a557191d: /run15370 => -1 (Directory not empty) [Directory not empty]

Actual results:
===============
Bonnie test suite failed with "Can't open file" error.

Expected results:
================
Bonnie test suite should complete without any errors/issues.



--- Additional comment from Jiffin on 2017-01-17 00:33:15 EST ---

The below log message filled only because of gfid for that location(/run15370/Bonnie.15391.005) is NULL
[2017-01-16 10:52:09.126007] E [client-common.c:844:client_pre_inodelk] (-->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0xc740) [0x7f866b734740] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x28a70) [0x7f866b750a70] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x3aa66) [0x7f866b762a66] ) 0-: Assertion failed: 0

Code snippet from client_pre_inodelk
----
        if (!(loc && loc->inode))
                goto out; 

        if (!gf_uuid_is_null (loc->gfid))
                memcpy (req->gfid,  loc->gfid, 16); 
        else
                memcpy (req->gfid, loc->inode->gfid, 16); 

        GF_ASSERT_AND_GOTO_WITH_ERROR (this->name,
                                      !gf_uuid_is_null (*((uuid_t *)req->gfid)),
                                      out, op_errno, EINVAL);
----

The condition for returning EINVAL is only when gfid is NULL and this information passed to upper layer which resulted in this failure.

[2017-01-16 10:52:09.126533] W [MSGID: 108019] [afr-lk-common.c:1090:is_blocking_locks_count_sufficient] 0-newdr-replicate-4: Unable to obtain blocking inode lock on even one child for gfid:00000000-0000-0000-0000-000000000000

[2017-01-16 10:52:09.126758] W [MSGID: 112199] [nfs3-
elpers.c:3515:nfs3_log_newfh_res] 0-nfs-nfsv3: /run15370/Bonnie.15391.005 => (XID: 8f57191d, CREATE: NFS: 22(Invalid argument for operation), POSIX: 22(Invalid argument)), FH: exportid 9e098339-67c4-4975-bdfa-7d5278a3aae8, gfid 61b4583d-b71a-4dfc-92ad-cb4dd7d0fe53, mountid 0e315778-0000-0000-0000-000000000000

Actually at the backend the file is already created with following details
getfattr -d -m "." -e hex Bonnie.15391.005
# file: Bonnie.15391.005
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0x61b4583db71a4dfc92adcb4dd7d0fe53
trusted.pgfid.09388be3-eb94-4cb1-94b0-0518e00d3c90=0x00000001

Either this information is not updated correctly at nfs-server(gluserfs client)
If you look the logs just before it it failed to remove same file

[2017-01-16 10:52:09.067140] W [MSGID: 101159] [inode.c:1214:__inode_unlink] 0-inode: 09388be3-eb94-4cb1-94b0-0518e00d3c90/Bonnie.15391.005: dentry not found in 3f678a33-317b-4cde-a959-0e038cdc04ca

The gfid mentioned in this operation is "3f678a33-317b-4cde-a959-0e038cdc04ca"
different from the backend. 
So the following might have happened :
Bonnie initial creates a filed named Bonnie.15391.005 and removed it. But this information is not properly updated in the client stack. So Bonnie creates the same file it ends up using invalid gfid.

Thanks ravi for help in debugging.

--- Additional comment from Jiffin on 2017-01-17 07:14:02 EST ---

RCA : By code checking it seems issue resides in nfs3svc_create_cbk(), it calls nfs_setattr() with improper gfid which results in this failure.
Thanks Pranith for finding the issue and proposing the fix

Comment 1 Worker Ant 2017-01-17 13:19:01 UTC
REVIEW: http://review.gluster.org/16421 (gNFS : pass proper gfid for nfs_setattr() in nfs3svc_create_cbk()) posted (#1) for review on master by jiffin tony Thottan (jthottan)

Comment 2 Worker Ant 2017-01-30 11:51:59 UTC
COMMIT: https://review.gluster.org/16421 committed in master by Niels de Vos (ndevos) 
------
commit fbe70dcdd7c62ae97e04e09755dcf87d90fcad8b
Author: Jiffin Tony Thottan <jthottan>
Date:   Tue Jan 17 18:26:55 2017 +0530

    gNFS : pass proper gfid for nfs_setattr() in nfs3svc_create_cbk()
    
    The nfs3svc_create_cbk() takes "gfid" from inode variable which it
    received, not from inode which it performed linking to nfs_setattr().
    There is possiblity that the inode passed into this function holds
    NULL gfid if there are some parallel operation happening on the same
    file.
    
    Similar issue is mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1413971
    Thanks pranith for proposing the fix
    
    Change-Id: I1a0ff4f02b483416f19f4f064c306c2cad5d9d8b
    BUG: 1413971
    Signed-off-by: Jiffin Tony Thottan <jthottan>
    Reviewed-on: https://review.gluster.org/16421
    Reviewed-by: soumya k <skoduri>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Niels de Vos <ndevos>

Comment 3 Shyamsundar 2017-05-30 18:38:45 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/