Created attachment 1042591 [details]
pynfs results

Description of problem:
I created a new volume and used its default configuration, i.e. ACLs and Root_squash disabled. Then I executed pynfs and found that there were 101 failures along with timeout errors.

[root@rhsauto048 ~]# tail -f /export/pynfs-results.log
timeout: timed out
WRT15    st_write.testSizes : FAILURE
           timeout: timed out
**************************************************
Command line asked for 572 of 655 tests
Of those: 185 Skipped, 101 Failed, 1 Warned, 285 Passed

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-4.el6rhs.x86_64
nfs-ganesha-2.2.0-3.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. create a volume of 6x2 type, start it
2. configure nfs-ganesha
3. mount the volume on the client with vers=4
4. execute the pynfs test suite

Actual results:
101 tests failed

Expected results:
This many failures are not expected; we need to avoid such a large number of failures.

Additional info:
Attaching the result file, which may help in investigating the failure cases.
After the pynfs failures I see that the mount is hung, as an ls on the mount point is in "D" state:

[root@rhsauto048 ~]# ps -auxww | grep ls
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root      3964  0.0  0.0 115692   812 pts/3    D+   10:03   0:00 ls --color=auto /mnt
root      3970  0.0  0.0 103252   808 pts/2    S+   10:04   0:00 grep ls

The interesting fact is that ganesha is still running:

[root@nfs11 ~]# ps -eaf | grep ganesha
root      8155  5930  0 15:42 pts/0    00:00:00 grep ganesha
root     22032      1 85 00:45 ?        12:44:50 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid

but showmount is also hung:

[root@nfs11 ~]# showmount -e localhost
Created attachment 1042592 [details]
nfs11 ganesha.log
Created attachment 1042593 [details]
nfs11 messages
I was able to reproduce the issue as reported by Saurabh. But when I ran the tests individually, most of them passed successfully; the functionality works just fine. Only when all of them are run together do I come across this issue. Needs more debugging.
Are we sure that this is not a pynfs issue itself? We do need to get the RCA for this urgently; this could be an indication of a broader issue.
Yes. I ran all the tests individually and most of them pass. I have been trying to RCA it, but the point of failure is quite inconsistent. Even though all the tests use separate directories, when these tests are run together I am able to consistently reproduce this issue, at different points and with different error messages in the logs.
And yes, it could be an indication of a broader issue. I am trying to narrow down the test case by increasing the number of NFS-Ganesha threads.
When I ran pynfs over NFSv4 with gluster compiled in enable-debug mode, ganesha crashed due to an assertion failure in the inode_path function called from glfs_h_stat(). It seems the glfs_object handle (both gfid and inode) is not filled properly for the given entry. It was first hit on the OPENDOWNGRADE test when I ran the entire test suite. Up to OPENDOWNGRADE only two tests had failed, both ACL-related. When I enabled ACLs and re-ran, those tests passed, but the same issue was hit before OPENDOWNGRADE.
Initial RCA seems to indicate that once we hit a few failures while running the pynfs tests, most of the subsequent tests fail, either because they access the same invalid files which got created, or because nfs-ganesha is left in an inconsistent state after those failures. That is why most tests succeed when run individually, but not as a whole. Only once we fix those failures will we be able to get this test suite to run successfully.

Also note that there are multiple issues here.

1) As Jiffin pointed out earlier, if Gluster is configured in debug mode, we hit the below assert consistently during the OPEN_DOWNGRADE test (OPDG10):

(gdb) bt
#0  0x0000003908c32625 in raise () from /lib64/libc.so.6
#1  0x0000003908c33e05 in abort () from /lib64/libc.so.6
#2  0x0000003908c2b74e in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003908c2b810 in __assert_fail () from /lib64/libc.so.6
#4  0x00007fe544940fdf in __inode_path (inode=0x7fe4f023ca8c, name=0x0, bufp=0x7fe504aeb600) at inode.c:1330
#5  0x00007fe5449413d1 in inode_path (inode=0x7fe4f023ca8c, name=0x0, bufp=0x7fe504aeb600) at inode.c:1431
#6  0x00007fe5449135ac in loc_touchup (loc=0x7fe504aeb6e0, name=0x0) at xlator.c:780
#7  0x00007fe544be774c in priv_glfs_loc_touchup (loc=0x7fe504aeb6e0) at glfs-resolve.c:159
#8  0x00007fe544bea674 in pub_glfs_h_stat (fs=0xcd6500, object=0x7fe4f03512e0, stat=0x7fe504aeb778) at glfs-handleops.c:221
#9  0x00007fe544dfbdfb in getattrs (obj_hdl=0x7fe4f025dd28) at /root/git-ganesha/src/FSAL/FSAL_GLUSTER/handle.c:671
#10 0x00000000004eaf71 in cache_inode_refresh_attrs (entry=0x7fe4f0211700) at /root/git-ganesha/src/include/cache_inode.h:967
#11 0x00000000004ee16f in cache_inode_lock_trust_attrs (entry=0x7fe4f0211700, need_wr_lock=true) at /root/git-ganesha/src/cache_inode/cache_inode_misc.c:861
#12 0x00000000004e9fb0 in cache_inode_setattr (entry=0x7fe4f0211700, attr=0x7fe504aeca50, is_open_write=true) at /root/git-ganesha/src/cache_inode/cache_inode_setattr.c:95
#13 0x0000000000473817 in open4_create (arg=0x7fe4f0300fc8, data=0x7fe504aecd40, res=0x7fe4f0335578, parent=0x7fe4f01877d0, entry=0x7fe504aecc40, filename=0x7fe4f029dc30 "OPEN2", created=0x7fe504aecc4e) at /root/git-ganesha/src/Protocols/NFS/nfs4_op_open.c:801
#14 0x000000000047393f in open4_claim_null (arg=0x7fe4f0300fc8, data=0x7fe504aecd40, res=0x7fe4f0335578, entry=0x7fe504aecc40, created=0x7fe504aecc4e) at /root/git-ganesha/src/Protocols/NFS/nfs4_op_open.c:866
#15 0x000000000047508c in nfs4_op_open (op=0x7fe4f0300fc0, data=0x7fe504aecd40, resp=0x7fe4f0335570) at /root/git-ganesha/src/Protocols/NFS/nfs4_op_open.c:1346
#16 0x0000000000460141 in nfs4_Compound (arg=0x7fe4f02a5048, worker=0x7fe4f00008c0, req=0x7fe4f02a4f08, res=0x7fe4f0289480) at /root/git-ganesha/src/Protocols/NFS/nfs4_Compound.c:712
#17 0x0000000000454b4c in nfs_rpc_execute (req=0x7fe4f02aaf80, worker_data=0x7fe4f00008c0) at /root/git-ganesha/src/MainNFSD/nfs_worker_thread.c:1268
#18 0x00000000004558e6 in worker_run (ctx=0xe0c6f0) at /root/git-ganesha/src/MainNFSD/nfs_worker_thread.c:1535
#19 0x000000000051cc86 in fridgethr_start_routine (arg=0xe0c6f0) at /root/git-ganesha/src/support/fridgethr.c:562
#20 0x00000039090079d1 in start_thread () from /lib64/libpthread.so.0
#21 0x0000003908ce88fd in clone () from /lib64/libc.so.6

(gdb) f 4
#4  0x00007fe544940fdf in __inode_path (inode=0x7fe4f023ca8c, name=0x0, bufp=0x7fe504aeb600) at inode.c:1330
1330                    GF_ASSERT (0);
(gdb) l
1325            int64_t  ret = 0;
1326            int      len = 0;
1327            char    *buf = NULL;
1328
1329            if (!inode || gf_uuid_is_null (inode->gfid)) {
1330                    GF_ASSERT (0);
1331                    gf_msg_callingfn (THIS->name, GF_LOG_WARNING, EINVAL,
1332                                      LG_MSG_INVALID_ARG, "invalid inode");
1333                    return -EINVAL;
1334            }
(gdb) p inode
$1 = (inode_t *) 0x7fe4f023ca8c
(gdb) p *inode
$2 = {table = 0x7fe5300065f0, gfid = '\000' <repeats 15 times>, lock = 1, nlookup = 0, fd_count = 0, ref = 3, ia_type = IA_INVAL, fd_list = {next = 0x7fe4f023cac4, prev = 0x7fe4f023cac4}, dentry_list = {next = 0x7fe4f023cad4, prev = 0x7fe4f023cad4}, hash = {next = 0x7fe4f023cae4, prev = 0x7fe4f023cae4}, list = {next = 0xe30cf4, prev = 0x7fe530006650}, _ctx = 0x7fe4f028de80}
(gdb)

Here we are trying to stat an invalid inode. On further debugging found that:

(gdb) c
Continuing.
[Switching to Thread 0x7f7fdebfd700 (LWP 22022)]
Breakpoint 1, pub_glfs_h_open (fs=0x13e6500, object=0x7f7fc0004610, flags=2) at glfs-handleops.c:582
582             int ret = -1;
Missing separate debuginfos, use: debuginfo-install libgcc-4.4.7-11.el6.x86_64
(gdb) c
Continuing.
[Switching to Thread 0x7f7fd7fff700 (LWP 22026)]
Breakpoint 1, pub_glfs_h_open (fs=0x13e6500, object=0x15211e0, flags=2) at glfs-handleops.c:582
582             int ret = -1;
(gdb) c
Continuing.
[Switching to Thread 0x7f7fdd7fb700 (LWP 22024)]
Breakpoint 1, pub_glfs_h_open (fs=0x13e6500, object=0x7f7fb8001ed0, flags=0) at glfs-handleops.c:582
582             int ret = -1;
(gdb) b pub_glfs_h_creat
Breakpoint 2 at 0x7f8016d1c68a: file glfs-handleops.c, line 676.
(gdb) c
Continuing.
[Switching to Thread 0x7f7fde1fc700 (LWP 22023)]
Breakpoint 2, pub_glfs_h_creat (fs=0x13e6500, parent=0x7f7fc8002e40, path=0x7f7fc41840d0 "OPDG10", flags=64, mode=420, stat=0x7f7fde1fa770) at glfs-handleops.c:676
676             int ret = -1;
(gdb) n
677             struct glfs_fd *glfd = NULL;
(gdb)
678             xlator_t *subvol = NULL;
(gdb)
679             inode_t *inode = NULL;
(gdb)
680             loc_t loc = {0, };
(gdb)
681             struct iatt iatt = {0, };
(gdb)
683             dict_t *xattr_req = NULL;
(gdb)
684             struct glfs_object *object = NULL;
(gdb)
687             if ((fs == NULL) || (parent == NULL) || (path == NULL)) {
(gdb)
692             DECLARE_OLD_THIS;
(gdb)
693             __GLFS_ENTRY_VALIDATE_FS (fs, invalid_fs);
(gdb)
696             subvol = glfs_active_subvol (fs);
(gdb)
697             if (!subvol) {
(gdb)
704             inode = glfs_resolve_inode (fs, subvol, parent);
(gdb)
705             if (!inode) {
(gdb)
710             xattr_req = dict_new ();
(gdb)
711             if (!xattr_req) {
(gdb)
717             gf_uuid_generate (gfid);
(gdb)
718             ret = dict_set_static_bin (xattr_req, "gfid-req", gfid, 16);
(gdb)
719             if (ret) {
(gdb)
725             GLFS_LOC_FILL_PINODE (inode, loc, ret, errno, out, path);
(gdb)
727             glfd = glfs_fd_new (fs);
(gdb)
728             if (!glfd) {
(gdb)
734             glfd->fd = fd_create (loc.inode, getpid());
(gdb)
735             if (!glfd->fd) {
(gdb) p *loc
Structure has no component named operator*.
(gdb) p loc
$1 = {path = 0x7f7fc4183530 "/tmp/OPDG10", name = 0x7f7fc4183535 "OPDG10", inode = 0x7f7fc4185d0c, parent = 0x7f7fc80036ec, gfid = '\000' <repeats 15 times>, pargfid = "\263aU\325\031\000I\032\235q&\306\066\265>\205"}
(gdb) p *loc->inode
$2 = {table = 0x7f8000030b50, gfid = '\000' <repeats 15 times>, lock = 1, nlookup = 0, fd_count = 0, ref = 2, ia_type = IA_INVAL, fd_list = {next = 0x7f7fc4185d44, prev = 0x7f7fc4185d44}, dentry_list = {next = 0x7f7fc4185d54, prev = 0x7f7fc4185d54}, hash = {next = 0x7f7fc4185d64, prev = 0x7f7fc4185d64}, list = {next = 0x7f7fb8004594, prev = 0x7f8000030bb0}, _ctx = 0x7f7fc41833c0}
(gdb) n
740             glfd->fd->flags = flags;
(gdb)
743             ret = syncop_create (subvol, &loc, flags, mode, glfd->fd,
(gdb)
[New Thread 0x7f7fbf5fe700 (LWP 22049)]
745             DECODE_SYNCOP_ERR (ret);
(gdb)
748             if (ret == 0) {
(gdb)
754                     ret = glfs_loc_link (&loc, &iatt);
(gdb) s
glfs_loc_link (loc=0x7f7fde1fa680, iatt=0x7f7fde1fa610) at glfs-fops.c:78
78              int ret = -1;
(gdb) n
79              inode_t *linked_inode = NULL;
(gdb)
81              if (!loc->inode) {
(gdb)
86              linked_inode = inode_link (loc->inode, loc->parent, loc->name, iatt);
(gdb) s
inode_link (inode=0x7f7fc4185d0c, parent=0x7f7fc80036ec, name=0x7f7fc4183535 "OPDG10", iatt=0x7f7fde1fa610) at inode.c:1034
1034            inode_table_t *table = NULL;
(gdb) n
1035            inode_t *linked_inode = NULL;
(gdb)
1037            if (!inode) {
(gdb)
1043            table = inode->table;
(gdb)
1045            pthread_mutex_lock (&table->lock);
(gdb)
1047            linked_inode = __inode_link (inode, parent, name, iatt);
(gdb) s
__inode_link (inode=0x7f7fc4185d0c, parent=0x7f7fc80036ec, name=0x7f7fc4183535 "OPDG10", iatt=0x7f7fde1fa610) at inode.c:926
926             dentry_t *dentry = NULL;
(gdb) n
927             dentry_t *old_dentry = NULL;
(gdb)
928             inode_t *old_inode = NULL;
(gdb)
929             inode_table_t *table = NULL;
(gdb)
930             inode_t *link_inode = NULL;
(gdb)
932             if (!inode)
(gdb)
935             table = inode->table;
(gdb)
936             if (!table)
(gdb)
939             if (parent) {
(gdb)
943                     if (inode->table != parent->table) {
(gdb)
947             if (parent->ia_type != IA_IFDIR) {
(gdb)
952             if (!name || strlen (name) == 0) {
(gdb)
959             link_inode = inode;
(gdb)
961             if (!__is_inode_hashed (inode)) {
(gdb)
962                     if (!iatt)
(gdb)
965                     if (gf_uuid_is_null (iatt->ia_gfid))
(gdb)
968                     old_inode = __inode_find (table, iatt->ia_gfid);
(gdb)
970                     if (old_inode) {
(gdb)
971                             link_inode = old_inode;
(gdb)
990             if (name) {
(gdb)
991                     if (!strcmp(name, ".") || !strcmp(name, ".."))
(gdb)
[New Thread 0x7f7fbffff700 (LWP 22048)]
[New Thread 0x7f7fbebfd700 (LWP 22050)]
994                     if (strchr (name, '/')) {
(gdb)
[New Thread 0x7f7fbe1fc700 (LWP 22051)]
1001            if (parent) {
(gdb)
1002                    old_dentry = __dentry_grep (table, parent, name);
(gdb)
1004                    if (!old_dentry || old_dentry->inode != link_inode) {
(gdb)
1026            return link_inode;
(gdb) p *link_inode
$3 = {table = 0x7f8000030b50, gfid = "{/\033\370\063\326N\035\231^\210dN\033P\254", lock = 1, nlookup = 1, fd_count = 1, ref = 3, ia_type = IA_IFREG, fd_list = {next = 0x7f7fb8002ecc, prev = 0x7f7fb8002ecc}, dentry_list = {next = 0x7f7fb8002a2c, prev = 0x7f7fb8002a2c}, hash = {next = 0x7f80062b0b10, prev = 0x7f80062b0b10}, list = {next = 0x1520d64, prev = 0x7f7fc4185d74}, _ctx = 0x7f7fb8004600}
(gdb) n
1027    }
(gdb)
inode_link (inode=0x7f7fc4185d0c, parent=0x7f7fc80036ec, name=0x7f7fc4183535 "OPDG10", iatt=0x7f7fde1fa610) at inode.c:1049
1049            if (linked_inode)
(gdb)
1050                    __inode_ref (linked_inode);
(gdb)
1052            pthread_mutex_unlock (&table->lock);
(gdb)
1054            inode_table_prune (table);
(gdb)
1056            return linked_inode;
(gdb)
1057    }
(gdb)
glfs_loc_link (loc=0x7f7fde1fa680, iatt=0x7f7fde1fa610) at glfs-fops.c:87
87              if (linked_inode) {
(gdb)
88                      inode_lookup (linked_inode);
(gdb)
89                      inode_unref (linked_inode);
(gdb)
90                      ret = 0;
(gdb)
96              return ret;
(gdb) p *loc->inode
$4 = {table = 0x7f8000030b50, gfid = '\000' <repeats 15 times>, lock = 1, nlookup = 0, fd_count = 0, ref = 2, ia_type = IA_INVAL, fd_list = {next = 0x7f7fc4185d44, prev = 0x7f7fc4185d44}, dentry_list = {next = 0x7f7fc4185d54, prev = 0x7f7fc4185d54}, hash = {next = 0x7f7fc4185d64, prev = 0x7f7fc4185d64}, list = {next = 0x7f7fb8004594, prev = 0x7f8000030bb0}, _ctx = 0x7f7fc41833c0}
(gdb) n
97      }
(gdb)
pub_glfs_h_creat (fs=0x13e6500, parent=0x7f7fc8002e40, path=0x7f7fc41840d0 "OPDG10", flags=64, mode=420, stat=0x7f7fde1fa770) at glfs-handleops.c:755
755             if (ret != 0) {
(gdb)
759             if (stat)
(gdb)
760                     glfs_iatt_to_stat (fs, &iatt, stat);
(gdb)
762             ret = glfs_create_object (&loc, &object);
(gdb)
765             glfd->fd->flags = flags;
(gdb) p *object->inode
$5 = {table = 0x7f8000030b50, gfid = '\000' <repeats 15 times>, lock = 1, nlookup = 0, fd_count = 0, ref = 2, ia_type = IA_INVAL, fd_list = {next = 0x7f7fc4185d44, prev = 0x7f7fc4185d44}, dentry_list = {next = 0x7f7fc4185d54, prev = 0x7f7fc4185d54}, hash = {next = 0x7f7fc4185d64, prev = 0x7f7fc4185d64}, list = {next = 0x7f7fb8004594, prev = 0x7f8000030bb0}, _ctx = 0x7f7fc41833c0}
(gdb)

The OPDG10 test tries to create the same file twice. During the 2nd 'glfs_h_creat':

        /* populate out args */
        if (ret == 0) {
                /* TODO: If the inode existed in the cache (say file already
                   exists), then the glfs_loc_link will not update the
                   loc.inode, as a result we will have a 0000 GFID that we
                   would copy out to the object, this needs to be fixed.
                */
                ret = glfs_loc_link (&loc, &iatt);
                if (ret != 0) {
                        goto out;
                }

                if (stat)
                        glfs_iatt_to_stat (fs, &iatt, stat);

                ret = glfs_create_object (&loc, &object);
        }

As the comment explains, when we try to create the same file again, loc->inode does not get populated with the existing inode. Instead, the same invalid inode is filled into the 'object' variable which is passed to ganesha. When this object is referenced again, it results in the crash. Will send in a fix for this for review upstream.
Posted the fix for the above issue upstream - http://review.gluster.org/#/c/11572

With the fix, I no longer see the above crash. The tests went further but got stuck at a later stage --

RPLY1    st_replay.testOpen : RUNNING
Sleeping for 0.3 seconds: Woke up
Sleeping for 0.3 seconds: Woke up
RPLY1    st_replay.testOpen : PASS
RPLY2    st_replay.testReplayState1 : RUNNING
Sleeping for 0.3 seconds: Woke up
Sleeping for 0.3 seconds: Woke up
RPLY2    st_replay.testReplayState1 : PASS
RPLY3    st_replay.testReplayState2 : RUNNING
Sleeping for 0.3 seconds: Woke up
Sleeping for 0.3 seconds: Woke up
RPLY3    st_replay.testReplayState2 : PASS
RPLY4    st_replay.testReplayNonState : RUNNIN

nfs-ganesha ran into a different issue and crashed.

.14.el6_5.x86_64 openssl-1.0.1e-16.el6_5.7.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x0000003983e32925 in raise () from /lib64/libc.so.6
#1  0x0000003983e34105 in abort () from /lib64/libc.so.6
#2  0x00007fdbecbe2cd8 in glusterfs_unload () at /home/skoduri/nfs-ganesha-new/src/FSAL/FSAL_GLUSTER/main.c:164
#3  0x00007fdbecbe269f in __do_global_dtors_aux () from /usr/lib64/ganesha/libfsalgluster.so
#4  0x0000000000000000 in ?? ()
(gdb)

From the logs:

08/07/2015 12:38:10 : epoch 559cc941 : 192.168.122.205 : nfs-ganesha-12418[reaper] state_nfs4_owner_unlock_all :STATE :CRIT :state_unlock failed STATE_INVALID_ARGUMENT
08/07/2015 12:38:10 : epoch 559cc941 : 192.168.122.205 : nfs-ganesha-12418[reaper] do_unlock_no_owner :STATE :MAJ :Error STATE_INVALID_ARGUMENT while trying to do FSAL Unlock
08/07/2015 12:38:10 : epoch 559cc941 : 192.168.122.205 : nfs-ganesha-12418[reaper] state_unlock :STATE :MAJ :Unable to unlock FSAL, error=STATE_INVALID_ARGUMENT
08/07/2015 12:38:10 : epoch 559cc941 : 192.168.122.205 : nfs-ganesha-12418[reaper] state_nfs4_owner_unlock_all :STATE :CRIT :state_unlock failed STATE_INVALID_ARGUMENT
08/07/2015 12:38:10 : epoch 559cc941 : 192.168.122.205 : nfs-ganesha-12418[reaper] state_nfs4_owner_unlock_all :STATE :FATAL :Could not complete cleanup of lock state for lock owner STATE_LOCK_OWNER_NFSV4 0x7fdbb00a3520: clientid={0x7fdbb016da00 ClientID={Epoch=0x559cc941 Counter=0x00000419} EXPIRED Client={0x7fdba81c32b0 name=(16:client1_pid12461) refcount=3} t_delta=66 reservations=0 refcount=22 cb_prog=1073754169 r_addr=0.0.0.0.0.0 r_netid=tcp} owner=(27:lockowner_1436339223.550621) confirmed=0 seqid=1 related_owner={STATE_OPEN_OWNER_NFSV4 0x7fdba8179690: clientid={0x7fdbb016da00 ClientID={Epoch=0x559cc941 Counter=0x00000419} EXPIRED Client={0x7fdba81c32b0 name=(16:client1_pid12461) refcount=3} t_delta=66 reservations=0 refcount=22 cb_prog=1073754169 r_addr=0.0.0.0.0.0 r_netid=tcp} owner=(5:RPLY6) confirmed=1 seqid=3 refcount=3} refcount=2
08/07/2015 12:38:10 : epoch 559cc941 : 192.168.122.205 : nfs-ganesha-12418[reaper] unregister_fsal :FSAL :CRIT :Unregister FSAL GLUSTER with non-zero refcount=1
08/07/2015 12:38:10 : epoch 559cc941 : 192.168.122.205 : nfs-ganesha-12418[reaper] glusterfs_unload :FSAL :CRIT :FSAL Gluster unable to unload. Dying ...

There was an issue with unlock, and it resulted in nfs-ganesha unloading FSAL_GLUSTER.
Further debugging on the 2nd issue found -->

tmp-brick1.log:[2015-07-08 07:08:10.408657] E [MSGID: 115051] [server-rpc-fops.c:188:server_lk_cbk] 0-vol1-server: 7048: LK 53 (f4a5b8c2-3738-490d-b28b-0cf45eb06cf7) ==> (Invalid argument) [Invalid argument]
tmp-brick1.log:[2015-07-08 07:08:10.410407] E [MSGID: 115051] [server-rpc-fops.c:188:server_lk_cbk] 0-vol1-server: 7049: LK 53 (f4a5b8c2-3738-490d-b28b-0cf45eb06cf7) ==> (Invalid argument) [Invalid argument]

After adding some additional debug statements in the code, found that in

int
pl_lk (call_frame_t *frame, xlator_t *this, fd_t *fd, int32_t cmd,
       struct gf_flock *flock, dict_t *xdata)
{
        ......
        .........
        if ((flock->l_start < 0) || (flock->l_len < 0)) {
                op_ret   = -1;
                op_errno = EINVAL;
                goto unwind;
        }
        .......
        .......
}

nfs-ganesha sends an unlock on a lock request with 'flock->l_start = 0' and 'flock->l_len = -10'. As per 'man 3 fcntl', it is valid to have negative values for l_len as long as l_start + l_len > 0, so the check above is too strict. This needs to be fixed; will send in a fix. But also investigating why nfs-ganesha is sending negative length values in the first place.
Found an issue on the nfs-ganesha side too:

static state_status_t subtract_lock_from_entry(cache_entry_t *entry,
					       state_lock_entry_t *found_entry,
					       fsal_lock_param_t *lock,
					       struct glist_head *split_list,
					       struct glist_head *remove_list,
					       bool *removed)
{
	uint64_t found_entry_end = lock_end(&found_entry->sle_lock);
	uint64_t range_end = lock_end(lock);
	.......
	........
	if (range_end < found_entry_end) {
		found_entry_right = state_lock_entry_t_dup(found_entry);
		if (found_entry_right == NULL) {
			free_list(split_list);
			*removed = false;
			status = STATE_MALLOC_ERROR;
			return status;
		}

		found_entry_right->sle_lock.lock_start = range_end + 1;
>>>>		found_entry_right->sle_lock.lock_length = found_entry_end - range_end;

		LogEntry("Right split", found_entry_right);
		glist_add_tail(split_list, &(found_entry_right->sle_list));
	}
	...

If 'found_entry->sle_lock.lock_length' is zero, found_entry_end gets the UINT64_MAX value, and this can lead to 'found_entry_right->sle_lock.lock_length' being assigned a huge finite value instead of '0' (which denotes "to end of file"). Need to discuss this issue with the nfs-ganesha community.
Posted the below fix for review, which addresses the above nfs-ganesha issue -
https://review.gerrithub.io/#/c/239292

With this change, we do not need the bug1241104 fix.

That said, with the above posted fixes
(nfs-ganesha: https://review.gerrithub.io/#/c/239292, gluster: http://review.gluster.org/#/c/11572)
I was able to run the pynfs test suite successfully. However, failures are reported for 6 testcases:

OPEN2    st_open.testCreateUncheckedFile : FAILURE
           Attrs on recreate should be ignored: For size expected 32, got 16
OPEN3    st_open.testCreatGuardedFile : FAILURE
           Trying to do guarded recreate of file OPEN3 should return NFS4ERR_EXIST, instead got NFS4ERR_SHARE_DENIED
OPEN4    st_open.testCreatExclusiveFile : FAILURE
           Trying to do exclusive recreate of file OPEN4 should return NFS4ERR_EXIST, instead got NFS4_OK
ACL5     st_acl.testACL : FAILURE
           OP_SETATTR should return NFS4_OK, instead got NFS4ERR_NOTSUPP
ACL10    st_acl.testLargeACL : FAILURE
           OP_SETATTR should return NFS4_OK, instead got NFS4ERR_NOTSUPP
SEC7     st_secinfo.testRPCSEC_GSS : FAILURE
           SECINFO returned mechanism list without RPCSEC_GSS

ACLs are disabled by default, and the SEC7 testcase requires KRB5, which is not configured by default. Will look into the OPEN{2,3,4} failures.
Created attachment 1050219 [details]
pynfs-with-all-fixes
Posted another fix for the OPEN* testcase failures - https://review.gerrithub.io/#/c/239332/

With all these fixes, all the pynfs tests passed successfully (attached the results - 'pynfs-with-all-fixes-log'), except for ACL5, ACL10 and SEC7, which are expected failures as mentioned above.
When ACLs are enabled, the below failures are reported -

RNM13    st_rename.testDirToDir : RUNNING
RNM13    st_rename.testDirToDir : FAILURE
           RENAME dir1 into existing, empty dir2 should return NFS4_OK, instead got NFS4ERR_SERVERFAULT
RNM15    st_rename.testFileToFile : RUNNING
RNM15    st_rename.testFileToFile : FAILURE
           RENAME file1 into existing file2 should return NFS4_OK, instead got NFS4ERR_SERVERFAULT

As this bug mainly reports pynfs failures with ACLs disabled, we agreed to address this issue separately as part of bug1242148.
Based on the tests that I have done with ACLs disabled and enabled, I am moving this BZ to verified. There is one remaining issue, with MKSOCK failing, and a new BZ has been opened for it: BZ 1242412.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html