+++ This bug was initially created as a clone of Bug #1569657 +++

Description of problem:

A single volume was mounted via 4 different VIPs on 4 clients (v3/v4). While running linux untar, dbench and iozone from 2 clients and parallel lookups from the other 2 clients, lookups failed on both lookup clients.

The clients on which lookups failed were running the following:

Client 1:
-> while true;do find . -mindepth 1 -type f;done
-> while true;do ls -lRt;done

Client 2:
-> find command in a loop

Doing "ls" on the same mount point:

[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]#

Able to create files and dirs on the same mount:

[root@dhcp47-33 mani-mount]# touch mani
[root@dhcp47-33 mani-mount]# touch mani1
[root@dhcp47-33 mani-mount]# touch mani2
[root@dhcp47-33 mani-mount]# touch mani3
[root@dhcp47-33 mani-mount]# mkdir ms1
[root@dhcp47-33 mani-mount]# mkdir ms2
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument

The other client on which lookups failed:

[root@dhcp46-20 mani-mount]# ^C
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument

Unmounted and remounted the same volume on the same client with the same VIP; the issue still exists.

Mounted the same volume on another client with the same VIP; again "ls" was unable to list the contents.

Did "ls" from one of the clients on which iozone was ongoing and was able to get data:

mani-mount]# ls
dir1  f2           linux-4.9.5.tar.xz  mani1  mani3  ms2      test
f1    linux-4.9.5  mani                mani2  ms1    run6396  test1

Version-Release number of selected component (if applicable):

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.5.5-4.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-7.el7rhgs.x86_64
nfs-ganesha-2.5.5-4.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Create a 4-node ganesha cluster.
2. Create a 2 x (2 + 1) arbiter volume.
3. Export the volume via ganesha.
4. Mount the volume on 4 clients with 4 different VIPs: 2 clients with vers=3 and 2 clients with vers=4.0.
5. Run the following data set:
   -> Client 1 (v3): run dbench first; post completion, run iozone
   -> Client 2 (v4): lookups/finds and ls -lRt in a loop
   -> Client 3 (v3): lookups/finds
   -> Client 4 (v4): linux untars

Actual results:
Lookups failed on both clients performing lookups. No impact on the ongoing I/Os.

Expected results:
Lookups should not fail.

Additional info:
Could not find any error logs in ganesha-gfapi.log that explain the lookup failures.
On all 4 server nodes, ganesha is up and running.

# showmount -e
Export list for dhcp37-120.lab.eng.blr.redhat.com:
/Ganesha-lock (everyone)
/mani-test1   (everyone)

------------------------------

[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument

--- Additional comment from Jiffin on 2018-04-23 03:01:59 EDT ---

Reason for the error:

After performing the readdir call, ganesha's mdcache (not gluster's md-cache) performs a getattr call on each entry of the dirent list to refresh its cache. When the getattr call reaches fsal_gluster, it first performs glfs_h_stat; for the directory "ms2" in the root (gfid: 59d7dc9b-e2ae-4bca-8b97-14539fe1aa7a), one of the layers in the client stack returned EINVAL (I was not able to find any packets related to this gfid). I found that only this server in the ganesha cluster has the issue. I have since lost the setup in that state and did not find which layer returned EINVAL.

At the back end:

# getfattr -d -m "." -e hex ms2
# file: ms2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0x59d7dc9be2ae4bca8b9714539fe1aa7a
trusted.glusterfs.dht=0x0000000000000000000000007ffffffe
trusted.glusterfs.dht.mds=0x00000000

I found the following messages in ganesha-gfapi.log:

[2018-04-19 16:50:33.923678] E [MSGID: 101046] [dht-common.c:1857:dht_revalidate_cbk] 1-mani-test1-dht: dict is null
[2018-04-19 16:51:37.606081] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = a0056ea8-18ac-431f-a2a0-b06a5355998f). Holes=1 overlaps=0
[2018-04-19 16:53:07.867398] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 859569c9-d4fa-49e0-b15a-5102b85f3c51). Holes=1 overlaps=0
[2018-04-19 16:55:28.636417] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 74f8d078-af5a-4cb2-9241-a1a080c47e7d). Holes=1 overlaps=0
[2018-04-19 16:59:05.204906] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 59e0bb2d-7ffa-444d-b071-69963db29047). Holes=1 overlaps=0
[2018-04-19 17:07:47.896369] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 29887570-967c-4603-a4bf-a55601b0d0f3). Holes=1 overlaps=0
[2018-04-19 17:10:27.273871] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 877b79e6-a47c-479d-b7b5-5879a4c21fca). Holes=1 overlaps=0
[2018-04-19 17:10:41.758168] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 59d7dc9b-e2ae-4bca-8b97-14539fe1aa7a). Holes=1 overlaps=0

Manisha:
Since the setup is no longer in the same state, the priority of this bug depends on how reproducible the issue is.

@Dang:
Is it okay to skip an entry whose "getattrs" failed from the directory listing and continue with the rest of the entries, instead of failing the entire readdir operation?

@Nithya:
Have you encountered any similar issue with dht?
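[Editorial note] For readers unfamiliar with the gfapi calls named above, the following is a minimal sketch of the handle-based stat path where the EINVAL was observed: resolve a directory handle relative to the volume root and stat it. This is an illustration, not the FSAL_GLUSTER code; the volume, server and directory names are reused from this report as placeholders, and the exact glfs_h_lookupat() signature should be checked against the installed glfs-handles.h, since it has changed across gfapi versions.

/*
 * Minimal sketch of the handle-based getattr path described above:
 * resolve a directory under the volume root and stat it via gfapi.
 * Volume, server and directory names are placeholders from this
 * report.  Build (approximately): gcc stat_dir.c -lgfapi
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <glusterfs/api/glfs.h>
#include <glusterfs/api/glfs-handles.h>

int main(void)
{
    struct stat st;
    struct glfs *fs = glfs_new("mani-test1");
    if (!fs)
        return EXIT_FAILURE;

    glfs_set_volfile_server(fs, "tcp", "dhcp37-120.lab.eng.blr.redhat.com", 24007);
    if (glfs_init(fs) != 0) {
        fprintf(stderr, "glfs_init failed\n");
        glfs_fini(fs);
        return EXIT_FAILURE;
    }

    /* Resolve "ms2" relative to the volume root (parent == NULL). */
    struct glfs_object *dir = glfs_h_lookupat(fs, NULL, "ms2", &st, 0);
    if (!dir) {
        perror("glfs_h_lookupat");
    } else if (glfs_h_stat(fs, dir, &st) != 0) {
        perror("glfs_h_stat");      /* the EINVAL in this bug surfaced on this call */
    } else {
        printf("ms2: mode=%o size=%lld\n",
               (unsigned)st.st_mode, (long long)st.st_size);
    }

    if (dir)
        glfs_h_close(dir);
    glfs_fini(fs);
    return EXIT_SUCCESS;
}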
--- Additional comment from Manisha Saini on 2018-04-23 03:16:32 EDT ---

(In reply to Jiffin from comment #3)
> Reason for the error:
>
> After performing the readdir call, ganesha's mdcache (not gluster's
> md-cache) performs a getattr call on each entry of the dirent list to
> refresh its cache. When the getattr call reaches fsal_gluster, it first
> performs glfs_h_stat; for the directory "ms2" in the root (gfid:
> 59d7dc9b-e2ae-4bca-8b97-14539fe1aa7a), one of the layers in the client
> stack returned EINVAL (I was not able to find any packets related to this
> gfid). I found that only this server in the ganesha cluster has the issue.
> I have since lost the setup in that state and did not find which layer
> returned EINVAL.
>
> At the back end:
>
> # getfattr -d -m "." -e hex ms2
> # file: ms2
> security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
> trusted.gfid=0x59d7dc9be2ae4bca8b9714539fe1aa7a
> trusted.glusterfs.dht=0x0000000000000000000000007ffffffe
> trusted.glusterfs.dht.mds=0x00000000
>
> I found the following messages in ganesha-gfapi.log:
>
> [2018-04-19 16:50:33.923678] E [MSGID: 101046] [dht-common.c:1857:dht_revalidate_cbk] 1-mani-test1-dht: dict is null
> [2018-04-19 16:51:37.606081] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = a0056ea8-18ac-431f-a2a0-b06a5355998f). Holes=1 overlaps=0
> [2018-04-19 16:53:07.867398] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 859569c9-d4fa-49e0-b15a-5102b85f3c51). Holes=1 overlaps=0
> [2018-04-19 16:55:28.636417] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 74f8d078-af5a-4cb2-9241-a1a080c47e7d). Holes=1 overlaps=0
> [2018-04-19 16:59:05.204906] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 59e0bb2d-7ffa-444d-b071-69963db29047). Holes=1 overlaps=0
> [2018-04-19 17:07:47.896369] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 29887570-967c-4603-a4bf-a55601b0d0f3). Holes=1 overlaps=0
> [2018-04-19 17:10:27.273871] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 877b79e6-a47c-479d-b7b5-5879a4c21fca). Holes=1 overlaps=0
> [2018-04-19 17:10:41.758168] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 59d7dc9b-e2ae-4bca-8b97-14539fe1aa7a). Holes=1 overlaps=0
>
> Manisha:
> Since the setup is no longer in the same state, the priority of this bug
> depends on how reproducible the issue is.

Jiffin, when I shared the setup, to me it was in the same state. I don't know how the ganesha service crashed; that also needs to be looked into. Also, as we are unable to get any files from the mount point after performing "ls", to me it stands as a blocker.

I will try to reproduce the issue, but considering the lack of QE bandwidth I will try to update the BZ by 26th April EOD.

Keeping the needinfo intact.

> @Dang:
> Is it okay to skip an entry whose "getattrs" failed from the directory
> listing and continue with the rest of the entries, instead of failing the
> entire readdir operation?
>
> @Nithya:
> Have you encountered any similar issue with dht?

--- Additional comment from Susant Kumar Palai on 2018-04-23 05:48:00 EDT ---

From the gfapi log:

[2018-04-19 16:18:17.188419] W [MSGID: 108001] [afr-common.c:5171:afr_notify] 0-mani-test1-replicate-0: Client-quorum is not met
[2018-04-19 16:18:17.188877] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-mani-test1-client-3: disconnected from mani-test1-client-3. Client process will keep trying to connect to glusterd until brick's port is available
[2018-04-19 16:18:17.188955] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-mani-test1-client-4: disconnected from mani-test1-client-4. Client process will keep trying to connect to glusterd until brick's port is available
[2018-04-19 16:18:17.188976] W [MSGID: 108001] [afr-common.c:5171:afr_notify] 0-mani-test1-replicate-1: Client-quorum is not met
[2018-04-19 16:18:17.188805] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-mani-test1-client-2: disconnected from mani-test1-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2018-04-19 16:18:17.189312] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-mani-test1-client-5: disconnected from mani-test1-client-5. Client process will keep trying to connect to glusterd until brick's port is available
[2018-04-19 16:18:17.189342] E [MSGID: 108006] [afr-common.c:4944:__afr_handle_child_down_event] 0-mani-test1-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2018-04-19 16:18:17.190301] E [MSGID: 108006] [afr-common.c:4944:__afr_handle_child_down_event] 0-mani-test1-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
The message "I [MSGID: 104043] [glfs-mgmt.c:628:glfs_mgmt_getspec_cbk] 0-gfapi: No change in volfile, continuing" repeated 2 times between [2018-04-19 16:17:52.270376] and [2018-04-19 16:18:15.862041]
[2018-04-19 16:35:23.081937] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 7cdfc3f6-4337-4a50-a58f-0305e65cb0c0). Holes=1 overlaps=0
[2018-04-19 16:38:39.796313] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = f4619ab7-cfc9-481c-91ad-a03fa096ecdc). Holes=1 overlaps=0
[2018-04-19 16:38:42.005686] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = bbb57d4e-b895-4437-82bb-04c9b45a991b). Holes=1 overlaps=0
[2018-04-19 16:39:22.441298] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = f6ac57d8-835d-4ad1-8eb2-2d970b14b312). Holes=1 overlaps=0
[2018-04-19 16:39:29.100457] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 29ee038c-9e56-4ff3-965a-e619d3c0eec3). Holes=1 overlaps=0

It seems the layout needed a heal and both servers went down. This will lead to a lookup failure on the root itself. Having the setup would have helped confirm the layout issue and, further, any client-server connection issue. In my opinion, either the bricks were killed or there was a network partition, and hence the problem.

--- Additional comment from Daniel Gryniewicz on 2018-04-23 09:49:13 EDT ---

MDCACHE doesn't do a getattrs. The attributes of the object referenced by the dirent are passed back to MDCACHE in the callback by the FSAL. FSAL_GLUSTER uses glfs_xreaddirplus_r() to get both the file handle and its attributes, which are then passed back to MDCACHE. So no separate getattrs() should be called.

That said, MDCACHE needs the attributes when it creates the object, so we can't just skip the dirent.
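[Editorial note] To make the readdirplus point concrete, here is a rough, simplified sketch of such a loop over gfapi, not the actual FSAL_GLUSTER implementation: each glfs_xreaddirplus_r() call returns the next dirent together with its stat, so no per-entry getattr round trip is needed. The flag and accessor names are reproduced from gfapi's public headers as best recalled and should be verified against the installed glfs.h / glfs-handles.h; volume and server names are placeholders from this report.

/*
 * Rough illustration of a readdirplus-style walk over gfapi: each call
 * returns the next dirent together with its stat, so no per-entry
 * getattr round trip is needed.  Simplified sketch, not the actual
 * FSAL_GLUSTER code.  Build (approximately): gcc readdirp.c -lgfapi
 */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <glusterfs/api/glfs.h>
#include <glusterfs/api/glfs-handles.h>

int main(void)
{
    /* Volume and server names are placeholders from this report. */
    struct glfs *fs = glfs_new("mani-test1");
    if (!fs)
        return EXIT_FAILURE;
    glfs_set_volfile_server(fs, "tcp", "dhcp37-120.lab.eng.blr.redhat.com", 24007);
    if (glfs_init(fs) != 0)
        return EXIT_FAILURE;

    struct glfs_fd *fd = glfs_opendir(fs, "/");
    if (fd) {
        struct dirent de, *entry = NULL;
        struct glfs_xreaddirp_stat *xstat = NULL;

        /* GFAPI_XREADDIRP_STAT asks for stat data with each entry;
         * GFAPI_XREADDIRP_HANDLE would additionally return the object
         * handle that mdcache links against. */
        while (glfs_xreaddirplus_r(fd, GFAPI_XREADDIRP_STAT, &xstat,
                                   &de, &entry) >= 0 && entry != NULL) {
            struct stat *st = glfs_xreaddirplus_get_stat(xstat);
            if (st)
                printf("%-32s mode=%o size=%lld\n", entry->d_name,
                       (unsigned)st->st_mode, (long long)st->st_size);
            glfs_free(xstat);
            xstat = NULL;
        }
        if (xstat)
            glfs_free(xstat);
        glfs_closedir(fd);
    }

    glfs_fini(fs);
    return EXIT_SUCCESS;
}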
--- Additional comment from Manisha Saini on 2018-04-24 06:00:05 EDT ---

(In reply to Manisha Saini from comment #4)
> Jiffin, when I shared the setup, to me it was in the same state. I don't
> know how the ganesha service crashed; that also needs to be looked into.
> Also, as we are unable to get any files from the mount point after
> performing "ls", to me it stands as a blocker.
>
> I will try to reproduce the issue, but considering the lack of QE
> bandwidth I will try to update the BZ by 26th April EOD.
>
> Keeping the needinfo intact.
--- Additional comment from Jiffin on 2018-06-06 02:59:38 EDT ---

I tried to recreate a similar scenario on the latest ganesha build (plus the fix for 1580107) and was not able to reproduce the issue with bonnie + linux untar (for 2 hrs, tried twice) on a 2 x (2+1) volume. Can you please retry the following on the new build: enable debug logging for ganesha and gfapi, and collect the packets on the server where "ls -ltr" is performed.

Enable the debug log for gfapi -- set diagnostics.client-log-level to DEBUG.

Enable debug for ganesha for the readdir and cache inode components by adding the following to ganesha.conf:

LOG {
    ## Default log level for all components
    #Default_Log_Level = WARN;

    ## Configure per-component log levels.
    Components {
        CACHE_INODE = FULL_DEBUG;
        CACHE_INODE_LRU = FULL_DEBUG;
        NFS_READDIR = FULL_DEBUG;
    }
}

Please restart nfs-ganesha after that.

--- Additional comment from Manisha Saini on 2018-06-21 14:54:00 EDT ---

(In reply to Jiffin from comment #10)
> I tried to recreate a similar scenario on the latest ganesha build (plus
> the fix for 1580107) and was not able to reproduce the issue with bonnie +
> linux untar (for 2 hrs, tried twice) on a 2 x (2+1) volume. Can you please
> retry the following on the new build: enable debug logging for ganesha and
> gfapi, and collect the packets on the server where "ls -ltr" is performed.
>
> Enable the debug log for gfapi -- set diagnostics.client-log-level to DEBUG.
>
> Enable debug for ganesha for the readdir and cache inode components by
> adding the following to ganesha.conf:
>
> LOG {
>     ## Default log level for all components
>     #Default_Log_Level = WARN;
>
>     ## Configure per-component log levels.
>     Components {
>         CACHE_INODE = FULL_DEBUG;
>         CACHE_INODE_LRU = FULL_DEBUG;
>         NFS_READDIR = FULL_DEBUG;
>     }
> }
>
> Please restart nfs-ganesha after that.

There are no logs generated on those ganesha server nodes to which the clients performing lookups are mapped. The other nodes, which are handling dbench and the untars, have the logs in place.

The setup details are the same as in comment #17.

The client on which lookups return "Invalid argument": dhcp47-170.lab.eng.blr.redhat.com (root/redhat)

[root@dhcp47-170 readdir_test]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-170 readdir_test]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-170 readdir_test]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-170 readdir_test]# ls
ls: reading directory .: Invalid argument
REVIEW: https://review.gluster.org/20598 (gfapi : Set need lookup in pub_glfs_h_create_handle) posted (#1) for review on master by jiffin tony Thottan
REVIEW: https://review.gluster.org/20643 (cluster/dht: Extra unref on inode in discover path) posted (#3) for review on master by Susant Palai
COMMIT: https://review.gluster.org/20643 committed in master by "Atin Mukherjee" <amukherj> with the commit message:

cluster/dht: fix inode ref management in dht_heal_path

In dht_heal_path, the inodes are created & looked up from top to down. If the path is "a/b/c", then lookup will be done on a, then b and so on. Here is a rough snippet of the function "dht_heal_path".

<snippet>
if (bname) {                                      ref_count
    - loc.inode = create/grep inode                   1
    - syncop_lookup (loc.inode)
    - linked_inode = inode_link (loc.inode)           2
    /* clean up current loc */
    - loc_wipe(&loc)                                  1
    /* set up parent and bname for next child */
    - loc.parent = inode
    - bname = next_child_name
}
out:
    - inode_ref (linked_inode)                        2
    - loc_wipe (&loc)                                 1
</snippet>

The problem with the above code is: if _bname_ is empty, i.e. the chain lookup is done, then for the next iteration we populate loc.parent anyway. Now that bname is empty, the loc_wipe is done in the _out_ section as well. Since loc.parent was set to the previous inode, we lose a ref unwantedly. Now a dht_local_wipe as part of the DHT_STACK_UNWIND takes away the last ref, leading to inode_destroy.

This problem is currently observed with nfs-ganesha with the nameless lookup. Post the inode_purge, gfapi does not get the new inode to link and hence links the inode it sent in the lookup fop, which does not have any dht-related context (layout), leading to an "invalid argument" error in the lookup path done in parallel with the tar operation.

Test done in the following way:
- Create two nfs clients connected to two different nfs servers.
- Run untar on one client and run lookup continuously on the other.
- Prior to this patch, "invalid argument" was seen, which is fixed with the current patch.

Change-Id: Ifb90c178a2f3c16604068c7da8fa562b877f5c61
fixes: bz#1610256
Signed-off-by: Susant Palai <spalai>
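[Editorial note] The ref-count walkthrough in the commit message can be illustrated with a small self-contained toy model; the inode struct, ref/unref and loc_wipe below are stand-ins, not the GlusterFS APIs. It mimics only the final loop iteration plus the out: section, and shows how populating loc.parent after the last component, with no matching reference taken, lets the later loc_wipe drop one reference too many.

/* Toy model of the ref-count imbalance described above.  The types and
 * helpers here are stand-ins, not GlusterFS internals. */
#include <stdio.h>
#include <string.h>

struct toy_inode { const char *name; int refs; };
struct toy_loc   { struct toy_inode *inode; struct toy_inode *parent; };

static void iref(struct toy_inode *i)   { if (i) i->refs++; }
static void iunref(struct toy_inode *i) { if (i) i->refs--; }

/* Like loc_wipe(): drops whatever references the loc currently holds. */
static void toy_loc_wipe(struct toy_loc *loc)
{
    iunref(loc->inode);
    iunref(loc->parent);
    memset(loc, 0, sizeof(*loc));
}

int main(void)
{
    struct toy_inode c = { "c", 0 };        /* last component of "a/b/c" */
    struct toy_loc loc = { 0 };
    struct toy_inode *linked = NULL;

    /* Final iteration of the heal loop, component "c". */
    loc.inode = &c;
    iref(loc.inode);                        /* create/grep inode      -> 1 */
    /* syncop_lookup (loc.inode) would happen here */
    linked = loc.inode;
    iref(linked);                           /* inode_link             -> 2 */
    toy_loc_wipe(&loc);                     /* per-iteration cleanup  -> 1 */

    /* The bug: parent is populated even though there is no next
     * component, and the bare assignment takes no reference. */
    loc.parent = &c;

    /* out: section */
    iref(linked);                           /* ref handed to caller   -> 2 */
    toy_loc_wipe(&loc);                     /* also unrefs loc.parent -> 1 */

    printf("refs on \"c\": %d (expected 2: table link + caller)\n", c.refs);
    /* A later dht_local_wipe-style unref now drops the last reference
     * and destroys an inode that is still linked. */
    return 0;
}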
This bug is getting closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/