Description of problem:

A single volume was mounted via 4 different VIPs on 4 clients (v3/v4). While running linux untar, dbench and iozone from 2 clients and parallel lookups from the other 2 clients, lookups failed on both of the lookup clients.

The clients on which lookups failed ran the following:

Client 1:
-> while true; do find . -mindepth 1 -type f; done
-> while true; do ls -lRt; done

Client 2:
-> find command in a loop

Doing "ls" on the same mount point:

[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]#

Able to create files and dirs on the same mount:

[root@dhcp47-33 mani-mount]# touch mani
[root@dhcp47-33 mani-mount]# touch mani1
[root@dhcp47-33 mani-mount]# touch mani2
[root@dhcp47-33 mani-mount]# touch mani3
[root@dhcp47-33 mani-mount]# mkdir ms1
[root@dhcp47-33 mani-mount]# mkdir ms2
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument

Another client on which lookups failed:

[root@dhcp46-20 mani-mount]# ^C
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument

Unmounted and remounted the same volume on the same client with the same VIP; the issue still exists.
Mounted the same volume on another client with the same VIP; again "ls" was unable to list the contents.

Did "ls" from one of the clients on which iozone was ongoing, and was able to get data:

mani-mount]# ls
dir1  f2  linux-4.9.5.tar.xz  mani1  mani3  ms2  test
f1  linux-4.9.5  mani  mani2  ms1  run6396  test1

Version-Release number of selected component (if applicable):

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.5.5-4.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-7.el7rhgs.x86_64
nfs-ganesha-2.5.5-4.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Create a 4 node ganesha cluster.
2. Create a 2 x (2 + 1) arbiter volume.
3. Export the volume via ganesha.
4. Mount the volume on 4 clients with 4 different VIPs: 2 clients with vers=3 and 2 clients with vers=4.0.
5. Perform the following data set:
-> Client 1 (v3): Run dbench first. After completion, run iozone.
-> Client 2 (v4): lookups, finds and ls -lRt in a loop
-> Client 3 (v3): lookups, finds
-> Client 4 (v4): linux untars

Actual results:
Lookups failed from both clients performing lookups. No impact on the ongoing IOs.

Expected results:
Lookups should not fail.

Additional info:
Not able to find any error logs causing the lookups to fail in ganesha-gfapi.log.
On all 4 server nodes, ganesha is up and running.

# showmount -e
Export list for dhcp37-120.lab.eng.blr.redhat.com:
/Ganesha-lock (everyone)
/mani-test1   (everyone)

------------------------------

]# pcs status
Cluster name: ganesha-ha-360
Stack: corosync
Current DC: dhcp37-213.lab.eng.blr.redhat.com (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
Last updated: Thu Apr 19 22:55:06 2018
Last change: Thu Apr 19 21:39:52 2018 by root via crm_attribute on dhcp37-99.lab.eng.blr.redhat.com

4 nodes configured
24 resources configured

Online: [ dhcp37-120.lab.eng.blr.redhat.com dhcp37-213.lab.eng.blr.redhat.com dhcp37-90.lab.eng.blr.redhat.com dhcp37-99.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp37-120.lab.eng.blr.redhat.com dhcp37-213.lab.eng.blr.redhat.com dhcp37-90.lab.eng.blr.redhat.com dhcp37-99.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp37-120.lab.eng.blr.redhat.com dhcp37-213.lab.eng.blr.redhat.com dhcp37-90.lab.eng.blr.redhat.com dhcp37-99.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp37-120.lab.eng.blr.redhat.com dhcp37-213.lab.eng.blr.redhat.com dhcp37-90.lab.eng.blr.redhat.com dhcp37-99.lab.eng.blr.redhat.com ]
 Resource Group: dhcp37-99.lab.eng.blr.redhat.com-group
     dhcp37-99.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp37-99.lab.eng.blr.redhat.com
     dhcp37-99.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp37-99.lab.eng.blr.redhat.com
     dhcp37-99.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp37-99.lab.eng.blr.redhat.com
 Resource Group: dhcp37-120.lab.eng.blr.redhat.com-group
     dhcp37-120.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp37-120.lab.eng.blr.redhat.com
     dhcp37-120.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp37-120.lab.eng.blr.redhat.com
     dhcp37-120.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp37-120.lab.eng.blr.redhat.com
 Resource Group: dhcp37-90.lab.eng.blr.redhat.com-group
     dhcp37-90.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp37-90.lab.eng.blr.redhat.com
     dhcp37-90.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp37-90.lab.eng.blr.redhat.com
     dhcp37-90.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp37-90.lab.eng.blr.redhat.com
 Resource Group: dhcp37-213.lab.eng.blr.redhat.com-group
     dhcp37-213.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp37-213.lab.eng.blr.redhat.com
     dhcp37-213.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp37-213.lab.eng.blr.redhat.com
     dhcp37-213.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp37-213.lab.eng.blr.redhat.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

------------------------

Client on which the lookup failed:

# df
Filesystem              1K-blocks    Used Available Use% Mounted on
/dev/mapper/rhel-root    17811456 6610196  11201260  38% /
devtmpfs                  3943584       0   3943584   0% /dev
tmpfs                     3955628       0   3955628   0% /dev/shm
tmpfs                     3955628   33492   3922136   1% /run
tmpfs                     3955628       0   3955628   0% /sys/fs/cgroup
/dev/sda1                 1038336  219624    818712  22% /boot
tmpfs                      791128       0    791128   0% /run/user/0
10.70.37.19:/mani-test1 104273920 2319360 101954560   3% /mnt/mani-mount

[root@dhcp47-33 mani-mount]# cd /mnt/mani-mount/
[root@dhcp47-33 mani-mount]# touch f1
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument

Attaching sosreports and tcpdumps shortly.
From the gfapi log:

[2018-04-19 16:18:17.188419] W [MSGID: 108001] [afr-common.c:5171:afr_notify] 0-mani-test1-replicate-0: Client-quorum is not met
[2018-04-19 16:18:17.188877] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-mani-test1-client-3: disconnected from mani-test1-client-3. Client process will keep trying to connect to glusterd until brick's port is available
[2018-04-19 16:18:17.188955] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-mani-test1-client-4: disconnected from mani-test1-client-4. Client process will keep trying to connect to glusterd until brick's port is available
[2018-04-19 16:18:17.188976] W [MSGID: 108001] [afr-common.c:5171:afr_notify] 0-mani-test1-replicate-1: Client-quorum is not met
[2018-04-19 16:18:17.188805] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-mani-test1-client-2: disconnected from mani-test1-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2018-04-19 16:18:17.189312] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-mani-test1-client-5: disconnected from mani-test1-client-5. Client process will keep trying to connect to glusterd until brick's port is available
[2018-04-19 16:18:17.189342] E [MSGID: 108006] [afr-common.c:4944:__afr_handle_child_down_event] 0-mani-test1-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2018-04-19 16:18:17.190301] E [MSGID: 108006] [afr-common.c:4944:__afr_handle_child_down_event] 0-mani-test1-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
The message "I [MSGID: 104043] [glfs-mgmt.c:628:glfs_mgmt_getspec_cbk] 0-gfapi: No change in volfile, continuing" repeated 2 times between [2018-04-19 16:17:52.270376] and [2018-04-19 16:18:15.862041]
[2018-04-19 16:35:23.081937] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 7cdfc3f6-4337-4a50-a58f-0305e65cb0c0). Holes=1 overlaps=0
[2018-04-19 16:38:39.796313] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = f4619ab7-cfc9-481c-91ad-a03fa096ecdc). Holes=1 overlaps=0
[2018-04-19 16:38:42.005686] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = bbb57d4e-b895-4437-82bb-04c9b45a991b). Holes=1 overlaps=0
[2018-04-19 16:39:22.441298] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = f6ac57d8-835d-4ad1-8eb2-2d970b14b312). Holes=1 overlaps=0
[2018-04-19 16:39:29.100457] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 29ee038c-9e56-4ff3-965a-e619d3c0eec3). Holes=1 overla

It looks like the layout needed a heal and both servers went down, which would lead to lookup failures on the root itself. Having the setup would have helped confirm the layout issue and any further client-server connection issue. In my opinion, either the bricks were killed or there was a network partition, and hence the problem.
MDCACHE doesn't do a getattrs. The attributes of the object referenced by the dirent are passed back to MDCACHE in the callback by the FSAL. FSAL_GLUSTER uses glfs_xreaddirplus_r() to get both the file handle and its attributes, which are then passed back to MDCACHE, so no separate getattrs() should be called. That said, MDCACHE needs the attributes when it creates the object, so we can't just skip the dirent.
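For reference, below is a minimal sketch of the readdirplus flow described above, assuming the glfs_xreaddirplus_r()/glfs_xreaddirplus_get_stat()/glfs_xreaddirplus_get_object() interfaces as declared in gfapi's glfs.h. It is illustrative only, not the actual FSAL_GLUSTER callback; the loop structure, error handling, and helper name are mine, so consult glfs.h for the authoritative semantics before relying on it.

/* Illustrative only -- not the real FSAL_GLUSTER readdir callback. */
#include <stdio.h>
#include <stdint.h>
#include <dirent.h>
#include <sys/stat.h>
#include <glusterfs/api/glfs.h>
#include <glusterfs/api/glfs-handles.h>

static int
list_dir_with_attrs (struct glfs *fs, const char *path)
{
        struct glfs_fd *fd = glfs_opendir (fs, path);
        if (!fd)
                return -1;

        struct dirent de, *entry = NULL;
        struct glfs_xreaddirp_stat *xstat = NULL;
        int ret;

        /* Ask for both the stat and the object handle in a single call, so the
         * consumer (MDCACHE in ganesha's case) never needs a separate getattrs. */
        while ((ret = glfs_xreaddirplus_r (fd,
                                           GFAPI_XREADDIRP_STAT | GFAPI_XREADDIRP_HANDLE,
                                           &xstat, &de, &entry)) >= 0 && entry) {
                struct stat *st = glfs_xreaddirplus_get_stat (xstat);
                struct glfs_object *obj = glfs_xreaddirplus_get_object (xstat);

                /* A real consumer would hand both 'obj' and '*st' back to the
                 * caller here; check glfs.h for the ownership rules of 'obj'
                 * before keeping it past the glfs_free() below. */
                if (st && obj)
                        printf ("%s: ino=%ju mode=%o\n", de.d_name,
                                (uintmax_t)st->st_ino, (unsigned)st->st_mode);

                glfs_free (xstat);
                xstat = NULL;
        }

        glfs_closedir (fd);
        return (ret < 0) ? -1 : 0;
}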
I am in the process of testing this use case again. Will update the bug with results by tomorrow morning.
So, it appears to be working now:

[root@dhcp46-20 ganesha]# ls
linux-4.9.5  linux-4.9.5.tar.xz  run23992  run24219
[root@dhcp46-20 ganesha]#

This means it fixed itself, so it's likely caching related somehow.

The only way I can see for Ganesha itself to generate EINVAL is if a single directory has more than Dir_Max dirents in it. The default for Dir_Max is 65535. This can be verified by re-running the test with Dir_Max set to something very large (1M or 10M); if the problem goes away, then this was it. However, I suspect that's not the case, since that path shouldn't return an error for the root directory, which won't have a large number of dirents given that bonnie/dbench run in subdirs. In that case, the error has to come from Gluster somewhere.

This is not an issue on 2.6/next, because dir chunking does not have a Dir_Max parameter.
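For anyone who wants to rule out the Dir_Max theory, a hedged example of the override is below. Dir_Max and its default come from the comment above; the block name (CACHEINODE) and file path are my assumptions based on 2.5-era configs, so verify them against the shipped config samples before using this.

# In /etc/ganesha/ganesha.conf (illustrative; confirm the block name first)
CACHEINODE {
    # Default is 65535; raise it to rule out the "too many dirents" EINVAL path
    Dir_Max = 1000000;
}

The change would only take effect after the configuration is reloaded or nfs-ganesha is restarted.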
Created attachment 1472265 [details]
New client-side log (gfapi)
Jiffin/Susant,

Can you check whether patch [1] fixes the issue?

[1] https://review.gluster.org/20640
Thanks, Raghavendra, for the update. I will do a code read-through of the discover path for any other missing unref.

Jiffin, if we can cherry-pick the above patch and test it, our job will be easier.

Thanks,
Susant
(In reply to Susant Kumar Palai from comment #26)
> Thanks, Raghavendra, for the update. I will do a code read-through of the
> discover path for any other missing unref.

A relevant note: the discover codepath also does named lookups as part of dht_heal_full_path.
Jiffin,

Putting a needinfo here to get the test result with the above patch.

Thanks,
Susant
Yeah, I will give it a try. From the logs I have analyzed, AFAIR I didn't come across the function mentioned in the patch.
Thanks, Jiffin, for trying out the patch at the earliest.

While doing a dry run, I could see an extra unref being done in the heal_full_path code path. Will submit a patch soon for testing and review.

Thanks,
Susant
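For readers unfamiliar with the refcounting involved, here is a hypothetical sketch of the kind of imbalance being described. It is not the actual dht_heal_full_path code; inode_ref()/inode_unref() are the libglusterfs refcounting helpers, and do_heal() is a made-up stand-in for the real heal work.

#include <glusterfs/inode.h>   /* libglusterfs inode refcounting API */

/* Hypothetical stand-in for the real heal work; returns 0 on success. */
static int do_heal (inode_t *inode) { (void)inode; return 0; }

static int
heal_path_sketch (inode_t *parent)
{
        inode_t *cur = inode_ref (parent);   /* take one reference */
        int      ret = do_heal (cur);

        inode_unref (cur);                   /* balanced unref */
        /* inode_unref (cur);  <-- an extra unref like this is the class of bug
         * being discussed: the refcount drops to zero too early and the inode
         * table may prune the inode while the discover/lookup path still
         * expects it to be alive. */
        return ret;
}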
REVISION POSTED: https://review.gluster.org/20643 (cluster/dht: Extra unref on inode in discover path) posted (#3) for review on master by Susant Palai
Jiffin,

Can you try the patch at [1] and see if the issue is still reproducible?

Thanks,
Susant

[1] https://review.gluster.org/#/c/20643/
(In reply to Susant Kumar Palai from comment #33)
> Jiffin,
> Can you try the patch at [1] and see if the issue is still reproducible?
> 
> Thanks,
> Susant
> 
> [1] https://review.gluster.org/#/c/20643/

This patch has a problem; it needs to be reworked.
I am able to reproduce the issue with:

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.5.5-10.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-16.el7rhgs.x86_64
nfs-ganesha-2.5.5-10.el7rhgs.x86_64

The test case is the same as reported in this BZ.

------
ls: reading directory .: Invalid argument
total 0
.:
ls: reading directory .: Invalid argument
total 0
.:
ls: reading directory .: Invalid argument
total 0
.:
ls: reading directory .: Invalid argument
total 0
.:
ls: reading directory .: Invalid argument
total 0
.:
ls: reading directory .: Invalid argument
total 0
^C
[root@dhcp46-20 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 ganesha]# ls
ls: reading directory .: Invalid argument
-----

The setup is in the same state. Please have a look.

Setup details:
dhcp46-199.lab.eng.blr.redhat.com
dhcp47-173.lab.eng.blr.redhat.com
dhcp46-193.lab.eng.blr.redhat.com
dhcp46-197.lab.eng.blr.redhat.com

Clients:
dhcp46-20.lab.eng.blr.redhat.com  --> issue is seen on this client
dhcp46-125.lab.eng.blr.redhat.com
dhcp47-33.lab.eng.blr.redhat.com
dhcp47-170.lab.eng.blr.redhat.com
Hi,

I tried reproducing the issue with the rpms provided by Jiffin in comment #45. Following are my observations so far:

1. I was unable to hit the issue in the following cases:
   a) Running the test without deleting content from the mount point (fresh mount).
   b) Deleting contents from the mount point, restarting the ganesha service, and then triggering the test again. I tried this a couple of times yesterday.

2. I was able to hit the issue when I triggered the test, deleted content from the mount point, and then restarted the test without restarting the ganesha services. Even with this use case it does not seem to be reproducible every time. It looks like a race condition to me (when I reopened this BZ I could hit it on the first attempt, but with Jiffin's rpms it took me 2-3 iterations to reproduce the issue).

I have reproduced the issue with case 2 on the setup:

Setup details:
dhcp46-199.lab.eng.blr.redhat.com
dhcp47-173.lab.eng.blr.redhat.com
dhcp46-193.lab.eng.blr.redhat.com
dhcp46-197.lab.eng.blr.redhat.com

Clients:
dhcp46-20.lab.eng.blr.redhat.com
dhcp46-125.lab.eng.blr.redhat.com
dhcp47-33.lab.eng.blr.redhat.com  --> issue is seen on this client
dhcp47-170.lab.eng.blr.redhat.com

(Please take a screen session on the clients; lookups are still running.)

========
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]#
[root@dhcp47-33 ganesha]#
[root@dhcp47-33 ganesha]#
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
=========
Jiffin,

Let me know once you are done looking into the setup and the DEBUG logs. I can then try the workaround mentioned by you in comment #45.
(In reply to Manisha Saini from comment #48)
> Jiffin,
> 
> Let me know once you are done looking into the setup and the DEBUG logs.
> I can then try the workaround mentioned by you in comment #45.

After discussing with Jiffin, I tried the workaround of restarting the Ganesha service mentioned in comment #45. After performing the service restart on that server node, I was able to list the contents of the mount point.
Created attachment 1478875 [details]
Log snippet
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days