Bug 1569657 - [Ganesha] While performing lookups from two of the clients, "ls" command got failed with "Invalid argument" [NEEDINFO]
Summary: [Ganesha] While performing lookups from two of the clients, "ls" command got ...
Keywords:
Status: MODIFIED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: ---
Assignee: Jiffin
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On: 1610256
Blocks: 1503143
TreeView+ depends on / blocked
 
Reported: 2018-04-19 17:28 UTC by Manisha Saini
Modified: 2020-09-10 21:39 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: Consider following scenario : First client performing lookup on the root in a loop Second client creating directory and files inside, removes that directory in a loop Consequence: If the lookup on the first client hits server before completing mkdir call from the second then first client tries to heal path. But healing resulted in getting previous stale inode for that directory instead of fetching new entry Workaround (if any): Need to re-export the volume in server where issue is hit. Using following steps : cd /usr/libexec/ganesha ./dbus-send.sh /var/run/gluster/shared_storage/nfs-ganesha/ off <volume name> check volume is unexported using "showmount -e" from the server ./dbus-send.sh /var/run/gluster/shared_storage/nfs-ganesha/ on <volume name> check volume is exported again using "showmount -e "from the server Result: Lookup on that directory results in EINVALID at the mount point
Clone Of:
: 1610256 (view as bug list)
Environment:
Last Closed:
Target Upstream Version:
grajoria: needinfo? (smali)


Attachments (Terms of Use)
new log client side log(gfapi) (7.28 KB, text/plain)
2018-08-02 06:19 UTC, Jiffin
no flags Details
log snippet (12.61 KB, text/plain)
2018-08-27 07:07 UTC, Jiffin
no flags Details

Description Manisha Saini 2018-04-19 17:28:27 UTC
Description of problem:

Single volume mounted via 4 different VIP's on 4 clients (v3/v4).While running linux untar,dbench,iozone from 2 clients and parallel lookups from another 2 clients, lookups got failed on both the clients.

Client on which lookup failed did following sequence 
   Client 1:
  -> while true;do find . -mindepth 1 -type f;done
  ->  while true;do ls -lRt;done 

   Client 2:
  -> find command in loop

Doing "ls" on the same mount point-

[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# 


Able to create files and dirs on same mount-

[root@dhcp47-33 mani-mount]# touch mani
[root@dhcp47-33 mani-mount]# touch mani1
[root@dhcp47-33 mani-mount]# touch mani2
[root@dhcp47-33 mani-mount]# touch mani3
[root@dhcp47-33 mani-mount]# mkdir ms1
[root@dhcp47-33 mani-mount]# mkdir ms2
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument


Another client on which lookups failed-

[root@dhcp46-20 mani-mount]# ^C
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument



Unmounted and remounted the same volume on same client with same VIP.Issue still exist.

Mounted the same volume on another client with same VIP.Again "ls" unable to list content

Did "ls" from one of the client from which iozone was ongoing,able to get data-

mani-mount]# ls
dir1  f2           linux-4.9.5.tar.xz  mani1  mani3  ms2      test
f1    linux-4.9.5  mani                mani2  ms1    run6396  test1



Version-Release number of selected component (if applicable):

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.5.5-4.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-7.el7rhgs.x86_64
nfs-ganesha-2.5.5-4.el7rhgs.x86_64


How reproducible:
1/1

Steps to Reproduce:
1.Create 4 node ganesha cluster
2.Create  2 x (2 + 1) arbiter volume
3.Export the volume via ganesha
4.Mount the volume on 4 clients with 4 different VIP's.
   2 clients with vers=3 and 2 clients with vers=4.0
5.Perform following data set-
  -> Client 1 (v3):Run dbench first.Post completion run iozone
  -> Client 2 (v4):lookups finds and ls -lRt in loop
  -> Client 3 (v3):lookups finds
  -> Client 4 (v4) :liux untars

Actual results:
Lookups got failed from both the clients performing lookups.No impact on ongoing IO's


Expected results:
lookups should not fail


Additional info:

Not able to find any error logs causing lookups to fail in ganesha-gfapi.logs.
On all the 4 server node,ganesha is up and running


# showmount -e
Export list for dhcp37-120.lab.eng.blr.redhat.com:
/Ganesha-lock (everyone)
/mani-test1   (everyone)

------------------------------

]# pcs status
Cluster name: ganesha-ha-360
Stack: corosync
Current DC: dhcp37-213.lab.eng.blr.redhat.com (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
Last updated: Thu Apr 19 22:55:06 2018
Last change: Thu Apr 19 21:39:52 2018 by root via crm_attribute on dhcp37-99.lab.eng.blr.redhat.com

4 nodes configured
24 resources configured

Online: [ dhcp37-120.lab.eng.blr.redhat.com dhcp37-213.lab.eng.blr.redhat.com dhcp37-90.lab.eng.blr.redhat.com dhcp37-99.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp37-120.lab.eng.blr.redhat.com dhcp37-213.lab.eng.blr.redhat.com dhcp37-90.lab.eng.blr.redhat.com dhcp37-99.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp37-120.lab.eng.blr.redhat.com dhcp37-213.lab.eng.blr.redhat.com dhcp37-90.lab.eng.blr.redhat.com dhcp37-99.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp37-120.lab.eng.blr.redhat.com dhcp37-213.lab.eng.blr.redhat.com dhcp37-90.lab.eng.blr.redhat.com dhcp37-99.lab.eng.blr.redhat.com ]
 Resource Group: dhcp37-99.lab.eng.blr.redhat.com-group
     dhcp37-99.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp37-99.lab.eng.blr.redhat.com
     dhcp37-99.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-99.lab.eng.blr.redhat.com
     dhcp37-99.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp37-99.lab.eng.blr.redhat.com
 Resource Group: dhcp37-120.lab.eng.blr.redhat.com-group
     dhcp37-120.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp37-120.lab.eng.blr.redhat.com
     dhcp37-120.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-120.lab.eng.blr.redhat.com
     dhcp37-120.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp37-120.lab.eng.blr.redhat.com
 Resource Group: dhcp37-90.lab.eng.blr.redhat.com-group
     dhcp37-90.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp37-90.lab.eng.blr.redhat.com
     dhcp37-90.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-90.lab.eng.blr.redhat.com
     dhcp37-90.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp37-90.lab.eng.blr.redhat.com
 Resource Group: dhcp37-213.lab.eng.blr.redhat.com-group
     dhcp37-213.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp37-213.lab.eng.blr.redhat.com
     dhcp37-213.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-213.lab.eng.blr.redhat.com
     dhcp37-213.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp37-213.lab.eng.blr.redhat.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

------------------------

Client on which lookup got failed-

# df
Filesystem              1K-blocks    Used Available Use% Mounted on
/dev/mapper/rhel-root    17811456 6610196  11201260  38% /
devtmpfs                  3943584       0   3943584   0% /dev
tmpfs                     3955628       0   3955628   0% /dev/shm
tmpfs                     3955628   33492   3922136   1% /run
tmpfs                     3955628       0   3955628   0% /sys/fs/cgroup
/dev/sda1                 1038336  219624    818712  22% /boot
tmpfs                      791128       0    791128   0% /run/user/0
10.70.37.19:/mani-test1 104273920 2319360 101954560   3% /mnt/mani-mount
[root@dhcp47-33 mani-mount]# cd /mnt/mani-mount/
[root@dhcp47-33 mani-mount]# touch f1
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument

Attaching sosreports and tcpdumps shortly

Comment 5 Susant Kumar Palai 2018-04-23 09:48:00 UTC
from gfapi log :

[2018-04-19 16:18:17.188419] W [MSGID: 108001] [afr-common.c:5171:afr_notify] 0-mani-test1-replicate-0: Client-quorum is not met
[2018-04-19 16:18:17.188877] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-mani-test1-client-3: disconnected from mani-test1-client-3. Client process will keep trying to connect to glusterd until brick's port is available
[2018-04-19 16:18:17.188955] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-mani-test1-client-4: disconnected from mani-test1-client-4. Client process will keep trying to connect to glusterd until brick's port is available
[2018-04-19 16:18:17.188976] W [MSGID: 108001] [afr-common.c:5171:afr_notify] 0-mani-test1-replicate-1: Client-quorum is not met
[2018-04-19 16:18:17.188805] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-mani-test1-client-2: disconnected from mani-test1-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2018-04-19 16:18:17.189312] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-mani-test1-client-5: disconnected from mani-test1-client-5. Client process will keep trying to connect to glusterd until brick's port is available
[2018-04-19 16:18:17.189342] E [MSGID: 108006] [afr-common.c:4944:__afr_handle_child_down_event] 0-mani-test1-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2018-04-19 16:18:17.190301] E [MSGID: 108006] [afr-common.c:4944:__afr_handle_child_down_event] 0-mani-test1-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
The message "I [MSGID: 104043] [glfs-mgmt.c:628:glfs_mgmt_getspec_cbk] 0-gfapi: No change in volfile, continuing" repeated 2 times between [2018-04-19 16:17:52.270376] and [2018-04-19 16:18:15.862041]
[2018-04-19 16:35:23.081937] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 7cdfc3f6-4337-4a50-a58f-0305e65cb0c0). Holes=1 overlaps=0
[2018-04-19 16:38:39.796313] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = f4619ab7-cfc9-481c-91ad-a03fa096ecdc). Holes=1 overlaps=0
[2018-04-19 16:38:42.005686] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = bbb57d4e-b895-4437-82bb-04c9b45a991b). Holes=1 overlaps=0
[2018-04-19 16:39:22.441298] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = f6ac57d8-835d-4ad1-8eb2-2d970b14b312). Holes=1 overlaps=0
[2018-04-19 16:39:29.100457] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in (null) (gfid = 29ee038c-9e56-4ff3-965a-e619d3c0eec3). Holes=1 overla

Seems like the layout needed a heal and both the server went down. This will lead to lookup failure on root it self.

Having the setup would have helped confirming the layout issue and further any client-server connection issue.

In my opinion, either the bricks were killed or there was network partition and hence the problem.

Comment 6 Daniel Gryniewicz 2018-04-23 13:49:13 UTC
MDCACHE doesn't do a getattrs.  The attributes of the object referenced by the dirent are passed back to MDCACHE in the callback by the FSAL.  FSAL_GLUSTER uses glfs_xreaddirplus_r() to get both the file handle and it's attributes, which are then passed back to MDCACHE.  So no separate getattrs() should be called.

That said, MDCACHE needs the attributes when it creates the object, so we can't just skip the dirent.

Comment 14 Manisha Saini 2018-07-12 07:25:49 UTC
I am in process of testing this use case again.
Will update the bug with results by tomorrow morning.

Comment 16 Daniel Gryniewicz 2018-07-12 13:32:36 UTC
So, it appears to be working now:

[root@dhcp46-20 ganesha]# ls                            
linux-4.9.5  linux-4.9.5.tar.xz  run23992  run24219 
[root@dhcp46-20 ganesha]#   

This means it fixed itself, so it's likely caching related somehow.  The only way I can see that Ganesha can generate EINVAL is if a single directory has more than Dir_Max dirents in it.  Default for Dir_Max is 65535.  This can be verified by re-running while setting Dir_Max to something very large (1M or 10M).  If the problem goes away, then this was it.

However, I'm guessing that's not the case, since it shouldn't return an error in the root directory, which won't have a large number of dirents, since bonnie/dbench run in subdirs.  In that case, the error has to come from Gluster somewhere.

This is not an issue on 2.6/next, because dir chunking does not have a Dir_Max parameter.

Comment 23 Jiffin 2018-08-02 06:19:51 UTC
Created attachment 1472265 [details]
new log client side log(gfapi)

Comment 25 Raghavendra G 2018-08-06 05:19:31 UTC
Jiffin/Susant,

Can you check whether patch [1] fixes the issue?

[1] https://review.gluster.org/20640

Comment 26 Susant Kumar Palai 2018-08-06 05:42:16 UTC
Thanks, Raghavendra for the update. I will just do a code read through the discover path for any other missing unref.

Jiffin, if we can cherry pick the above patch and test it, our job would be easier.

Thanks,
Susant

Comment 27 Raghavendra G 2018-08-06 05:46:07 UTC
(In reply to Susant Kumar Palai from comment #26)
> Thanks, Raghavendra for the update. I will just do a code read through the
> discover path for any other missing unref.

A relevant note is discover codepath also does named lookups as part dht_heal_full_path.

Comment 28 Susant Kumar Palai 2018-08-06 05:58:02 UTC
Jiffin, putting a needinfo here to get the test result with the above patch.


Thanks,
Susant

Comment 29 Jiffin 2018-08-06 06:00:00 UTC
Yeah I will give a try, from the logs which I have analyzed the AFAIR I didn't came across function mentioned in patch.

Comment 31 Susant Kumar Palai 2018-08-06 07:27:01 UTC
Thanks, Jiffin for trying out the patch at the earliest. 

While doing a dry run, I could see an extra unref being done in heal_full_path code path. Will submit a patch soon to test it out and for review.

Thanks,
Susant

Comment 32 Worker Ant 2018-08-06 10:06:49 UTC
REVISION POSTED: https://review.gluster.org/20643 (cluster/dht: Extra unref on inode in discover path) posted (#3) for review on master by Susant Palai

Comment 33 Susant Kumar Palai 2018-08-06 10:07:05 UTC
Jiffin,
   Can you try the patch at [1] and see if the issue is still reproduced.

Thanks,
Susant

[1] https://review.gluster.org/#/c/20643/

Comment 34 Susant Kumar Palai 2018-08-06 11:32:28 UTC
(In reply to Susant Kumar Palai from comment #33)
> Jiffin,
>    Can you try the patch at [1] and see if the issue is still reproduced.
> 
> Thanks,
> Susant
> 
> [1] https://review.gluster.org/#/c/20643/

This patch has a problem, need to rework on this.

Comment 41 Manisha Saini 2018-08-24 08:03:17 UTC
I am able to reproduce the issue with

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.5.5-10.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-16.el7rhgs.x86_64
nfs-ganesha-2.5.5-10.el7rhgs.x86_64

Test case is same as reported in this BZ.


------
ls: reading directory .: Invalid argument
total 0
.:
ls: reading directory .: Invalid argument
total 0
.:
ls: reading directory .: Invalid argument
total 0
.:
ls: reading directory .: Invalid argument
total 0
.:
ls: reading directory .: Invalid argument
total 0
.:
ls: reading directory .: Invalid argument
total 0
^C
[root@dhcp46-20 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 ganesha]# ls
ls: reading directory .: Invalid argument
-----

Setup is in same state.Please have a look.

Setup details-

dhcp46-199.lab.eng.blr.redhat.com
dhcp47-173.lab.eng.blr.redhat.com
dhcp46-193.lab.eng.blr.redhat.com
dhcp46-197.lab.eng.blr.redhat.com

Clients-

dhcp46-20.lab.eng.blr.redhat.com --> Issue is seen on this client
dhcp46-125.lab.eng.blr.redhat.com
dhcp47-33.lab.eng.blr.redhat.com
dhcp47-170.lab.eng.blr.redhat.com

Comment 47 Manisha Saini 2018-08-26 06:03:54 UTC
Hi,


I tried reproducing the issue with rpm's provided in step 45 by jiffin,Following is my observation so far -

1.I was unable to hit the issue under following cases-

a) Running test without deleting content from mount point (Fresh mount)

b) deleting contents from mount point,restart ganesha service and then again triggering test. I tried this couple of times y'day.

2. I was able to hit the issue when I trigger test,deleted content from mount point,Again restarted test without ganesha services restart.
With this usecase as well,it doesn't seems to be every time reproducible.Looks like a race condition to me (Since when I reopened this BZ,I could hit it in 1st attempt,But with jiffin's rpm it took me 2-3 iteration to repro this issue)



I have repro'ed the issue with case 2 on the setup-

Setup details-

dhcp46-199.lab.eng.blr.redhat.com
dhcp47-173.lab.eng.blr.redhat.com
dhcp46-193.lab.eng.blr.redhat.com
dhcp46-197.lab.eng.blr.redhat.com

Clients-

dhcp46-20.lab.eng.blr.redhat.com 
dhcp46-125.lab.eng.blr.redhat.com
dhcp47-33.lab.eng.blr.redhat.com --> Issue is seen on this client
dhcp47-170.lab.eng.blr.redhat.com

(Please take screen session of the clients.Lookups is still running)

========
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# 
[root@dhcp47-33 ganesha]# 
[root@dhcp47-33 ganesha]# 
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 ganesha]# ls
ls: reading directory .: Invalid argument
=========

Comment 48 Manisha Saini 2018-08-26 06:06:27 UTC
Jiffin,

Let me know once you are done looking into setup and DEBUG logs.
I can then probably try the workaround mention by you in comment #45.

Comment 49 Manisha Saini 2018-08-26 08:28:39 UTC
(In reply to Manisha Saini from comment #48)
> Jiffin,
> 
> Let me know once you are done looking into setup and DEBUG logs.
> I can then probably try the workaround mention by you in comment #45.

Post discussing with Jiffin,tried the workaround of Ganesha service restart mentioned in comment #45,I was able to list content on mount point post performing service restart on that server node.

Comment 52 Jiffin 2018-08-27 07:07:45 UTC
Created attachment 1478875 [details]
log snippet


Note You need to log in before you can comment on or make changes to this bug.