Description of problem:
Continuous "0-glfs_h_poll_cache_invalidation: invalid argument" messages are getting logged in the gfapi logs.

Version-Release number of selected component (if applicable): 3.7.9-1

How reproducible: Once

Steps to Reproduce:
1. Set up a ganesha cluster.
2. Create a tiered volume and mount it using NFS version 4.
3. Start the fs-sanity tool on the mount point.
4. Observed that the fs-sanity run hung on the read_large test suite, and the messages below are logged continuously in the gfapi logs:

[2016-04-02 06:39:52.047504] E [glfs-handleops.c:1828:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7f6d51d02ed3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7f6d518ea9b0] -->/lib64/libgfapi.so.0(+0x1971c) [0x7f6d518ea71c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
[2016-04-02 06:39:53.051522] E [glfs-handleops.c:1828:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7f6d51d02ed3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7f6d518ea9b0] -->/lib64/libgfapi.so.0(+0x1971c) [0x7f6d518ea71c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
[2016-04-02 06:39:54.055535] E [glfs-handleops.c:1828:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7f6d51d02ed3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7f6d518ea9b0] -->/lib64/libgfapi.so.0(+0x1971c) [0x7f6d518ea71c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
[2016-04-02 06:39:55.060269] E [glfs-handleops.c:1828:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7f6d51d02ed3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7f6d518ea9b0] -->/lib64/libgfapi.so.0(+0x1971c) [0x7f6d518ea71c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
[2016-04-02 06:39:56.121016] E [glfs-handleops.c:1828:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7f6d51d02ed3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7f6d518ea9b0] -->/lib64/libgfapi.so.0(+0x1971c) [0x7f6d518ea71c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]

5. These messages are generated every second, and because of this the ganesha-gfapi.log file size keeps increasing:

[root@dhcp37-180 log]# ls -lh | grep ganesha
-rw-------. 1 root root 1.2G Apr 2 15:26 ganesha-gfapi.log

Actual results:
Continuous "0-glfs_h_poll_cache_invalidation: invalid argument" messages are logged every second in ganesha-gfapi.log.

Expected results:
Should not generate continuous logs in ganesha-gfapi.log.

Additional info:
Shashank, were you running other operations along with fs-sanity, like attach/detach-tier operations?
No attach or detach tier operations were happening at that time.
The read_large test performs cat <large file, 1 GB in size> > <destination file> on a single file. I hit the hang twice (out of 10-14 runs), but I can't find any of the above-mentioned log messages in ganesha-gfapi.log. I still haven't root-caused the hang issue.
Adding to the previous comment: I didn't test it with HA. I suspect the log messages are not directly related to this test (either they were related to previous tests, or the rebalance for the tier might not have completed). AFAIK these polling messages can occur when more than one glusterfs client (nfs-ganesha, the tier daemon, etc.) accesses the same file, but in the above test (read_large) I can't relate them to anything. Also, I found the following in the brick logs during the hang:

[2016-04-05 05:44:17.424354] E [inodelk.c:304:__inode_unlock_lock] 0-new-locks: Matching lock not found for unlock 0-9223372036854775807, by cc090070 d67f0000 on 0x7fdb4009b0f0
[2016-04-05 05:44:17.424435] E [MSGID: 115053] [server-rpc-fops.c:271:server_inodelk_cbk] 0-new-server: 76: INODELK /dir/src (10e2db56-4393-4a3e-8501-4b64f313dbb1) ==> (Invalid argument) [Invalid argument]
Observed the same issue with a tiered volume, where continuous "cache_invalidation: invalid argument" messages are seen. The volume was mounted on 2 clients; from one client files were being removed, and from the other client ls was run on the mount point. Attaching ganesha-gfapi.log.
Created attachment 1147797 [details] ganesha-gfapi.log
I see lots of Stale File Handle errors, even for dht/afr-related lookups, at the same time as the cache_invalidation requests. Moving it to the tiering component to check whether this is expected for a tiered volume.
The log gets filled with these messages when a lookup is performed on an unlinked entry. I cross-checked the gfids mentioned in the log and can't find them in the backend. I haven't yet root-caused why polling happens on the unlinked entries.
For files being unlinked, since the server will forget the inode, it sends an upcall to all connected clients so that they refresh their cache. Thanks Jiffin for the RCA. He found that during upcall processing, while creating the handle, we do not find an entry in the inode table, so a lookup is done, resulting in these messages. The reason we cannot find the inode entry is that the same client had already done the unlink. We should ignore such error cases. Have sent the patch below to fix it upstream: http://review.gluster.org/14132
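In other words, the poll path only needs to skip the upcall when the gfid no longer resolves in the client's own inode table. The small standalone C program below is a minimal sketch of that idea, not the actual gfapi code or the upstream patch: the toy inode table, the helper names (find_inode, handle_upcall) and the textual gfid handling are simplified stand-ins. An invalidation upcall for a gfid the client has already forgotten is dropped silently instead of being reported as an "invalid argument" error.

/* Toy model of the RCA above: skip cache-invalidation upcalls for entries
 * this client has already unlinked/forgotten, instead of logging an error. */
#include <stdio.h>
#include <string.h>

#define TABLE_SIZE 4

struct toy_inode {
        char gfid[37];          /* textual gfid; "" means slot unused/forgotten */
};

static struct toy_inode inode_table[TABLE_SIZE];

static struct toy_inode *find_inode(const char *gfid)
{
        for (int i = 0; i < TABLE_SIZE; i++)
                if (strcmp(inode_table[i].gfid, gfid) == 0)
                        return &inode_table[i];
        return NULL;            /* not in the local inode table */
}

/* Handle a cache-invalidation upcall for 'gfid'. */
static void handle_upcall(const char *gfid)
{
        struct toy_inode *inode = find_inode(gfid);

        if (!inode) {
                /* The same client already unlinked this entry, so there is
                 * nothing to invalidate: ignore the upcall quietly rather
                 * than logging "invalid argument" every second. */
                return;
        }
        printf("invalidating cached attributes of %s\n", gfid);
}

int main(void)
{
        strcpy(inode_table[0].gfid, "10e2db56-4393-4a3e-8501-4b64f313dbb1");

        handle_upcall("10e2db56-4393-4a3e-8501-4b64f313dbb1"); /* known: invalidated */
        inode_table[0].gfid[0] = '\0';                          /* simulate unlink/forget */
        handle_upcall("10e2db56-4393-4a3e-8501-4b64f313dbb1"); /* forgotten: ignored */
        return 0;
}

The design choice mirrored here is that a missing inode during upcall processing is treated as an expected, benign condition on the client that did the unlink, so it is neither an error nor worth a log entry; for the real fix, see the upstream review linked above.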
Verified this bug with the latest glusterfs-3.7.9-5 and nfs-ganesha-2.3.1-7 builds; during removal of files from the clients, the cache_invalidation messages are not seen in ganesha-gfapi.log. Tried simultaneous lookup and removal of files as well, and no such messages are seen. Based on the above observations, marking this bug as Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240