Bug 1323424

Summary: Ganesha: Continuous "0-glfs_h_poll_cache_invalidation: invalid argument" messages getting logged in ganesha-gfapi logs.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Shashank Raj <sraj>
Component: nfs-ganesha
Assignee: Soumya Koduri <skoduri>
Status: CLOSED ERRATA
QA Contact: Shashank Raj <sraj>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.1
CC: asrivast, jthottan, kkeithle, ndevos, nlevinki, rhinduja, sashinde, skoduri, sraj
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.1.3
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.7.9-4
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1331969 (view as bug list)
Environment:
Last Closed: 2016-06-23 05:15:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1331969, 1332433, 1347590    
Bug Blocks: 1311817    
Attachments:
ganesha-gfapi.log (flags: none)

Description Shashank Raj 2016-04-02 20:37:27 UTC
Description of problem:
Continuous "0-glfs_h_poll_cache_invalidation: invalid argument" messages getting logged in gfapi logs.

Version-Release number of selected component (if applicable):
3.7.9-1

How reproducible:
Once

Steps to Reproduce:
1. Set up a ganesha cluster.
2. Create a tiered volume and mount it using NFS version 4.
3. Start the fs-sanity tool on the mount point.
4. Observed that the fs-sanity run hung on the read_large test suite, and the messages below are continuously logged in the gfapi logs:


[2016-04-02 06:39:52.047504] E [glfs-handleops.c:1828:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7f6d51d02ed3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7f6d518ea9b0] -->/lib64/libgfapi.so.0(+0x1971c) [0x7f6d518ea71c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
[2016-04-02 06:39:53.051522] E [glfs-handleops.c:1828:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7f6d51d02ed3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7f6d518ea9b0] -->/lib64/libgfapi.so.0(+0x1971c) [0x7f6d518ea71c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
[2016-04-02 06:39:54.055535] E [glfs-handleops.c:1828:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7f6d51d02ed3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7f6d518ea9b0] -->/lib64/libgfapi.so.0(+0x1971c) [0x7f6d518ea71c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
[2016-04-02 06:39:55.060269] E [glfs-handleops.c:1828:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7f6d51d02ed3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7f6d518ea9b0] -->/lib64/libgfapi.so.0(+0x1971c) [0x7f6d518ea71c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
[2016-04-02 06:39:56.121016] E [glfs-handleops.c:1828:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7f6d51d02ed3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7f6d518ea9b0] -->/lib64/libgfapi.so.0(+0x1971c) [0x7f6d518ea71c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]

5. These messages are generated every second, and because of this the ganesha-gfapi.log file size keeps increasing:

[root@dhcp37-180 log]# ls -lh | grep ganesha
-rw-------. 1 root      root     1.2G Apr  2 15:26 ganesha-gfapi.log

Actual results:

Continuous "glfs_h_poll_cache_invalidation: invalid argument" messages are logged in ganesha-gfapi.log every second.

Expected results:

Such messages should not be continuously generated in ganesha-gfapi.log.


Additional info:

Comment 2 Soumya Koduri 2016-04-03 17:46:29 UTC
Shashank,

Were you running other operations as well along with fs-sanity, like attach/detach-tier operations?

Comment 3 Shashank Raj 2016-04-04 06:13:47 UTC
No, attach and detach tier operations were not happening at that time.

Comment 4 Jiffin 2016-04-06 10:55:33 UTC
The read_large test performs "cat <large file, 1 GB in size> > <destination file>" on a single file. I hit the hang twice (out of 10-14 runs), but I can't find any of the above-mentioned log messages in ganesha-gfapi.log. I still haven't root-caused the hang issue.

Comment 5 Jiffin 2016-04-08 17:55:17 UTC
Adding to previous comment.

I didn't test it with HA. I suspect the log messages have to do with the test environment (either they are related to previous tests, or the tier rebalance might not have completed). AFAIK, these polling messages can show up when more than one glusterfs client (nfs-ganesha, the tier daemon, etc.) accesses the same file, but in the above test (read_large) I can't relate them to any such client.

Also I found the following in the brick logs during hang:

[2016-04-05 05:44:17.424354] E [inodelk.c:304:__inode_unlock_lock] 0-new-locks:  Matching lock not found for unlock 0-9223372036854775807, by cc090070d67f0000 on 0x7fdb4009b0f0
[2016-04-05 05:44:17.424435] E [MSGID: 115053] [server-rpc-fops.c:271:server_inodelk_cbk] 0-new-server: 76: INODELK /dir/src (10e2db56-4393-4a3e-8501-4b64f313dbb1) ==> (Invalid argument) [Invalid argument]

Comment 6 Shashank Raj 2016-04-15 22:28:48 UTC
Observed the same issue with a tiered volume, where continuous "cache_invalidation: invalid argument" messages are seen.

The volume was mounted on 2 clients; files were being removed from one client while ls was run on the mount point from the other.

Attaching ganesha-gfapi.log

Comment 7 Shashank Raj 2016-04-15 22:30:04 UTC
Created attachment 1147797 [details]
ganesha-gfapi.log

Comment 8 Soumya Koduri 2016-04-17 12:31:07 UTC
I see lots of Stale File Handle errors, even for dht/afr-related lookups, at the same time as the cache_invalidation requests. Moving it to the tiering component to check whether this is expected for a tiered volume.

Comment 9 Jiffin 2016-04-18 13:45:28 UTC
These messages get logged when a lookup is performed on unlinked entries. I cross-checked the GFIDs mentioned in the log and can't find them in the backend. I haven't yet root-caused why polling happens on the unlinked entries.

Comment 10 Soumya Koduri 2016-04-30 16:40:05 UTC
For the files being unlinked, since the server will forget the inode, it sends an upcall to all the connected clients so that they refresh their cache.

Thanks, Jiffin, for the RCA. He found that during upcall processing, while creating the handle, we do not find the entry in the inode table, and hence a lookup is done, resulting in these messages. The reason we cannot find the inode entry is that the same client had done the unlink. We should ignore such error cases. I have sent the patch below to fix it upstream.

http://review.gluster.org/14132
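
For illustration only, here is a minimal sketch of the kind of change described above: treating a cache-invalidation upcall for an inode the client no longer holds (because it unlinked the file itself) as a benign no-op instead of logging an error on every poll. This is not the actual upstream patch; all type and helper names used here (gfid_t, inode_table_t, find_inode_by_gfid, poll_cache_invalidation) are hypothetical stand-ins for the real gfapi internals.

/* Hypothetical sketch, not the code from http://review.gluster.org/14132.
 * The types and helpers below are illustrative stand-ins for gfapi internals. */

#include <stdio.h>
#include <string.h>

typedef struct { unsigned char bytes[16]; } gfid_t;  /* 128-bit gluster file id */
typedef struct { int unused; } inode_t;
typedef struct { int unused; } inode_table_t;

/* Stub: look up an inode by gfid in the client-side inode table.
 * Returns NULL when the client no longer caches the inode. */
static inode_t *
find_inode_by_gfid (inode_table_t *table, const gfid_t *gfid)
{
        (void) table;
        (void) gfid;
        return NULL;    /* e.g. the client itself already unlinked the file */
}

/* Handle one cache-invalidation upcall for 'gfid'. */
static int
poll_cache_invalidation (inode_table_t *table, const gfid_t *gfid)
{
        inode_t *inode = find_inode_by_gfid (table, gfid);

        if (inode == NULL) {
                /* Before the fix, this case fell through to the
                 * "glfs_h_poll_cache_invalidation: invalid argument"
                 * error log on every poll; the fix is to silently
                 * ignore upcalls for inodes the client no longer has. */
                return 0;
        }

        /* ... refresh the cached attributes of 'inode' here ... */
        return 0;
}

int
main (void)
{
        inode_table_t table;
        gfid_t gfid;

        memset (&table, 0, sizeof (table));
        memset (&gfid, 0, sizeof (gfid));
        printf ("upcall handled, rc = %d\n",
                poll_cache_invalidation (&table, &gfid));
        return 0;
}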

Comment 14 Shashank Raj 2016-05-20 06:21:29 UTC
Verified this bug with the latest glusterfs-3.7.9-5 and nfs-ganesha-2.3.1-7 builds; during removal of files from the clients, the cache_invalidation messages are not seen in ganesha-gfapi.log.

Tried simultaneous lookup and removal of files as well, and no such messages are seen. Based on the above observations, marking this bug as Verified.

Comment 16 errata-xmlrpc 2016-06-23 05:15:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240