Description of problem:
Continuous "0-glfs_h_poll_cache_invalidation: invalid argument" messages are getting logged in the gfapi logs.

Version-Release number of selected component (if applicable): 3.7.9-1

How reproducible: Once

Steps to Reproduce:
1. Set up a ganesha cluster.
2. Create a tiered volume and mount it using NFS version 4.
3. Start the fs-sanity tool on the mount point.
4. Observed that the fs-sanity run hung on the read_large test suite, and the messages below are logged continuously in the gfapi logs:

[2016-04-02 06:39:52.047504] E [glfs-handleops.c:1828:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7f6d51d02ed3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7f6d518ea9b0] -->/lib64/libgfapi.so.0(+0x1971c) [0x7f6d518ea71c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
[2016-04-02 06:39:53.051522] E [glfs-handleops.c:1828:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7f6d51d02ed3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7f6d518ea9b0] -->/lib64/libgfapi.so.0(+0x1971c) [0x7f6d518ea71c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
[2016-04-02 06:39:54.055535] E [glfs-handleops.c:1828:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7f6d51d02ed3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7f6d518ea9b0] -->/lib64/libgfapi.so.0(+0x1971c) [0x7f6d518ea71c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
[2016-04-02 06:39:55.060269] E [glfs-handleops.c:1828:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7f6d51d02ed3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7f6d518ea9b0] -->/lib64/libgfapi.so.0(+0x1971c) [0x7f6d518ea71c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
[2016-04-02 06:39:56.121016] E [glfs-handleops.c:1828:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7f6d51d02ed3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7f6d518ea9b0] -->/lib64/libgfapi.so.0(+0x1971c) [0x7f6d518ea71c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]

5. These messages are generated every second, and because of this the ganesha-gfapi.log file size keeps increasing:

[root@dhcp37-180 log]# ls -lh | grep ganesha
-rw-------. 1 root root 1.2G Apr 2 15:26 ganesha-gfapi.log

Actual results:
Continuous "0-glfs_h_poll_cache_invalidation: invalid argument" messages are logged every second in ganesha-gfapi.log.

Expected results:
Should not generate continuous logs in ganesha-gfapi.log.

Additional info:
Shashank, were you running other operations along with fs-sanity, like attach/detach-tier operations?
No attach or detach tier operations were happening at that time.
The read_large test performs cat <large file, 1 GB in size> > <destination file> on a single file. I hit the hang twice (out of 10-14 runs), but I can't find any of the above-mentioned log messages in ganesha-gfapi.log. I still haven't root-caused the hang issue.
Adding to the previous comment: I didn't test it with HA. I suspect the log messages are not directly related to this test (either they were related to previous tests, or the rebalance for the tier might not have completed). AFAIK these polling messages can occur when more than one glusterfs client (nfs-ganesha, the tier daemon, etc.) accesses the same file, but in the above test (read_large) I can't relate them to anything. Also, I found the following in the brick logs during the hang:

[2016-04-05 05:44:17.424354] E [inodelk.c:304:__inode_unlock_lock] 0-new-locks: Matching lock not found for unlock 0-9223372036854775807, by cc090070 d67f0000 on 0x7fdb4009b0f0
[2016-04-05 05:44:17.424435] E [MSGID: 115053] [server-rpc-fops.c:271:server_inodelk_cbk] 0-new-server: 76: INODELK /dir/src (10e2db56-4393-4a3e-8501-4b64f313dbb1) ==> (Invalid argument) [Invalid argument]
Observed the same issue with a tiered volume, where continuous "cache_invalidation: invalid argument" messages are seen. The volume was mounted on 2 clients; from one client files were being removed, and from the other client ls was run on the mount point. Attaching ganesha-gfapi.log.
Created attachment 1147797 [details] ganesha-gfapi.log
I see lots of Stale File Handle errors, even for dht/afr-related lookups, at the same time as the cache_invalidation requests. Moving it to the tiering component to check whether this is expected for a tiered volume.
The log gets filled with these messages when a lookup is performed on an unlinked entry. I cross-checked the gfids mentioned in the log and can't find them in the backend. I haven't yet root-caused why polling happens on the unlinked entries.
For files being unlinked, since the server will forget the inode, it sends an upcall to all connected clients so that they refresh their cache. Thanks Jiffin for the RCA. He found that during upcall processing, while creating the handle, we do not find an entry in the inode table, so a lookup is done, resulting in these messages. The reason we cannot find the inode entry is that the same client had already done the unlink. We should ignore such error cases. Have sent the patch below to fix it upstream: http://review.gluster.org/14132
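In other words, the poll path only needs to skip the upcall when the gfid no longer resolves in the client's own inode table. The small standalone C program below is a minimal sketch of that idea, not the actual gfapi code or the upstream patch: the toy inode table, the helper names (find_inode, handle_upcall) and the textual gfid handling are simplified stand-ins. An invalidation upcall for a gfid the client has already forgotten is dropped silently instead of being reported as an "invalid argument" error.

/* Toy model of the RCA above: skip cache-invalidation upcalls for entries
 * this client has already unlinked/forgotten, instead of logging an error. */
#include <stdio.h>
#include <string.h>

#define TABLE_SIZE 4

struct toy_inode {
        char gfid[37];          /* textual gfid; "" means slot unused/forgotten */
};

static struct toy_inode inode_table[TABLE_SIZE];

static struct toy_inode *find_inode(const char *gfid)
{
        for (int i = 0; i < TABLE_SIZE; i++)
                if (strcmp(inode_table[i].gfid, gfid) == 0)
                        return &inode_table[i];
        return NULL;            /* not in the local inode table */
}

/* Handle a cache-invalidation upcall for 'gfid'. */
static void handle_upcall(const char *gfid)
{
        struct toy_inode *inode = find_inode(gfid);

        if (!inode) {
                /* The same client already unlinked this entry, so there is
                 * nothing to invalidate: ignore the upcall quietly rather
                 * than logging "invalid argument" every second. */
                return;
        }
        printf("invalidating cached attributes of %s\n", gfid);
}

int main(void)
{
        strcpy(inode_table[0].gfid, "10e2db56-4393-4a3e-8501-4b64f313dbb1");

        handle_upcall("10e2db56-4393-4a3e-8501-4b64f313dbb1"); /* known: invalidated */
        inode_table[0].gfid[0] = '\0';                          /* simulate unlink/forget */
        handle_upcall("10e2db56-4393-4a3e-8501-4b64f313dbb1"); /* forgotten: ignored */
        return 0;
}

The design choice mirrored here is that a missing inode during upcall processing is treated as an expected, benign condition on the client that did the unlink, so it is neither an error nor worth a log entry; for the real fix, see the upstream review linked above.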
Verified this bug with the latest glusterfs-3.7.9-5 and nfs-ganesha-2.3.1-7 builds; during removal of files from the clients, the cache_invalidation messages are not seen in ganesha-gfapi.log. Tried simultaneous lookup and removal of files as well, and no such messages are seen. Based on the above observations, marking this bug as Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240