Bug 1397473
Summary: | Message logged many times : XDR decode of cache_invalidation failed. [Operation not permitted] | | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Raghavendra Talur <rtalur> |
Component: | md-cache | Assignee: | Poornima G <pgurusid> |
Status: | CLOSED UPSTREAM | QA Contact: | Vivek Das <vdas> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | | |
Version: | rhgs-3.2 | CC: | nchilaka, rhs-bugs, rkavunga, skoduri, storage-qa-internal |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-11-19 05:24:06 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Raghavendra Talur
2016-11-22 15:44:50 UTC
A similar issue was observed while debugging bug 1398930.

[2016-11-27 07:23:11.110028] W [xdr-rpc.c:55:xdr_to_rpc_call] 0-rpc: failed to decode call msg
[2016-11-27 07:23:11.110197] W [rpc-clnt.c:717:rpc_clnt_handle_cbk] 0-testvol-client-2: RPC call decoding failed
[2016-11-27 07:23:11.122054] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7fbfa1ada642] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fbfa18a075e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fbfa18a086e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7fbfa18a1fc4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7fbfa18a28a0] ))))) 0-testvol-client-2: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2016-11-27 07:23:11.017856 (xid=0x183c17)
[2016-11-27 07:23:12.253789] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-testvol-client-2: Server lk version = 1
[2016-11-27 07:23:18.625835] W [MSGID: 114063] [client-callback.c:110:client_cbk_cache_invalidation] 0-testvol-client-0: XDR decode of cache_invalidation failed. [Operation not permitted]
[2016-11-27 07:24:01.786006] W [xdr-rpc.c:55:xdr_to_rpc_call] 0-rpc: failed to decode call msg
[2016-11-27 07:24:01.786082] W [rpc-clnt.c:717:rpc_clnt_handle_cbk] 0-testvol-client-0: RPC call decoding failed
[2016-11-27 07:24:01.786881] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7fbfa1ada642] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fbfa18a075e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fbfa18a086e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7fbfa18a1fc4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7fbfa18a28a0] ))))) 0-testvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(ENTRYLK(31)) called at 2016-11-27 07:24:01.785965 (xid=0x1fbe70)
The message "W [MSGID: 114063] [client-callback.c:110:client_cbk_cache_invalidation] 0-testvol-client-0: XDR decode of cache_invalidation failed. [Operation not permitted]" repeated 6 times between [2016-11-27 07:23:18.625835] and [2016-11-27 07:23:55.956580]
[2016-11-27 07:24:01.786910] E [MSGID: 114031] [client-rpc-fops.c:1654:client3_3_entrylk_cbk] 0-testvol-client-0: remote operation failed [Transport endpoint is not connected]
[2016-11-27 07:24:01.786958] E [MSGID: 108007] [afr-lk-common.c:825:afr_unlock_entrylk_cbk] 0-testvol-replicate-0: /linux-4.8.9/Documentation/devicetree/bindings/c6x/clocks.txt: unlock failed on testvol-client-0 [Transport endpoint is not connected]
[2016-11-27 07:24:01.787219] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7fbfa1ada642] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fbfa18a075e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fbfa18a086e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7fbfa18a1fc4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7fbfa18a28a0] ))))) 0-testvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2016-11-27 07:24:01.652084 (xid=0x1fbba5)

Since these messages were sent from the server to the clients, and this happened on the I/O transport, they are most likely caused by upcall RPC messages.

I am seeing these errors on my systemic setup of 3.2 in the regression cycle too.

On my systemic setup, I am creating directories at the same path simultaneously from 3 different clients. Each client used a different server IP to mount the volume over the fuse protocol. Each client was also dumping sosreports every 5 minutes into the volume mount in a screen session, along with top output being appended to a file every minute.

The dir-creations were run by different users, e.g.:
client1 (el 7.2) was running the dir-creation as pavan@rhs-client23
client2 (el 6.7) as root@rhs-client24
client3 (el 7.3) as cli21@rhs-client21

Note: these logs are from client1, i.e. rhs-client24. Note also, however, that I am able to access the mount.

sosreports available at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/3.2_logs/systemic_testing_logs/regression_cycle/same_dir_create_clients/rhs-client23.lab.eng.blr.redhat.com/
test execution details available at https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=632186609

Version-Release number of selected component (if applicable):
============
3.8.4-10

Other BZs for reference (raised for issues on the same setup):
1409472 - brick crashed on systemic setup
1397907 - seeing frequent kernel hangs when doing operations both on fuse client and gluster nodes on replica volumes [NEEDINFO]
1409568 - seeing socket disconnects and transport endpoint not connected frequently on systemic setup
1409572 - In fuse mount logs:seeing input/output error with split-brain observed logs and failing GETXATTR and STAT
1409580 - seeing stale file handle errors in fuse mount logs in systemic testing
1409583 - seeing RPC status error messages and timeouts due to RPC (rpc-clnt.c:200:call_bail)
1409135 - [Replicate] "RPC call decoding failed" leading to IO hang & mount inaccessible
1409729 - On systemic setup seeing AFR errors w.r.t fileops like "afr-open.c:187:afr_openfd_fix_open_cbk" and "afr-lk-common.c:825:afr_unlock_entrylk_cbk" and "afr-lk-common.c:825:afr_unlock_entrylk_cbk"

This has been fixed. Hence closing the bug.
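
For anyone triaging a similar client log, the sketch below may help gauge how often a given warning (such as MSGID 114063 in the excerpt above) shows up. It is only an illustration and not part of gluster: the function name `tally_msgids` and the `SAMPLE` list are hypothetical, and the regular expressions assume the line layout seen in the excerpt. Directly logged lines and "repeated N times" summaries are kept as separate counts, since the excerpt alone does not establish whether the summary figure includes the first logged occurrence.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the log excerpt in this bug:
#   [timestamp] LEVEL [MSGID: n] [file:line:func] ...
#   The message "..." repeated N times between [t1] and [t2]
MSGID_RE = re.compile(r"\[MSGID: (\d+)\]")
REPEAT_RE = re.compile(r'^The message ".*" repeated (\d+) times between')


def tally_msgids(lines):
    """Return {msgid: {"logged": n, "repeated": m}} for the given log lines.

    "logged" counts lines emitted directly; "repeated" sums the N values
    from 'repeated N times' summary lines. The two are not added together.
    """
    tally = defaultdict(lambda: {"logged": 0, "repeated": 0})
    for line in lines:
        found = MSGID_RE.search(line)
        if found is None:
            continue  # lines without a MSGID (e.g. xdr-rpc.c warnings) are skipped
        msgid = found.group(1)
        summary = REPEAT_RE.match(line)
        if summary:
            tally[msgid]["repeated"] += int(summary.group(1))
        else:
            tally[msgid]["logged"] += 1
    return dict(tally)


# Three lines copied from the excerpt, used here as sample input.
SAMPLE = [
    '[2016-11-27 07:23:18.625835] W [MSGID: 114063] [client-callback.c:110:client_cbk_cache_invalidation] 0-testvol-client-0: XDR decode of cache_invalidation failed. [Operation not permitted]',
    'The message "W [MSGID: 114063] [client-callback.c:110:client_cbk_cache_invalidation] 0-testvol-client-0: XDR decode of cache_invalidation failed. [Operation not permitted]" repeated 6 times between [2016-11-27 07:23:18.625835] and [2016-11-27 07:23:55.956580]',
    '[2016-11-27 07:24:01.786910] E [MSGID: 114031] [client-rpc-fops.c:1654:client3_3_entrylk_cbk] 0-testvol-client-0: remote operation failed [Transport endpoint is not connected]',
]

if __name__ == "__main__":
    for msgid, counts in sorted(tally_msgids(SAMPLE).items()):
        print(msgid, counts)
```

To run it against a real mount log, pass an open file object instead of `SAMPLE`, since the function only iterates over lines.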