Bug 1422787 - [Replicate] "RPC call decoding failed" leading to IO hang & mount inaccessible
Summary: [Replicate] "RPC call decoding failed" leading to IO hang & mount inaccessible
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: 3.9
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1421937 1422363
Blocks: 1409135 1422788
 
Reported: 2017-02-16 09:19 UTC by Poornima G
Modified: 2017-04-05 10:41 UTC (History)
9 users

Fixed In Version:
Clone Of: 1422363
Clones: 1422788 (view as bug list)
Environment:
Last Closed: 2017-03-08 12:34:35 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Poornima G 2017-02-16 09:19:02 UTC
+++ This bug was initially created as a clone of Bug #1422363 +++

+++ This bug was initially created as a clone of Bug #1421937 +++

+++ This bug was initially created as a clone of Bug #1409135 +++

Description of problem:
RPC failed to decode a message on the gNFS mount, leading to an IO hang. The volume was mounted over NFS and a sequential write was started with the O_DIRECT flag.

Version-Release number of selected component (if applicable):
3.8.4-8
Logs are placed at 
rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>

How reproducible:
Tried once

Steps to Reproduce:
1. 4 servers and 4 clients; mount 1:1 with gNFS.
2. Daemonize FIO on the 4 clients and start a sequential write with the O_DIRECT flag.
3. Shortly after the IOs start, the tool hangs.
4. Multiple errors and warnings appear in nfs.log.
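Step 2 above corresponds to an FIO job along these lines. This is a sketch only: the actual job file is not attached to the bug, so the mount path, block size, and file size are assumptions.

```
; hypothetical fio job for the repro workload (steps 1-2)
[seq-write-odirect]
rw=write               ; sequential write
direct=1               ; O_DIRECT, bypassing the page cache
bs=1M                  ; assumed block size
size=10g               ; assumed per-job file size
directory=/mnt/testvol ; assumed gNFS mount point
```

Run with `fio --daemonize=/var/run/fio.pid job.fio` on each client to match the daemonized setup described above.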

<SNIP>
[2016-12-29 10:27:29.424871] W [xdr-rpc.c:55:xdr_to_rpc_call] 0-rpc: failed to decode call msg
[2016-12-29 10:27:29.425032] W [rpc-clnt.c:717:rpc_clnt_handle_cbk] 0-testvol-client-0: RPC call decoding failed
[2016-12-29 10:27:29.443275] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f9ccdb40682] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f9ccd90675e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f9ccd90686e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f9ccd907fd4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f9ccd908864] ))))) 0-testvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(FINODELK(30)) called at 2016-12-29 10:26:55.289465 (xid=0xa8ddd)
[2016-12-29 10:27:29.443308] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-testvol-client-0: remote operation failed [Transport endpoint is not connected]
[2016-12-29 10:27:29.443571] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f9ccdb40682] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f9ccd90675e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f9ccd90686e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f9ccd907fd4] (-->

</SNIP>

Actual results:
The IO tool hung, and multiple errors and warnings appeared in the logs.

Expected results:
No IO hang should be observed.

Additional info:

[root@gqas004 ~]# gluster v info
 
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: e2212c28-f04a-4f08-9f17-b0fb74434bbf
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: ip1:/bricks/testvol_brick0
Brick2: ip2:/bricks/testvol_brick1
Brick3: ip3:/bricks/testvol_brick2
Brick4: ip4:/bricks/testvol_brick3
Options Reconfigured:
cluster.use-compound-fops: off
network.remote-dio: off
performance.strict-o-direct: on
network.inode-lru-limit: 90000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
server.allow-insecure: on
performance.stat-prefetch: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: off

Comment 1 Worker Ant 2017-02-16 09:21:59 UTC
REVIEW: https://review.gluster.org/16637 (rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport) posted (#1) for review on release-3.9 by Poornima G (pgurusid)

Comment 2 Worker Ant 2017-02-17 09:56:52 UTC
REVIEW: https://review.gluster.org/16637 (rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport) posted (#2) for review on release-3.9 by Poornima G (pgurusid)

Comment 3 Kaushal 2017-03-08 12:34:35 UTC
This bug is getting closed because GlusterFS-3.9 has reached its end-of-life [1].

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please open a new bug against the newer release.

[1]: https://www.gluster.org/community/release-schedule/

Comment 4 Worker Ant 2017-04-05 10:41:29 UTC
REVIEW: https://review.gluster.org/16637 (rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport) posted (#3) for review on release-3.9 by Poornima G (pgurusid)

