Bug 1763865

Summary: [GSS] rpc actor failed to complete successfully
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: slenzen
Component: protocol Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED DUPLICATE QA Contact: Rahul Hinduja <rhinduja>
Severity: medium Docs Contact:
Priority: unspecified    
Version: rhgs-3.3 CC: nbalacha, rhs-bugs, rkavunga, storage-qa-internal, vdas
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-04 09:00:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 13 Pranith Kumar K 2019-11-04 09:00:32 UTC
This bug is fixed in 3.4.x as part of https://bugzilla.redhat.com/show_bug.cgi?id=1545277.

RCA:
We see the following logs in glusterd:
=======================
[2019-10-17 09:35:13.561221] W [rpcsvc.c:265:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330) for 10.55.210.131:1005
[2019-10-17 09:35:13.561247] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
=======================

When I looked up program number 1298437 in the source, it turned out to be the fop program (GLUSTER_FOP_PROGRAM).
=======================
pk@localhost - ~/workspace/rhs-glusterfs ((HEAD detached at v3.8.4-52.5))
14:08:06 :) ⚡ git grep 1298437
rpc/rpc-lib/src/protocol-common.h:#define GLUSTER_FOP_PROGRAM   1298437 /* Completely random */
=======================
RPC calls intended for the bricks are being sent to glusterd.
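
To check other glusterd logs for the same symptom, the warning line can be parsed and its program number compared with GLUSTER_FOP_PROGRAM. This is only a rough sketch: the helper names are hypothetical, and it assumes the two numbers after "req" are the program number and program version, followed by the peer address and port.

```python
import re

# Program number taken from the git grep output above.
GLUSTER_FOP_PROGRAM = 1298437

# Assumed layout of the rpcsvc warning:
#   "(req <prognum> <progver>) for <host>:<port>"
_PATTERN = re.compile(
    r"RPC program not available \(req (\d+) (\d+)\) for ([\d.]+):(\d+)"
)


def parse_unavailable_program(line):
    """Return (prognum, progver, host, port), or None if the line does not match."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    prognum, progver, host, port = match.groups()
    return int(prognum), int(progver), host, int(port)


def is_misrouted_fop(line):
    """True when the rejected request carried the fop program number."""
    parsed = parse_unavailable_program(line)
    return parsed is not None and parsed[0] == GLUSTER_FOP_PROGRAM


LOG_LINE = (
    "[2019-10-17 09:35:13.561221] W [rpcsvc.c:265:rpcsvc_program_actor] "
    "0-rpc-service: RPC program not available (req 1298437 330) "
    "for 10.55.210.131:1005"
)

if __name__ == "__main__":
    print(parse_unavailable_program(LOG_LINE))
    print(is_misrouted_fop(LOG_LINE))
```

A match in the glusterd log means the peer sent a brick-bound fop to glusterd, which does not register that program.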

On the brick we see the following errors:
=======================
[2019-10-17 11:26:47.527079] E [server-helpers.c:388:server_alloc_frame] (-->/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325) [0x7f90a49058c5] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x2e7bf) [0x7f908fde57bf] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0xe094) [0x7f908fdc5094] ) 0-server: invalid argument: client [Invalid argument]
[2019-10-17 11:26:47.527107] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
=======================

These errors appear when the brick receives an RPC call from a client that has not yet completed set-volume on the brick.
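
The brick-side behaviour can be pictured with a toy model. This is not the server xlator code; the class and method names are made up for illustration, and it only assumes that the brick remembers which connections have completed set-volume and rejects fops from any other connection with "Invalid argument".

```python
import errno


class BrickSketch:
    """Toy model of a brick that rejects fops from unregistered connections."""

    def __init__(self):
        # Connections that have completed set-volume (hypothetical bookkeeping).
        self._clients = set()

    def set_volume(self, conn_id):
        """Record that this connection finished the set-volume handshake."""
        self._clients.add(conn_id)

    def handle_fop(self, conn_id, fop_name):
        """Return 0 on success, or -EINVAL if set-volume has not happened yet."""
        if conn_id not in self._clients:
            # Corresponds to the "invalid argument: client" error in the brick log.
            return -errno.EINVAL
        return 0


if __name__ == "__main__":
    brick = BrickSketch()
    print(brick.handle_fop("conn-1", "lookup"))  # rejected, no set-volume yet
    brick.set_volume("conn-1")
    print(brick.handle_fop("conn-1", "lookup"))  # accepted
```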

Sequence of steps for a client to connect to the brick:
1) Connect to glusterd on the machine where the brick resides.
2) Query glusterd for the port of the brick on that machine.
3) Disconnect from glusterd.
4) Connect to the brick.
5) Do 'set-volume', indicating that fops will start coming to the brick.

Fops should be sent over the wire only after step 5. In 3.3.x clients there was a bug in which fops would be sent over the wire right after step 1. This can even lead to crashes if it happens while the brick is just coming up and is not yet initialized, as in bz#1545277.
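
The intended ordering can be sketched as a small state machine that refuses to send fops until set-volume has completed. This is an illustration only, not the protocol/client code; the state and class names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    DISCONNECTED = auto()
    GLUSTERD_CONNECTED = auto()     # after step 1
    PORT_KNOWN = auto()             # after step 2
    GLUSTERD_DISCONNECTED = auto()  # after step 3
    BRICK_CONNECTED = auto()        # after step 4
    VOLUME_SET = auto()             # after step 5


# Enum members iterate in definition order, which matches the step order.
_ORDER = list(State)


class ClientHandshake:
    """Tracks handshake progress and gates fops on set-volume."""

    def __init__(self):
        self.state = State.DISCONNECTED
        self.sent = []

    def advance(self):
        """Move to the next handshake step; stays put once VOLUME_SET is reached."""
        idx = _ORDER.index(self.state)
        if idx + 1 < len(_ORDER):
            self.state = _ORDER[idx + 1]
        return self.state

    def send_fop(self, name):
        """Queue a fop for the wire, but only once set-volume is done."""
        if self.state is not State.VOLUME_SET:
            raise RuntimeError(
                "fop %r attempted in state %s, before set-volume" % (name, self.state.name)
            )
        self.sent.append(name)


if __name__ == "__main__":
    client = ClientHandshake()
    client.advance()  # step 1: connected to glusterd
    try:
        client.send_fop("lookup")  # what the buggy client effectively did
    except RuntimeError as exc:
        print(exc)
    for _ in range(4):  # steps 2 to 5
        client.advance()
    client.send_fop("lookup")
    print(client.sent)
```

In this model, a fop issued right after step 1 would go to whatever the client is connected to at that point, which is glusterd. That is consistent with the "RPC program not available" warnings in the glusterd log.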

Since that bz is fixed in 3.4.x, I am marking this as a duplicate of the earlier bz.

*** This bug has been marked as a duplicate of bug 1545277 ***