Bug 1134305 - rpc actor failed to complete successfully messages in Glusterd
Summary: rpc actor failed to complete successfully messages in Glusterd
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: x86_64
OS: Mac OS
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1133874 1148196
 
Reported: 2014-08-27 09:23 UTC by Harshavardhana
Modified: 2018-08-29 03:36 UTC (History)
3 users

Fixed In Version: glusterfs-4.1.3 (or later)
Clone Of:
Environment:
Last Closed: 2018-08-29 03:36:19 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Harshavardhana 2014-08-27 09:23:41 UTC
Description of problem:

[2014-08-27 09:10:16.931419] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 0
[2014-08-27 09:10:16.931471] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 0
[2014-08-27 09:10:16.931554] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 0
[2014-08-27 09:10:16.931601] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 0
[2014-08-27 09:10:16.931648] T [rpcsvc.c:600:rpcsvc_handle_rpc_call] 0-rpc-service: Client port: 1023 (unprivileged 0)
[2014-08-27 09:10:16.931696] T [rpcsvc-auth.c:393:rpcsvc_auth_request_init] 0-rpc-service: Auth handler: AUTH_GLUSTERFS-v2
[2014-08-27 09:10:16.931767] T [rpcsvc.c:474:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0x11, Ver: 2, Program: 1298437, ProgVers: 330, Proc: 14) from rpc-transport (socket.management)
[2014-08-27 09:10:16.931819] T [auth-glusterfs.c:242:auth_glusterfs_v2_authenticate] 0-rpc-service: Auth Info: pid: 99712, uid: 0, gid: 0, owner: 0000000000000000
[2014-08-27 09:10:16.931865] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2014-08-27 09:10:16.931908] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully with error(-1)
[2014-08-27 09:10:16.932035] D [rpcsvc.c:1283:rpcsvc_error_reply] (-->2   libgfrpc.0.dylib                    0x000000010dc71269 rpcsvc_notify + 489 (-->1   libgfrpc.0.dylib                    0x000000010dc70c6a rpcsvc_handle_rpc_call + 1530 (-->0   libgfrpc.0.dylib                    0x000000010dc70601 rpcsvc_check_and_reply_error + 145))) 0-: sending a RPC error reply
[2014-08-27 09:10:16.932084] T [rpcsvc.c:1209:rpcsvc_submit_generic] 0-rpc-service: Tx message: 0
[2014-08-27 09:10:16.932130] T [rpcsvc.c:841:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 24, payload: 0, rpc hdr: 24
[2014-08-27 09:10:16.932665] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 1
[2014-08-27 09:10:16.933183] T [rpcsvc.c:1257:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0xb, Program: -, ProgVers: 0, Proc: 14) to rpc-transport (socket.management)
[2014-08-27 09:10:16.933700] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 16, R/W: 0
[2014-08-27 09:10:16.933759] D [socket.c:502:__socket_rwv] 0-socket.management: EOF on socket
[2014-08-27 09:10:16.933804] D [socket.c:2265:socket_event_handler] 0-transport: disconnecting now
[2014-08-27 09:10:16.933875] I [socket.c:864:__socket_reset] 0-management: closing socket: 16
[2014-08-27 09:10:16.933935] T [socket.c:3843:fini] 0-socket.management: transport 0x7f96f9014a20 destroyed
[2014-08-27 09:10:16.934101] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 0
[2014-08-27 09:10:16.934160] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 0
[2014-08-27 09:10:16.934211] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 0

While running regression tests I observed failures of this nature, where GLUSTER_FOP_PROGRAM requests are sent over the 'glusterd' port, which shouldn't happen; glusterd then fails because no matching 'rpc actor' can be found for the request.
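To make the "RPC program not available (req 1298437 330)" warning concrete: an RPC server like glusterd keeps a table of registered (program number, program version) pairs and dispatches each incoming request against it. The sketch below is a hypothetical Python model, not GlusterFS source; the program-number-to-name mapping is inferred from the logs in this report (1298437/330 is the FOP program, which glusterd does not register).

```python
# Hypothetical model of rpcsvc_program_actor-style dispatch (not GlusterFS
# code). Programs are registered by (program number, program version);
# glusterd registers its management programs but NOT the FOP program.
PROGRAMS = {
    (1238463, 2): "GlusterD svc cli",     # names inferred from the
    (34123456, 1): "Gluster Portmap",     # trace logs in this bug
    (14398633, 2): "Gluster Handshake",
}

def program_actor(prognum, progver):
    """Return the registered program for a request, or fail the way
    glusterd logs 'RPC program not available (req <num> <ver>)'."""
    try:
        return PROGRAMS[(prognum, progver)]
    except KeyError:
        raise LookupError(
            f"RPC program not available (req {prognum} {progver})")
```

Looking up (1298437, 330), the FOP program sent by the client, misses the table, and the server can only send back an error reply, which is exactly the "rpc actor failed to complete successfully" path in the logs above.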

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Always - with all the test cases. 

Steps to Reproduce:
1. Run the following test

~~~
#!/bin/bash

. $(dirname $0)/../include.rc
. $(dirname $0)/../volume.rc

cleanup;

TEST glusterd -L TRACE;
TEST pidof glusterd;

TEST $CLI volume info;

TEST $CLI volume create $V0 $H0:$B0/${V0}1;
TEST $CLI volume set $V0 nfs.disable on;

EXPECT "$V0" volinfo_field $V0 'Volume Name';
EXPECT 'Created' volinfo_field $V0 'Status';
TEST $CLI volume start $V0;
EXPECT 'Started' volinfo_field $V0 'Status';

TEST $GFS -s $H0 --volfile-id $V0 $M0;

# verify json validity

TEST ls $M0/.meta/version;

TEST $CLI volume stop $V0;
EXPECT 'Stopped' volinfo_field $V0 'Status';

TEST $CLI volume delete $V0;
TEST ! $CLI volume info $V0;

cleanup;
~~~

Comment 1 Kaushal 2014-08-27 10:25:19 UTC
This isn't a problem specific to Mac OS X. Shwetha hit the same problem with RHS.

The client xlator connects to GlusterD to get the brick port, then disconnects from GlusterD and reconnects to the brick. If FOPs happen during the period the client is connected to glusterd, these errors arise.
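The sequence Kaushal describes can be modeled as a small state machine: FOPs must be held back until the client has left glusterd and completed its handshake with the brick. The sketch below is a hypothetical Python illustration of that gating idea, not the actual patch at review.gluster.org/5956 (which is C inside the client xlator); all class and method names are invented for illustration.

```python
# Hypothetical model of gating FOPs behind explicit connection states
# (illustrative only; not the GlusterFS client xlator).
from enum import Enum, auto

class ConnState(Enum):
    QUERYING_PORTMAP = auto()  # connected to glusterd, asking for brick port
    CONNECTING_BRICK = auto()  # disconnected from glusterd, dialing the brick
    CONNECTED = auto()         # brick handshake done; FOPs may flow

class Client:
    def __init__(self):
        self.state = ConnState.QUERYING_PORTMAP
        self.pending = []      # FOPs queued until the brick is ready

    def submit_fop(self, fop):
        if self.state is not ConnState.CONNECTED:
            # Without this gate, the FOP would go out on the glusterd
            # socket and trigger 'RPC program not available'.
            self.pending.append(fop)
            return "queued"
        return f"sent {fop}"

    def portmap_reply(self):
        self.state = ConnState.CONNECTING_BRICK

    def brick_handshake_done(self):
        self.state = ConnState.CONNECTED
        flushed, self.pending = self.pending, []
        return [self.submit_fop(f) for f in flushed]
```

Any FOP submitted during QUERYING_PORTMAP or CONNECTING_BRICK is queued and replayed only after the brick handshake, which is the failure window the bug describes.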

Pranith had worked on a patch that should solve this problem: https://review.gluster.org/5956. He said he's willing to continue with the patch and take it to completion, so I'm assigning the bug to him.

Comment 2 Harshavardhana 2014-08-27 16:42:03 UTC
Even after cherry-picking Pranith's change and making some local modifications to resolve conflicts, I am still able to see the issue; perhaps Pranith needs to rework it against the current master 'client' protocol.

[2014-08-27 16:39:12.652783] T [rpcsvc.c:474:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0x13, Ver: 2, Program: 1298437, ProgVers: 330, Proc: 14) from rpc-transport (socket.management)
[2014-08-27 16:39:12.652835] T [auth-glusterfs.c:242:auth_glusterfs_v2_authenticate] 0-rpc-service: Auth Info: pid: 10299, uid: 0, gid: 0, owner: 0000000000000000 
[2014-08-27 16:39:12.652881] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2014-08-27 16:39:12.652924] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2014-08-27 16:39:12.653033] D [rpcsvc.c:1282:rpcsvc_error_reply] (-->2   libgfrpc.0.dylib                    0x000000010fc0c199 rpcsvc_notify + 489 (-->1   libgfrpc.0.dylib                    0x000000010fc0bb9b rpcsvc_handle_rpc_call + 1515 (-->0   libgfrpc.0.dylib                    0x000000010fc0b54b rpcsvc_check_and_reply_error + 139))) 0-: sending a RPC error reply    
[2014-08-27 16:39:12.653082] T [rpcsvc.c:1208:rpcsvc_submit_generic] 0-rpc-service: Tx message: 0
[2014-08-27 16:39:12.653129] T [rpcsvc.c:840:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 24, payload: 0, rpc hdr: 24
[2014-08-27 16:39:12.653179] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 1
[2014-08-27 16:39:12.653258] T [rpcsvc.c:1256:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0xd, Program: -, ProgVers: 0, Proc: 14) to rpc-transport (socket.management)

Comment 3 Pranith Kumar K 2014-08-27 17:11:21 UTC
Harsha,
     Thanks a ton for reviving the patch. I applied it and ran the test. I don't see the same problem on fedora 20:

22:38:03 :) ⚡ grep "rpcsvc_request_create" usr-local-etc-glusterfs-glusterd.vol.log | awk '{print $13, $15}' | sort | uniq -c
     11 1238463, 2,
      2 14398633, 2,
      2 34123456, 1,

root@pranithk-laptop - /var/log/glusterfs 
22:40:34 :( ⚡ grep rpcsvc_program_actor usr-local-etc-glusterfs-glusterd.vol.log
[2014-08-27 17:04:43.159948] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:04:43.221376] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - CLI_CREATE_VOLUME
[2014-08-27 17:04:43.826943] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - SET_VOLUME
[2014-08-27 17:04:44.100342] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:04:44.168512] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:04:44.230645] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - START_VOLUME
[2014-08-27 17:04:45.042350] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Handshake - GETSPEC
[2014-08-27 17:04:45.108081] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - SIGNIN
[2014-08-27 17:04:45.294157] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Handshake - GETSPEC
[2014-08-27 17:04:45.369645] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:04:45.474878] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - STOP_VOLUME
[2014-08-27 17:04:45.659540] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - SIGNOUT
[2014-08-27 17:04:45.738302] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:04:45.800908] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - DELETE_VOLUME
[2014-08-27 17:04:45.862808] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME

Pranith

Comment 4 Harshavardhana 2014-08-27 17:20:00 UTC
This is on OSX

# grep rpcsvc_request_create /var/log/glusterfs/usr-local-etc-glusterfs-glusterd.vol.log | awk '{print $13, $15}' | sort | uniq -c    
   4 123451501, 1,
  11 1238463, 2,
   7 1298437, 330,
   3 14398633, 2,
   6 34123456, 1,

# grep rpcsvc_program_actor /var//log/glusterfs/usr-local-etc-glusterfs-glusterd.vol.log 
[2014-08-27 17:18:28.862326] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:18:28.879858] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - CLI_CREATE_VOLUME
[2014-08-27 17:18:28.914003] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - SET_VOLUME
[2014-08-27 17:18:28.962352] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:18:28.988864] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:18:29.005641] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - START_VOLUME
[2014-08-27 17:18:29.771888] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Handshake - GETSPEC
[2014-08-27 17:18:29.787566] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - SIGNIN
[2014-08-27 17:18:29.835748] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Handshake - GETSPEC
[2014-08-27 17:18:29.838748] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:18:29.886859] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Handshake - GETSPEC
[2014-08-27 17:18:29.910531] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GF-DUMP - DUMP
[2014-08-27 17:18:29.911738] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - PORTBYBRICK
[2014-08-27 17:18:29.938980] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - STOP_VOLUME
[2014-08-27 17:18:29.956032] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - SIGNOUT
[2014-08-27 17:18:29.973484] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GF-DUMP - DUMP
[2014-08-27 17:18:29.976985] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2014-08-27 17:18:29.977755] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - PORTBYBRICK
[2014-08-27 17:18:29.978329] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:18:29.986777] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2014-08-27 17:18:30.000061] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GF-DUMP - DUMP
[2014-08-27 17:18:30.001083] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2014-08-27 17:18:30.002488] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2014-08-27 17:18:30.003200] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - DELETE_VOLUME
[2014-08-27 17:18:30.005041] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - PORTBYBRICK
[2014-08-27 17:18:30.010259] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2014-08-27 17:18:30.029876] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GF-DUMP - DUMP
[2014-08-27 17:18:30.030527] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2014-08-27 17:18:30.031216] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:18:30.032938] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - PORTBYBRICK
[2014-08-27 17:18:30.034857] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)

Comment 5 Harshavardhana 2014-08-27 17:47:14 UTC
After one more edit to the patch, suggested by Pranith, the issue no longer appears:

# grep rpcsvc_request_create /var/log/glusterfs/usr-local-etc-glusterfs-glusterd.vol.log | awk '{print $13, $15}' | sort | uniq -c    
   1 123451501, 1,
  11 1238463, 2,
   3 14398633, 2,
   3 34123456, 1,

# grep rpcsvc_program_actor /var//log/glusterfs/usr-local-etc-glusterfs-glusterd.vol.log 
[2014-08-27 17:43:40.393433] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:43:40.437105] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - CLI_CREATE_VOLUME
[2014-08-27 17:43:40.634000] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - SET_VOLUME
[2014-08-27 17:43:40.707208] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:43:40.730278] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:43:40.746755] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - START_VOLUME
[2014-08-27 17:43:41.412055] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Handshake - GETSPEC
[2014-08-27 17:43:41.422965] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - SIGNIN
[2014-08-27 17:44:33.285142] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:44:33.287221] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Handshake - GETSPEC
[2014-08-27 17:44:33.330312] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Handshake - GETSPEC
[2014-08-27 17:44:33.353975] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GF-DUMP - DUMP
[2014-08-27 17:44:33.355397] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - PORTBYBRICK
[2014-08-27 17:44:33.379824] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - STOP_VOLUME
[2014-08-27 17:44:33.382884] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - SIGNOUT
[2014-08-27 17:44:33.419443] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:44:33.436626] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - DELETE_VOLUME
[2014-08-27 17:44:33.454822] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME

Comment 6 Anand Avati 2014-08-27 20:15:22 UTC
REVIEW: http://review.gluster.org/5956 (protocol/client: Add explicit states for connection sequence) posted (#6) for review on master by Harshavardhana (harsha)

Comment 7 Anand Avati 2014-08-28 10:04:24 UTC
REVIEW: http://review.gluster.org/5956 (protocol/client: Add explicit states for connection sequence) posted (#7) for review on master by Harshavardhana (harsha)

Comment 8 Anand Avati 2014-08-28 21:05:49 UTC
REVIEW: http://review.gluster.org/5956 (protocol/client: Add explicit states for connection sequence) posted (#8) for review on master by Harshavardhana (harsha)

Comment 9 Anand Avati 2014-08-28 22:36:59 UTC
REVIEW: http://review.gluster.org/5956 (protocol/client: Add explicit states for connection sequence) posted (#9) for review on master by Harshavardhana (harsha)

Comment 10 Anand Avati 2014-09-03 18:17:47 UTC
REVIEW: http://review.gluster.org/5956 (protocol/client: Add explicit states for connection sequence) posted (#10) for review on master by Harshavardhana (harsha)

Comment 11 Anand Avati 2014-10-13 22:25:01 UTC
REVIEW: http://review.gluster.org/5956 (protocol/client: Add explicit states for connection sequence) posted (#11) for review on master by Harshavardhana (harsha)

Comment 13 Mike McCune 2016-03-28 23:18:47 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation; please contact mmccune with any questions.

Comment 14 Amar Tumballi 2018-08-29 03:36:19 UTC
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.

