Description of problem:

[2014-08-27 09:10:16.931419] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 0
[2014-08-27 09:10:16.931471] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 0
[2014-08-27 09:10:16.931554] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 0
[2014-08-27 09:10:16.931601] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 0
[2014-08-27 09:10:16.931648] T [rpcsvc.c:600:rpcsvc_handle_rpc_call] 0-rpc-service: Client port: 1023 (unprivileged 0)
[2014-08-27 09:10:16.931696] T [rpcsvc-auth.c:393:rpcsvc_auth_request_init] 0-rpc-service: Auth handler: AUTH_GLUSTERFS-v2
[2014-08-27 09:10:16.931767] T [rpcsvc.c:474:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0x11, Ver: 2, Program: 1298437, ProgVers: 330, Proc: 14) from rpc-transport (socket.management)
[2014-08-27 09:10:16.931819] T [auth-glusterfs.c:242:auth_glusterfs_v2_authenticate] 0-rpc-service: Auth Info: pid: 99712, uid: 0, gid: 0, owner: 0000000000000000
[2014-08-27 09:10:16.931865] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2014-08-27 09:10:16.931908] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully with error(-1)
[2014-08-27 09:10:16.932035] D [rpcsvc.c:1283:rpcsvc_error_reply] (-->2 libgfrpc.0.dylib 0x000000010dc71269 rpcsvc_notify + 489 (-->1 libgfrpc.0.dylib 0x000000010dc70c6a rpcsvc_handle_rpc_call + 1530 (-->0 libgfrpc.0.dylib 0x000000010dc70601 rpcsvc_check_and_reply_error + 145))) 0-: sending a RPC error reply
[2014-08-27 09:10:16.932084] T [rpcsvc.c:1209:rpcsvc_submit_generic] 0-rpc-service: Tx message: 0
[2014-08-27 09:10:16.932130] T [rpcsvc.c:841:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 24, payload: 0, rpc hdr: 24
[2014-08-27 09:10:16.932665] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 1
[2014-08-27 09:10:16.933183] T [rpcsvc.c:1257:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0xb, Program: -, ProgVers: 0, Proc: 14) to rpc-transport (socket.management)
[2014-08-27 09:10:16.933700] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 16, R/W: 0
[2014-08-27 09:10:16.933759] D [socket.c:502:__socket_rwv] 0-socket.management: EOF on socket
[2014-08-27 09:10:16.933804] D [socket.c:2265:socket_event_handler] 0-transport: disconnecting now
[2014-08-27 09:10:16.933875] I [socket.c:864:__socket_reset] 0-management: closing socket: 16
[2014-08-27 09:10:16.933935] T [socket.c:3843:fini] 0-socket.management: transport 0x7f96f9014a20 destroyed
[2014-08-27 09:10:16.934101] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 0
[2014-08-27 09:10:16.934160] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 0
[2014-08-27 09:10:16.934211] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 0

While running regression tests I observed failures of this nature: GLUSTER_FOP_PROGRAM requests are sent over the 'glusterd' port, which shouldn't happen, and glusterd fails them because no matching 'rpc actor' is registered (see the sketch after the reproducer below).

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Always - with all the test cases.

Steps to Reproduce:
1. Run the following test
~~~
#!/bin/bash

. $(dirname $0)/../include.rc
. $(dirname $0)/../volume.rc

cleanup;

TEST glusterd -L TRACE;
TEST pidof glusterd;
TEST $CLI volume info;

TEST $CLI volume create $V0 $H0:$B0/${V0}1;
TEST $CLI volume set $V0 nfs.disable on;
EXPECT "$V0" volinfo_field $V0 'Volume Name';
EXPECT 'Created' volinfo_field $V0 'Status';

TEST $CLI volume start $V0;
EXPECT 'Started' volinfo_field $V0 'Status';

TEST $GFS -s $H0 --volfile-id $V0 $M0;

# verify json validity
TEST ls $M0/.meta/version;

TEST $CLI volume stop $V0;
EXPECT 'Stopped' volinfo_field $V0 'Status';

TEST $CLI volume delete $V0;
TEST ! $CLI volume info $V0;

cleanup;
~~~
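For context on the "RPC program not available" / "rpc actor failed" pair above, here is a minimal sketch of the lookup that produces them. The names, struct shape, and table contents are illustrative assumptions, not the actual glusterfs source: glusterd registers only its management programs on its listener, so a GLUSTER_FOP_PROGRAM request (program 1298437, version 330) arriving on that port matches nothing and gets an RPC error reply.

~~~
#include <stddef.h>
#include <stdio.h>

/* Illustrative shape only -- not the real rpcsvc_program_t. */
struct rpc_program {
        const char *name;
        int         prognum;
        int         progver;
};

/* Hypothetical table of programs registered on glusterd's listener. */
static struct rpc_program registered[] = {
        { "GlusterD svc cli",  1238463,  2 },
        { "Gluster Handshake", 14398633, 2 },
        { "Gluster Portmap",   34123456, 1 },
};

static struct rpc_program *
find_actor (int prognum, int progver)
{
        for (size_t i = 0; i < sizeof (registered) / sizeof (*registered); i++)
                if (registered[i].prognum == prognum &&
                    registered[i].progver == progver)
                        return &registered[i];
        return NULL; /* caller logs the warning and sends an error reply */
}

int
main (void)
{
        /* The misdirected FOP request from the log above. */
        if (find_actor (1298437, 330) == NULL)
                fprintf (stderr,
                         "RPC program not available (req 1298437 330)\n");
        return 0;
}
~~~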
This isn't a problem specific to Mac OS X; Shwetha hit the same problem with RHS. The client xlator first connects to GlusterD to get the brick port, then disconnects from GlusterD and reconnects to the brick. If FOPs are wound while the client is still connected to GlusterD, these errors arise. Pranith had worked on a patch that should solve this problem, https://review.gluster.org/5956 . He said he's willing to continue with the patch and take it to completion, so I'm assigning the bug to him.
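A hedged sketch of the idea behind that patch ("protocol/client: Add explicit states for connection sequence") follows; every name here is an illustrative assumption, not the actual change. The point is that the client tracks which endpoint its transport currently points at and winds FOPs only after the brick handshake has completed.

~~~
#include <stdio.h>

/* Hypothetical explicit states for the client connection sequence. */
typedef enum {
        CLIENT_DISCONNECTED = 0,
        CLIENT_CONNECTED_TO_GLUSTERD, /* querying portmap for the brick port */
        CLIENT_CONNECTING_TO_BRICK,   /* reconnecting to the returned port   */
        CLIENT_CONNECTED_TO_BRICK,    /* handshake done; FOPs may be wound   */
} client_conn_state_t;

/* Gate in the FOP submission path: while still talking to glusterd,
 * queue (or fail) the FOP instead of sending it -- otherwise it lands
 * on glusterd's port as a GLUSTER_FOP_PROGRAM request that no actor
 * can service, producing the errors in this report. */
static int
client_can_submit_fop (client_conn_state_t state)
{
        return state == CLIENT_CONNECTED_TO_BRICK;
}

int
main (void)
{
        printf ("FOP allowed while on glusterd: %d\n",
                client_can_submit_fop (CLIENT_CONNECTED_TO_GLUSTERD)); /* 0 */
        printf ("FOP allowed once on brick:     %d\n",
                client_can_submit_fop (CLIENT_CONNECTED_TO_BRICK));    /* 1 */
        return 0;
}
~~~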
After cherry-picking Pranith's change and making some local modifications to resolve conflicts, I am still able to see the issue; perhaps Pranith needs to rework it against the current master 'client' protocol.

[2014-08-27 16:39:12.652783] T [rpcsvc.c:474:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0x13, Ver: 2, Program: 1298437, ProgVers: 330, Proc: 14) from rpc-transport (socket.management)
[2014-08-27 16:39:12.652835] T [auth-glusterfs.c:242:auth_glusterfs_v2_authenticate] 0-rpc-service: Auth Info: pid: 10299, uid: 0, gid: 0, owner: 0000000000000000
[2014-08-27 16:39:12.652881] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2014-08-27 16:39:12.652924] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2014-08-27 16:39:12.653033] D [rpcsvc.c:1282:rpcsvc_error_reply] (-->2 libgfrpc.0.dylib 0x000000010fc0c199 rpcsvc_notify + 489 (-->1 libgfrpc.0.dylib 0x000000010fc0bb9b rpcsvc_handle_rpc_call + 1515 (-->0 libgfrpc.0.dylib 0x000000010fc0b54b rpcsvc_check_and_reply_error + 139))) 0-: sending a RPC error reply
[2014-08-27 16:39:12.653082] T [rpcsvc.c:1208:rpcsvc_submit_generic] 0-rpc-service: Tx message: 0
[2014-08-27 16:39:12.653129] T [rpcsvc.c:840:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 24, payload: 0, rpc hdr: 24
[2014-08-27 16:39:12.653179] I [socket.c:467:__socket_rwv] 0-socket.management: Socker number 13, R/W: 1
[2014-08-27 16:39:12.653258] T [rpcsvc.c:1256:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0xd, Program: -, ProgVers: 0, Proc: 14) to rpc-transport (socket.management)
Harsha,
Thanks a ton for reviving the patch. I applied it and ran the test. I don't see the same problem on Fedora 20:

22:38:03 :) ⚡ grep "rpcsvc_request_create" usr-local-etc-glusterfs-glusterd.vol.log | awk '{print $13, $15}' | sort | uniq -c
  11 1238463, 2,
   2 14398633, 2,
   2 34123456, 1,

root@pranithk-laptop - /var/log/glusterfs
22:40:34 :( ⚡ grep rpcsvc_program_actor usr-local-etc-glusterfs-glusterd.vol.log
[2014-08-27 17:04:43.159948] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:04:43.221376] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - CLI_CREATE_VOLUME
[2014-08-27 17:04:43.826943] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - SET_VOLUME
[2014-08-27 17:04:44.100342] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:04:44.168512] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:04:44.230645] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - START_VOLUME
[2014-08-27 17:04:45.042350] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Handshake - GETSPEC
[2014-08-27 17:04:45.108081] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - SIGNIN
[2014-08-27 17:04:45.294157] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Handshake - GETSPEC
[2014-08-27 17:04:45.369645] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:04:45.474878] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - STOP_VOLUME
[2014-08-27 17:04:45.659540] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - SIGNOUT
[2014-08-27 17:04:45.738302] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:04:45.800908] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - DELETE_VOLUME
[2014-08-27 17:04:45.862808] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME

Pranith
This is on OSX:

# grep rpcsvc_request_create /var/log/glusterfs/usr-local-etc-glusterfs-glusterd.vol.log | awk '{print $13, $15}' | sort | uniq -c
   4 123451501, 1,
  11 1238463, 2,
   7 1298437, 330,
   3 14398633, 2,
   6 34123456, 1,

# grep rpcsvc_program_actor /var//log/glusterfs/usr-local-etc-glusterfs-glusterd.vol.log
[2014-08-27 17:18:28.862326] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:18:28.879858] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - CLI_CREATE_VOLUME
[2014-08-27 17:18:28.914003] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - SET_VOLUME
[2014-08-27 17:18:28.962352] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:18:28.988864] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:18:29.005641] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - START_VOLUME
[2014-08-27 17:18:29.771888] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Handshake - GETSPEC
[2014-08-27 17:18:29.787566] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - SIGNIN
[2014-08-27 17:18:29.835748] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Handshake - GETSPEC
[2014-08-27 17:18:29.838748] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:18:29.886859] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Handshake - GETSPEC
[2014-08-27 17:18:29.910531] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GF-DUMP - DUMP
[2014-08-27 17:18:29.911738] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - PORTBYBRICK
[2014-08-27 17:18:29.938980] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - STOP_VOLUME
[2014-08-27 17:18:29.956032] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - SIGNOUT
[2014-08-27 17:18:29.973484] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GF-DUMP - DUMP
[2014-08-27 17:18:29.976985] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2014-08-27 17:18:29.977755] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - PORTBYBRICK
[2014-08-27 17:18:29.978329] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:18:29.986777] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2014-08-27 17:18:30.000061] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GF-DUMP - DUMP
[2014-08-27 17:18:30.001083] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2014-08-27 17:18:30.002488] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2014-08-27 17:18:30.003200] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - DELETE_VOLUME
[2014-08-27 17:18:30.005041] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - PORTBYBRICK
[2014-08-27 17:18:30.010259] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2014-08-27 17:18:30.029876] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GF-DUMP - DUMP
[2014-08-27 17:18:30.030527] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2014-08-27 17:18:30.031216] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:18:30.032938] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - PORTBYBRICK
[2014-08-27 17:18:30.034857] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
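To read the grep counts above: the program numbers map to the following programs. The values below are quoted from memory of rpc/rpc-lib/src/protocol-common.h, so treat them as a reference, not gospel (I haven't mapped 123451501 here). Only GLUSTER_FOP_PROGRAM should never reach glusterd; its presence in the OSX counts (1298437, version 330) is exactly this bug, while it is absent in the Fedora output above.

~~~
/* Program numbers as I recall them from rpc/rpc-lib/src/protocol-common.h. */
enum {
        GLUSTER_CLI_PROGRAM   = 1238463,  /* cli <-> glusterd       */
        GLUSTER_FOP_PROGRAM   = 1298437,  /* data path; bricks only */
        GLUSTER_HNDSK_PROGRAM = 14398633, /* handshake / GETSPEC    */
        GLUSTER_PMAP_PROGRAM  = 34123456, /* portmap / PORTBYBRICK  */
};
~~~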
After one more edit to the patch, pointed out by Pranith, the issue no longer appears:

# grep rpcsvc_request_create /var/log/glusterfs/usr-local-etc-glusterfs-glusterd.vol.log | awk '{print $13, $15}' | sort | uniq -c
   1 123451501, 1,
  11 1238463, 2,
   3 14398633, 2,
   3 34123456, 1,

# grep rpcsvc_program_actor /var//log/glusterfs/usr-local-etc-glusterfs-glusterd.vol.log
[2014-08-27 17:43:40.393433] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:43:40.437105] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - CLI_CREATE_VOLUME
[2014-08-27 17:43:40.634000] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - SET_VOLUME
[2014-08-27 17:43:40.707208] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:43:40.730278] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:43:40.746755] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - START_VOLUME
[2014-08-27 17:43:41.412055] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Handshake - GETSPEC
[2014-08-27 17:43:41.422965] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - SIGNIN
[2014-08-27 17:44:33.285142] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:44:33.287221] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Handshake - GETSPEC
[2014-08-27 17:44:33.330312] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Handshake - GETSPEC
[2014-08-27 17:44:33.353975] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GF-DUMP - DUMP
[2014-08-27 17:44:33.355397] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - PORTBYBRICK
[2014-08-27 17:44:33.379824] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - STOP_VOLUME
[2014-08-27 17:44:33.382884] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: Gluster Portmap - SIGNOUT
[2014-08-27 17:44:33.419443] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
[2014-08-27 17:44:33.436626] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - DELETE_VOLUME
[2014-08-27 17:44:33.454822] T [rpcsvc.c:295:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterD svc cli - GET_VOLUME
REVIEW: http://review.gluster.org/5956 (protocol/client: Add explicit states for connection sequence) posted (#6) for review on master by Harshavardhana (harsha)
REVIEW: http://review.gluster.org/5956 (protocol/client: Add explicit states for connection sequence) posted (#7) for review on master by Harshavardhana (harsha)
REVIEW: http://review.gluster.org/5956 (protocol/client: Add explicit states for connection sequence) posted (#8) for review on master by Harshavardhana (harsha)
REVIEW: http://review.gluster.org/5956 (protocol/client: Add explicit states for connection sequence) posted (#9) for review on master by Harshavardhana (harsha)
REVIEW: http://review.gluster.org/5956 (protocol/client: Add explicit states for connection sequence) posted (#10) for review on master by Harshavardhana (harsha)
REVIEW: http://review.gluster.org/5956 (protocol/client: Add explicit states for connection sequence) posted (#11) for review on master by Harshavardhana (harsha)
This bug was accidentally moved from POST to MODIFIED via an error in automation; please see mmccune with any questions.
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.