Bug 960285 - Client PORTBYBRICK request for replaced-brick will retry forever because brick port result is 0
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: 3.3.1
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: krishnan parthasarathi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-05-06 21:40 UTC by Joe Julian
Modified: 2015-11-03 23:05 UTC
5 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-11-04 15:34:42 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Joe Julian 2013-05-06 21:40:24 UTC
Description of problem:
I replaced bricks on one of my servers, e.g. "replace-brick wpkg ewcs7:/var/spool/glusterfs/d_wpkg ewcs7:/data/glusterfs/wpkg/d/brick commit force"

One of my clients began a loop where it was logging "I [client.c:2090:client_rpc_notify] 1-wpkg-client-8: disconnected" several times a second, with a corresponding "I [socket.c:1798:socket_event_handler] 0-transport: disconnecting now" from glusterd on ewcs7.

I finally narrowed it down to the replace brick. The client was continuing to send PORTBYBRICK requests to ewcs7 for /var/spool/glusterfs/d_wpkg. The result was:
Retval: -1
Errno:   0
Port:    0

Version-Release number of selected component (if applicable):
3.3.1

How reproducible:
Uncertain. Failed on 3 mounted volumes on one client.

Steps to Reproduce:
1. Replace a brick
2. ???
  
Actual results:
Logs start filling with attempts to connect to the old brick.


Expected results:
If a PORTBYBRICK rpc request fails because the brick does not exist, an error should be raised and the graph should be reloaded.

Comment 1 Joe Julian 2013-05-06 21:49:28 UTC
Given what I thought I knew of the problem, I figured that making a client graph change would force it to reload the graph and cure its problem. I disabled quick-read and the problem continued. I reset quick-read and the problem still continued. Here's the new graph:

Given volfile:
+------------------------------------------------------------------------------+
  1: volume wpkg-client-0
  2:     type protocol/client
  3:     option remote-host ewcs2
  4:     option remote-subvolume /var/spool/glusterfs/a_wpkg
  5:     option transport-type tcp
  6: end-volume
  7: 
  8: volume wpkg-client-1
  9:     type protocol/client
 10:     option remote-host ewcs10
 11:     option remote-subvolume /data/glusterfs/wpkg/a
 12:     option transport-type tcp
 13: end-volume
 14: 
 15: volume wpkg-client-2
 16:     type protocol/client
 17:     option remote-host ewcs7
 18:     option remote-subvolume /data/glusterfs/wpkg/a/brick
 19:     option transport-type tcp
 20: end-volume
 21: 
 22: volume wpkg-client-3
 23:     type protocol/client
 24:     option remote-host ewcs2
 25:     option remote-subvolume /var/spool/glusterfs/b_wpkg
 26:     option transport-type tcp
 27: end-volume
 28: 
 29: volume wpkg-client-4
 30:     type protocol/client
 31:     option remote-host ewcs10
 32:     option remote-subvolume /data/glusterfs/wpkg/b
 33:     option transport-type tcp
 34: end-volume
 35: 
 36: volume wpkg-client-5
 37:     type protocol/client
 38:     option remote-host ewcs7
 39:     option remote-subvolume /data/glusterfs/wpkg/b/brick
 40:     option transport-type tcp
 41: end-volume
 42: 
 43: volume wpkg-client-6
 44:     type protocol/client
 45:     option remote-host ewcs2
 46:     option remote-subvolume /var/spool/glusterfs/c_wpkg
 47:     option transport-type tcp
 48: end-volume
 49: 
 50: volume wpkg-client-7
 51:     type protocol/client
 52:     option remote-host ewcs10
 53:     option remote-subvolume /data/glusterfs/wpkg/c
 54:     option transport-type tcp
 55: end-volume
 56: 
 57: volume wpkg-client-8
 58:     type protocol/client
 59:     option remote-host ewcs7
 60:     option remote-subvolume /data/glusterfs/wpkg/c/brick
 61:     option transport-type tcp
 62: end-volume
 63: 
 64: volume wpkg-client-9
 65:     type protocol/client
 66:     option remote-host ewcs2
 67:     option remote-subvolume /data/glusterfs/wpkg/d/brick
 68:     option transport-type tcp
 69: end-volume
 70: 
 71: volume wpkg-client-10
 72:     type protocol/client
 73:     option remote-host ewcs10
 74:     option remote-subvolume /data/glusterfs/wpkg/d
 75:     option transport-type tcp
 76: end-volume
 77: 
 78: volume wpkg-client-11
 79:     type protocol/client
 80:     option remote-host ewcs7
 81:     option remote-subvolume /data/glusterfs/wpkg/d/brick
 82:     option transport-type tcp
 83: end-volume
 84: 
 85: volume wpkg-replicate-0
 86:     type cluster/replicate
 87:     subvolumes wpkg-client-0 wpkg-client-1 wpkg-client-2
 88: end-volume
 89: 
 90: volume wpkg-replicate-1
 91:     type cluster/replicate
 92:     subvolumes wpkg-client-3 wpkg-client-4 wpkg-client-5
 93: end-volume
 94: 
 95: volume wpkg-replicate-2
 96:     type cluster/replicate
 97:     subvolumes wpkg-client-6 wpkg-client-7 wpkg-client-8
 98: end-volume
 99: 
100: volume wpkg-replicate-3
101:     type cluster/replicate
102:     subvolumes wpkg-client-9 wpkg-client-10 wpkg-client-11
103: end-volume
104: 
105: volume wpkg-dht
106:     type cluster/distribute
107:     subvolumes wpkg-replicate-0 wpkg-replicate-1 wpkg-replicate-2 wpkg-replicate-3
108: end-volume
109: 
110: volume wpkg-write-behind
111:     type performance/write-behind
112:     subvolumes wpkg-dht
113: end-volume
114: 
115: volume wpkg-read-ahead
116:     type performance/read-ahead
117:     subvolumes wpkg-write-behind
118: end-volume
119: 
120: volume wpkg-quick-read
121:     type performance/quick-read
122:     option cache-size 8MB
123:     subvolumes wpkg-read-ahead
124: end-volume
125: 
126: volume wpkg-md-cache
127:     type performance/md-cache
128:     subvolumes wpkg-quick-read
129: end-volume
130: 
131: volume wpkg
132:     type debug/io-stats
133:     option log-level INFO
134:     option latency-measurement off
135:     option count-fop-hits off
136:     subvolumes wpkg-md-cache
137: end-volume


So it's getting the right graph, but for some reason it's still requesting the port for the former brick.

Comment 2 krishnan parthasarathi 2013-06-20 07:02:20 UTC
Joe,

Could you attach the client logs where you see PMAP_PORTBYBRICK rpc failure messages? I suspect that the old graph, where the client xlator still refers to the 'old brick', is still 'alive'. In other words, I believe that the graph switch hasn't happened (cleanly). These logs would help me confirm this hunch.

Comment 3 Todd Stansell 2013-07-16 21:53:30 UTC
In a similar case, I tried doing a live upgrade from 3.3.2qa1 to 3.4.0.  One of the changes (which needs actual mention in the release notes, but that's a different issue) is that the brick port number ranges changed.  It appears to me that the fuse clients have no way to pick up port number changes for bricks.  I'm including TRACE-level logs from the client:

[2013-07-16 21:44:28.596656] T [rpc-clnt.c:426:rpc_clnt_reconnect] 0-gvhome-client-1: attempting reconnect
[2013-07-16 21:44:28.596733] D [name.c:149:client_fill_address_family] 0-gvhome-client-1: address-family not specified, guessing it to be inet from (remote-host: admin02.mgmt)
[2013-07-16 21:44:28.596765] T [name.c:219:af_inet_client_get_remote_sockaddr] 0-gvhome-client-1: option remote-port missing in volume gvhome-client-1. Defaulting to 24007
[2013-07-16 21:44:28.596793] T [common-utils.c:104:gf_resolve_ip6] 0-resolver: flushing DNS cache
[2013-07-16 21:44:28.596819] T [common-utils.c:111:gf_resolve_ip6] 0-resolver: DNS cache not present, freshly probing hostname: admin02.mgmt
[2013-07-16 21:44:28.603614] D [common-utils.c:151:gf_resolve_ip6] 0-resolver: returning ip-172.22.4.209 (port-24007) for hostname: admin02.mgmt and port: 24007
[2013-07-16 21:44:28.603678] T [socket.c:370:__socket_nodelay] 0-gvhome-client-1: NODELAY enabled for socket 20
[2013-07-16 21:44:28.603703] T [socket.c:424:__socket_keepalive] 0-gvhome-client-1: Keep-alive enabled for socket 20, interval 2, idle: 20
[2013-07-16 21:44:28.603982] D [socket.c:280:__socket_disconnect] 0-gvhome-client-1: shutdown() returned -1. Transport endpoint is not connected
[2013-07-16 21:44:28.604043] T [rpc-clnt.c:532:rpc_clnt_connection_cleanup] 0-gvhome-client-1: cleaning up state in transport object 0x1df49e0
[2013-07-16 21:44:28.604073] D [socket.c:184:__socket_rwv] 0-gvhome-client-1: EOF from peer 172.22.4.209:24009
[2013-07-16 21:44:28.604100] D [socket.c:1512:__socket_proto_state_machine] 0-gvhome-client-1: reading from socket failed. Error (Transport endpoint is not connected), peer (172.22.4.209:24009)
[2013-07-16 21:44:28.604124] D [socket.c:1798:socket_event_handler] 0-transport: disconnecting now
[2013-07-16 21:44:28.604161] T [rpc-clnt.c:532:rpc_clnt_connection_cleanup] 0-gvhome-client-1: cleaning up state in transport object 0x1df49e0

and you can see that it's trying to connect to port 24009 even though the port has changed for admin02.mgmt:

Status of volume: gvhome
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick admin01.mgmt:/brick/home                          24011   Y       3387
Brick admin02.mgmt:/brick/home                          49152   Y       8135
Self-heal Daemon on localhost                           N/A     Y       3431
Self-heal Daemon on admin02.mgmt                        N/A     Y       8154

I would suspect that the same issue would happen if glusterfsd processes were restarted and bricks did not bind to the same port numbers they were before.

Todd

Comment 4 Todd Stansell 2013-07-16 21:56:54 UTC
Oh, and when I send the process a HUP to get it to 'reincarnate', it logs the following:

[2013-07-16 21:55:22.335282] I [glusterfsd.c:896:reincarnate] 0-glusterfsd: Fetching the volume file from server...
[2013-07-16 21:55:22.335370] T [rpc-clnt.c:1305:rpc_clnt_record] 0-glusterfs: Auth Info: pid: 0, uid: 0, gid: 0, owner:
[2013-07-16 21:55:22.335401] T [rpc-clnt.c:1185:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 84, payload: 20, rpc hdr: 64
[2013-07-16 21:55:22.335471] T [rpc-clnt.c:1504:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x19x Program: GlusterFS Handshake, ProgVers: 2, Proc: 2) to rpc-transport (glusterfs)
[2013-07-16 21:55:22.335936] T [rpc-clnt.c:682:rpc_clnt_reply_init] 0-glusterfs: received rpc message (RPC XID: 0x19x Program: GlusterFS Handshake, ProgVers: 2, Proc: 2) from rpc-transport (glusterfs)
[2013-07-16 21:55:22.336008] I [glusterfsd-mgmt.c:1560:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

Obviously, the volfile hasn't changed, but some details, like the port number for the brick, have.

Todd

