Description of problem:

[root@ravi1 glusterfs]# gluster peer probe 10.70.42.252
peer probe: success.
[root@ravi1 glusterfs]# gluster peer status
Number of Peers: 1

Hostname: 10.70.42.252
Uuid: 53da4d10-fa90-44fa-aeb2-11c306f23d8b
State: Peer Rejected (Connected)

From /usr/local/var/log/glusterfs/usr-local-etc-glusterfs-glusterd.vol.log:

E [glusterd-utils.c:2372:glusterd_compare_friend_volume] 0-management: Cksums of volume testvol differ. local cksum = 2871551223, remote cksum = 329029812 on peer 10.70.42.252

How reproducible:
Always

Steps to Reproduce:
1. Create a volume on a node
2. Peer probe a second node
3. Check peer status.

Actual results:
State: Peer Rejected (Connected)

Expected results:
State: Peer in Cluster (Connected)

Additional info:
REVIEW: http://review.gluster.org/7186 (glusterd: send/receive volinfo->caps during peer probe.) posted (#1) for review on master by Ravishankar N (ravishankar)
COMMIT: http://review.gluster.org/7186 committed in master by Vijay Bellur (vbellur)
------
commit dec7950d4b0944697e4bb8788cc02de2ac4d8708
Author: Ravishankar N <ravishankar>
Date:   Wed Mar 5 04:46:50 2014 +0000

    glusterd: send/receive volinfo->caps during peer probe.

    Problem: volinfo->caps was not sent over to newly probed peers,
    resulting in a 'Peer Rejected' state due to volinfo checksum mismatch.

    Fix: send/receive volinfo capability when peer probing.

    Change-Id: I2508d3fc7a6e4aeac9c22dd7fb2d3b362f4c21ff
    BUG: 1072720
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/7186
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaushal M <kaushal>
    Reviewed-by: Vijay Bellur <vbellur>
Has this made it into 3.5.0beta4? I get this on a distributed volume when adding in the 2nd node:

Hostname: 172.16.185.106
Uuid: 41068245-072b-48fe-91ce-249feaea3813
State: Probe Sent to Peer (Connected)
Waiting for this on the 3.4 chain as well. Wanted to translate the above into English for anybody trying to work around this issue. This applies to an upgrade only, I believe: new nodes add the following lines to /var/lib/glusterd/{mount}/info:

op-version=2
client-op-version=2

You will notice that these lines do not appear on the old nodes. This causes a mismatch, and therefore a rejected peer. Take your nodes down, add the lines (or remove them), and restart.
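The "remove them" direction of this workaround can be sketched as a small shell helper; this is a sketch, not an official tool, and the example path in the comment (which volume directory holds the info file) is an assumption you should verify on your own nodes:

```shell
# strip_opversion_keys: remove the op-version / client-op-version lines
# (written by newer glusterfs versions) from a volume info file so that
# its checksum matches the older nodes again. Run this only while
# glusterd is stopped on that node, then restart glusterd.
strip_opversion_keys() {
    sed -i -e '/^op-version=/d' -e '/^client-op-version=/d' "$1"
}

# Typical invocation (the exact path is an assumption -- check your install):
#   strip_opversion_keys /var/lib/glusterd/vols/<volname>/info
```

Whether you remove the keys on the new nodes or add them on the old ones, the point is that the info files must end up identical on every peer.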
Hi, this applies to a fresh install setting up a brand new single-node volume. When adding the 2nd node to a distributed setup, the peer probe fails too. Rich
When I say "this applies", I mean the problem is still there, not that the fix in comment #4 works for it :-(
Hi Richard, comments #3 and #4 seem to describe a different issue than the one that I fixed. In the fresh install setup you described, when you peer probe, what error do you get in the glusterd logs of the nodes?
Note typo in comment 4...

/var/lib/glusterd/{mount}/info

should be

/var/lib/glusterd/vol/{mount}/info

I get the same checksum error if I don't manually edit the info file.
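As a toy illustration (this is not glusterd's actual checksum algorithm, which hashes the volume configuration internally), you can see how the extra keys alone change a file checksum, which is the same kind of divergence glusterd reports as "Cksums of volume ... differ":

```shell
# Two info files that differ only in the op-version keys produce
# different checksums; glusterd rejects the peer on exactly this kind
# of mismatch. File names here are made up for the demonstration.
workdir=$(mktemp -d)
printf 'type=2\ncount=1\n' > "$workdir/old-info"
printf 'type=2\ncount=1\nop-version=2\nclient-op-version=2\n' > "$workdir/new-info"
cksum "$workdir/old-info" "$workdir/new-info"
```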
(In reply to Ravishankar N from comment #7)

I get this on the initial node:

[2014-04-22 12:17:37.821414] I [glusterd-handler.c:918:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 172.16.242.241 24007
[2014-04-22 12:17:37.829156] I [glusterd-handler.c:2931:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 172.16.242.241 (24007)
[2014-04-22 12:17:37.835996] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-04-22 12:17:37.836161] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled
[2014-04-22 12:17:37.836174] I [socket.c:3576:socket_init] 0-management: using system polling thread
[2014-04-22 12:17:37.840074] I [glusterd-handler.c:2912:glusterd_friend_add] 0-management: connect returned 0
[2014-04-22 12:17:37.869206] I [glusterd-rpc-ops.c:234:__glusterd_probe_cbk] 0-glusterd: Received probe resp from uuid: c1f2632f-ea5f-467b-a701-0ea29caa153c, host: 172.16.242.241
[2014-04-22 12:17:37.874531] I [glusterd-rpc-ops.c:306:__glusterd_probe_cbk] 0-glusterd: Received resp to probe req

and this on the new node trying to join:

[2014-04-22 12:17:37.844723] I [glusterd.c:168:glusterd_uuid_generate_save] 0-management: generated UUID: c1f2632f-ea5f-467b-a701-0ea29caa153c
[2014-04-22 12:17:37.852461] I [glusterd-handler.c:2346:__glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: 4a652daa-a614-4034-93af-e7e57f90add8
[2014-04-22 12:17:37.853880] I [glusterd-handler.c:2374:__glusterd_handle_probe_query] 0-glusterd: Unable to find peerinfo for host: 172.16.0.1 (24007)
[2014-04-22 12:17:37.855150] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-04-22 12:17:37.855246] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled
[2014-04-22 12:17:37.855262] I [socket.c:3576:socket_init] 0-management: using system polling thread
[2014-04-22 12:17:37.868301] I [glusterd-handler.c:2912:glusterd_friend_add] 0-management: connect returned 0
[2014-04-22 12:17:37.868498] I [glusterd-handler.c:2398:__glusterd_handle_probe_query] 0-glusterd: Responded to 172.16.0.1, op_ret: 0, op_errno: 0, ret: 0
[2014-04-22 12:17:37.871329] I [glusterd-handler.c:2050:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 4a652daa-a614-4034-93af-e7e57f90add8
[2014-04-22 12:17:37.927078] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-04-22 12:17:37.927374] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled
[2014-04-22 12:17:37.927385] I [socket.c:3576:socket_init] 0-management: using system polling thread

If I downgrade to 3.4.x everything works just fine.

Thanks,
Rich
Oh, and on the first node, my peer status is this:

[DEV root@e5234340b67d11e glusterfs]$ gluster peer status
Number of Peers: 1

Hostname: 172.16.242.241
Uuid: c1f2632f-ea5f-467b-a701-0ea29caa153c
State: Probe Sent to Peer (Connected)

On the new node, my peer status is not available.

[PXE root@dde3904ac98711e glusterfs]$ gluster peer status
peer status: failed
(In reply to Richard from comment #10)
> oh, and on the first node, my peer status is this:
>
> [DEV root@e5234340b67d11e glusterfs]$ gluster peer status
> Number of Peers: 1
>
> Hostname: 172.16.242.241
> Uuid: c1f2632f-ea5f-467b-a701-0ea29caa153c
> State: Probe Sent to Peer (Connected)
>
> On the new node, my peer status is not available.
>
> [PXE root@dde3904ac98711e glusterfs]$ gluster peer status
> peer status: failed

Tested and found that the problem exists when upgrading from 3.3 to 3.4, as reported by Awktane, and needs to be fixed (until then, the cause/workaround is in comment #4). But I am not able to recreate this on a 3.4 to 3.5 upgrade, and there don't seem to be any errors in the glusterd logs either. If you are able to reproduce this, could you attach the log files of both nodes? It would also help if you could run glusterd in debug mode before probing (upgrade all nodes to 3.5, `pkill glusterd` on both nodes, run `glusterd -LDEBUG` on the nodes, then do the peer probing).
(In reply to Ravishankar N from comment #11)

My scenario was from 3.3 to 3.4. Is yours for 3.5, and should I therefore request a backport?
(In reply to Awktane from comment #12)
> My scenario was from 3.3 to 3.4. Is yours for 3.5 and therefore I should
> request a backport?

No, because it is not a backport but a new fix. The problem I faced was due to the "caps" key not being sent to the peers, causing a checksum mismatch. But to fix the problem you faced, we need to regenerate the info files with the 'op-version' and 'client-op-version' key-value pairs after the upgrade is done.
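The regeneration step described here could be sketched like this; the value 2 comes from the lines quoted in comment #4, and the example path is an assumption, not something glusterd ships:

```shell
# add_opversion_keys: append the op-version / client-op-version
# key-value pairs to a volume info file if they are missing, as would
# be needed on the pre-upgrade nodes. Idempotent; run only while
# glusterd is stopped.
add_opversion_keys() {
    for key in op-version client-op-version; do
        grep -q "^${key}=" "$1" || echo "${key}=2" >> "$1"
    done
}

# Typical invocation (path is an assumption -- check your install):
#   add_opversion_keys /var/lib/glusterd/vols/<volname>/info
```

A proper fix would have glusterd write these entries itself during the upgrade rather than relying on manual edits.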
(In reply to Ravishankar N from comment #13)
> No, because it is not a backport but a new fix. The problem I faced was due
> to the "caps" key not being sent to the peers causing checksum mismatch. But
> to fix the problem faced by you, we need to regenerate the info files with
> the 'op-version' and 'client-op-version' key value pairs after an upgrade is
> done.

So shall I resubmit as a new bug then?
(In reply to Awktane from comment #14)
> So shall I resubmit as new bug then?

Sure :)
(In reply to Ravishankar N from comment #15)
> > So shall I resubmit as new bug then?
> Sure :)

Done. Note that my issue was unrelated to this one. Split to https://bugzilla.redhat.com/show_bug.cgi?id=1090298
A beta release for GlusterFS 3.6.0 has been made available [1]. Please verify if the release solves this bug report for you. In case the glusterfs-3.6.0beta1 release does not have a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update (possibly an "updates-testing" repository) infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/
Hi,

Thank you for the update; once it appears in the QA folder here, I will be able to test: http://download.gluster.org/pub/gluster/glusterfs/qa-releases/

Thanks,
Rich
Hi,

This beta release has resolved the problem I was having with peer probes... and thrown up a new one: noatime is no longer a supported mount option.

Thanks,
Rich
This bug is being closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users