Description of problem:
We have a setup with two SL5 servers (t2-test02, t2-test03) running Gluster 3.2.5 with a replicated volume. After upgrading them to 3.3.1 it is impossible to add an SL6 server (t2-test04) running Gluster 3.3.1. After a "peer probe t2-test04" the peer is rejected:

[root@t2-test02 ~]# gluster peer status
Number of Peers: 2

Hostname: t2-test03
Uuid: 477e485d-448e-4c20-aeea-8362b826e3eb
State: Peer in Cluster (Connected)

Hostname: t2-test04
Uuid: 26bd1e66-1455-438c-83d4-b6dd13d4389c
State: Peer Rejected (Connected)

Version-Release number of selected component (if applicable):
3.3.1

Steps to Reproduce:
1. Install servers A and B with Scientific Linux 5.7 and install Gluster 3.2.5 on them.
2. Peer probe between them and create a replicated volume.
3. Upgrade both to 3.3.1.
4. Install server C with Scientific Linux 6.3 and install Gluster 3.3.1.
5. On server A run "peer probe C". The probe ends in "State: Peer Rejected (Connected)".

Actual results:
State: Peer Rejected (Connected)

Expected results:
State: Peer in Cluster (Connected)

Additional info:
If no volume is present the probe is successful. Name resolution is fine.
In /var/log/glusterfs/etc-glusterfs-glusterd.vol.log on t2-test02 I get the following after the peer probe (IP address removed):

[2013-04-08 17:41:35.507138] I [glusterd-handler.c:685:glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req t2-test04 24007
[2013-04-08 17:41:35.516756] I [glusterd-handler.c:428:glusterd_friend_find] 0-glusterd: Unable to find hostname: t2-test04
[2013-04-08 17:41:35.516780] I [glusterd-handler.c:2245:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: t2-test04 (24007)
[2013-04-08 17:41:35.517089] I [rpc-clnt.c:968:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-04-08 17:41:35.520220] I [glusterd-handler.c:2227:glusterd_friend_add] 0-management: connect returned 0
[2013-04-08 17:41:35.521552] I [glusterd-handshake.c:397:glusterd_set_clnt_mgmt_program] 0-: Using Program glusterd mgmt, Num (1238433), Version (2)
[2013-04-08 17:41:35.521580] I [glusterd-handshake.c:403:glusterd_set_clnt_mgmt_program] 0-: Using Program Peer mgmt, Num (1238437), Version (2)
[2013-04-08 17:41:35.531927] I [glusterd-rpc-ops.c:219:glusterd3_1_probe_cbk] 0-glusterd: Received probe resp from uuid: 26bd1e66-1455-438c-83d4-b6dd13d4389c, host: t2-test04
[2013-04-08 17:41:35.532181] I [glusterd-handler.c:416:glusterd_friend_find] 0-glusterd: Unable to find peer by uuid
[2013-04-08 17:41:35.533168] I [glusterd-rpc-ops.c:287:glusterd3_1_probe_cbk] 0-glusterd: Received resp to probe req
[2013-04-08 17:41:35.555106] I [glusterd-rpc-ops.c:329:glusterd3_1_friend_add_cbk] 0-glusterd: Received ACC from uuid: 26bd1e66-1455-438c-83d4-b6dd13d4389c, host: t2-test04, port: 0
[2013-04-08 17:41:35.555218] I [glusterd-handler.c:2423:glusterd_xfer_cli_probe_resp] 0-glusterd: Responded to CLI, ret: 0
[2013-04-08 17:41:35.557322] I [glusterd-handler.c:1758:glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: 26bd1e66-1455-438c-83d4-b6dd13d4389c
[2013-04-08 17:41:35.557502] I [glusterd-handler.c:1799:glusterd_handle_probe_query] 0-glusterd: Responded to XXX.XXX.XXX.XXX, op_ret: 0, op_errno: 0, ret: 0
[2013-04-08 17:41:35.558289] I [glusterd-handler.c:1486:glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 26bd1e66-1455-438c-83d4-b6dd13d4389c
[2013-04-08 17:41:35.558361] E [glusterd-utils.c:1926:glusterd_compare_friend_volume] 0-: Cksums of volume gluster-test differ. local cksum = 1934649064, remote cksum = -1823208808
[2013-04-08 17:41:35.558502] I [glusterd-handler.c:2395:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to XXX.XXX.XXX.XXX (0), ret: 0
[2013-04-08 17:41:45.287974] I [glusterd-handler.c:819:glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
Can you check whether the volfiles for volume 'gluster-test', in /var/lib/glusterd/vols/, are the same on t2-test{02,03,04} after the probe fails? If there are any differences, please post them here.
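A quick way to do that comparison is a small helper script (a sketch, not part of Gluster — the function name and the copy directories are illustrative): checksum only the regular files in each copy of the volfile directory, then diff the listings. Skipping subdirectories avoids the "Is a directory" complaint from md5sum on bricks/.

```shell
#!/bin/sh
# Hypothetical helper: checksum the regular files in a volfile directory
# (e.g. a copy of /var/lib/glusterd/vols/gluster-test fetched from each
# server) so two copies can be compared with diff.
sum_voldir() {
    # md5sum every regular file, skipping subdirectories such as bricks/;
    # sort by filename so the output is stable across servers
    ( cd "$1" && find . -maxdepth 1 -type f -exec md5sum {} \; | sort -k 2 )
}

# Usage sketch, after copying the directory from two servers:
#   sum_voldir copy-from-t2-test03 > a.md5
#   sum_voldir copy-from-t2-test04 > b.md5
#   diff a.md5 b.md5     # non-empty output means the volfiles differ
```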
Hi, the files in /var/lib/glusterd/vols/gluster-test are the same on t2-test{02,03}, while some of them differ on t2-test04:

[root@t2-test03 gluster-test]# md5sum *
md5sum: bricks: Is a directory
5dfe5b652e3539efa37fc1c8801ea66b  cksum
ea87d872aa0870f358dbd8ee03bd8a5c  gluster-test-fuse.vol
086f16c89d72fc639876ef2b5d4876cd  gluster-test.t2-test02.root-b1.vol
086f16c89d72fc639876ef2b5d4876cd  gluster-test.t2-test03.root-b1.vol
27446113de33c6112d41c4306d24e776  info
f1b55d145d2987c1b23c80d3dcc689ed  node_state.info
7539d230a861bcba000f71047da6b2b4  rbstate

[root@t2-test04 gluster-test]# md5sum *
md5sum: bricks: Is a directory
4117326b54141c43d5f9f34fc15334c0  cksum
ecdc91a2c769a2867a367946ef2b8897  gluster-test-fuse.vol
086f16c89d72fc639876ef2b5d4876cd  gluster-test.t2-test02.root-b1.vol
086f16c89d72fc639876ef2b5d4876cd  gluster-test.t2-test03.root-b1.vol
795e55064584312169ebc12f70b0f234  info
f1b55d145d2987c1b23c80d3dcc689ed  node_state.info
7539d230a861bcba000f71047da6b2b4  rbstate
ecdc91a2c769a2867a367946ef2b8897  trusted-gluster-test-fuse.vol

Here are the diffs:

[root@t2-test04 ~]# diff gluster-test-t2-test03/cksum /var/lib/glusterd/vols/gluster-test/cksum
1c1
< info=600099517
---
> info=2600514254

[root@t2-test04 ~]# diff gluster-test-t2-test03/gluster-test-fuse.vol /var/lib/glusterd/vols/gluster-test/gluster-test-fuse.vol
40,41c40,41
< volume gluster-test-stat-prefetch
< type performance/stat-prefetch
---
> volume gluster-test-md-cache
> type performance/md-cache
49c49
< subvolumes gluster-test-stat-prefetch
---
> subvolumes gluster-test-md-cache

[root@t2-test04 ~]# diff gluster-test-t2-test03/info /var/lib/glusterd/vols/gluster-test/info
4a5,6
> stripe_count=1
> replica_count=1

Furthermore, only on t2-test04 is there the file trusted-gluster-test-fuse.vol.
Thanks for the update. Gluster 3.3 brings some changes to the volfiles and to some options, so on an upgrade from 3.2 to 3.3 these files need to be regenerated. If the upgrade was done via RPMs, the regeneration should have happened automatically. Since it hasn't, you either did a source install or the RPMs you used were faulty (in which case please provide details on the RPMs). When you added a new peer to the cluster, it got the volume details from the original peers, but when it saved them to disk it used the newer format. This caused the checksum mismatch, which led to the peer being rejected. To solve this, perform step 5 from this article, on the original two peers: http://vbellur.wordpress.com/2012/05/31/upgrading-to-glusterfs-3-3/ . That should regenerate the volfiles and solve your problem.
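For reference, the regeneration step is essentially a one-off restart of glusterd in upgrade mode, as described in the 3.3 upgrade documentation. A rough sketch (the service name may differ on your distribution; treat this as a summary, not a substitute for the linked article):

```shell
# On each of the original peers (t2-test02, t2-test03):
service glusterd stop

# Run glusterd once in upgrade mode: with *.upgrade=on it rewrites the
# volfiles under /var/lib/glusterd/vols/ in the 3.3 format and then exits
# (-N keeps it in the foreground instead of daemonizing)
glusterd --xlator-option *.upgrade=on -N

service glusterd start
```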
Hi Kaushal,

thank you! I followed the procedure you suggested and the probe is now successful. The repo file on the servers is configured to take RPMs from here:
http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.1/EPEL.repo/epel-5/x86_64/
Is it the right one?

Anyway, a second problem came up. After adding a brick from the new server to the existing distributed volume, all the files disappeared from Gluster. If I do an ls I can't see anything, although I can read/write the files by giving their exact path, and the files are present in the bricks. I also tried restarting the gluster daemon and rebooting the server, but nothing changed...

Thanks,
Ivano
Hi Ivano,

It seems like there are problems with the RPMs; we'll need to check them to see what caused this. Your new problem shouldn't be happening. Can you give more information (basically, what you did after the upgrade) and the logs of both servers and the client? I'd suggest that you file a new bug if this turns out to be a valid issue.

Regards,
Kaushal
The version that this bug has been reported against does not get any updates from the Gluster Community anymore. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug. If there has been no update before 9 December 2014, this bug will be closed automatically.