+++ This bug was initially created as a clone of Bug #839397 +++

+++ This bug was initially created as a clone of Bug #834229 +++

Created attachment 593397 (glusterd logs)

Description of problem:
Probing a new machine after the migration behaves strangely. The sequence is as follows:
1. Migration from the GVSA instances to the RHS instances is successful.
2. From an RHS instance, peer probe a new RHS instance.
3. The probe reports neither success nor failure; it just returns to the prompt after some time.
4. Peer status on the old RHS machine shows the new peer in state "Establishing Connection (Connected)" with UUID "d30bee2c-4fd1-4662-a01b-ae2ac3fb1831".
5. Peer status on the new RHS machine, however, shows the UUID as all zeros ("00000000-0000-0000-0000-000000000000") and the state "Connected to Peer (Connected)".
6. After glusterd is restarted on the old RHS machine, the UUID it shows for the new peer changes from "d30bee2c-4fd1-4662-a01b-ae2ac3fb1831" to all zeros ("00000000-0000-0000-0000-000000000000").

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Migrate from GVSA to RHS.
2. Peer probe the new machine.

Actual results:

Console output of peer status:
==============================

Old RHS Machine
===============
[root@ip-10-138-30-187 ~]# gluster peer status
Number of Peers: 4

Hostname: ec2-54-251-62-150.ap-southeast-1.compute.amazonaws.com
Uuid: 9bb0d8c4-538a-f4e8-db66-36a53b213da9
State: Peer in Cluster (Connected)

Hostname: ec2-54-251-62-152.ap-southeast-1.compute.amazonaws.com
Uuid: 2fad5c20-3d37-5d3a-cffc-b4355cc83ff1
State: Peer in Cluster (Connected)

Hostname: ec2-54-251-60-39.ap-southeast-1.compute.amazonaws.com
Uuid: 39effae6-1457-dfcd-8bd2-c7cf3d940dde
State: Peer in Cluster (Connected)

Hostname: ec2-46-137-231-143.ap-southeast-1.compute.amazonaws.com
Uuid: d30bee2c-4fd1-4662-a01b-ae2ac3fb1831
State: Establishing Connection (Connected)
[root@ip-10-138-30-187 ~]#

New RHS Machine
===============
[root@ip-10-138-109-140 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.138.30.187
Uuid: 00000000-0000-0000-0000-000000000000
State: Connected to Peer (Connected)
[root@ip-10-138-109-140 ~]#

Snippet from "etc-glusterfs-glusterd.vol.log"
=============================================
[2012-06-21 08:32:04.744455] I [glusterd-handler.c:423:glusterd_friend_find] 0-glusterd: Unable to find hostname: ec2-46-137-231-143.ap-southeast-1.compute.amazonaws.com
[2012-06-21 08:32:04.744506] I [glusterd-handler.c:2222:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: ec2-46-137-231-143.ap-southeast-1.compute.amazonaws.com (24007)
[2012-06-21 08:32:04.748926] I [glusterd-handler.c:2204:glusterd_friend_add] 0-management: connect returned 0
[2012-06-21 08:32:04.749745] I [glusterd-handshake.c:397:glusterd_set_clnt_mgmt_program] 0-: Using Program glusterd mgmt, Num (1238433), Version (2)
[2012-06-21 08:32:04.749774] I [glusterd-handshake.c:403:glusterd_set_clnt_mgmt_program] 0-: Using Program Peer mgmt, Num (1238437), Version (2)
[2012-06-21 08:32:04.756568] I [glusterd-rpc-ops.c:218:glusterd3_1_probe_cbk] 0-glusterd: Received probe resp from uuid: d30bee2c-4fd1-4662-a01b-ae2ac3fb1831, host: ec2-46-137-231-143.ap-southeast-1.compute.amazonaws.com
[2012-06-21 08:32:04.756608] I [glusterd-handler.c:411:glusterd_friend_find] 0-glusterd: Unable to find peer by uuid
[2012-06-21 08:32:04.756667] E [glusterd-sm.c:1022:glusterd_friend_sm] 0-glusterd: handler returned: -1

Additional info:

--- Additional comment from kaushal on 2012-06-21 08:15:18 EDT ---

Rahul,
Can you provide the full logs from both servers? I'd like to look into this in more detail.

--- Additional comment from rhinduja on 2012-06-22 02:20:11 EDT ---

Hi Kaushal,
As discussed, please find the log snippet:

[2012-06-22 05:57:04.746347] I [glusterd-rpc-ops.c:218:glusterd3_1_probe_cbk] 0-glusterd: Received probe resp from uuid: d30bee2c-4fd1-4662-a01b-ae2ac3fb1831, host: ec2-46-137-231-143.ap-southeast-1.compute.amazonaws.com
[2012-06-22 05:57:04.746398] D [glusterd-utils.c:4063:glusterd_friend_find_by_uuid] 0-glusterd: Friend with uuid: d30bee2c-4fd1-4662-a01b-ae2ac3fb1831, not found
[2012-06-22 05:57:04.746448] I [glusterd-handler.c:411:glusterd_friend_find] 0-glusterd: Unable to find peer by uuid
[2012-06-22 05:57:04.746466] D [glusterd-utils.c:4100:glusterd_friend_find_by_hostname] 0-management: Friend ec2-46-137-231-143.ap-southeast-1.compute.amazonaws.com found.. state: 0
[2012-06-22 05:57:04.746482] D [glusterd-sm.c:949:glusterd_friend_sm_inject_event] 0-glusterd: Enqueue event: 'GD_FRIEND_EVENT_INIT_FRIEND_REQ'
[2012-06-22 05:57:04.746495] D [glusterd-sm.c:1004:glusterd_friend_sm] 0-: Dequeued event of type: 'GD_FRIEND_EVENT_INIT_FRIEND_REQ'
[2012-06-22 05:57:04.746564] D [glusterd-utils.c:1823:glusterd_add_volume_to_dict] 0-: Returning with -1
[2012-06-22 05:57:04.746586] D [glusterd-utils.c:1858:glusterd_build_volume_dict] 0-: Returning with -1
[2012-06-22 05:57:04.746602] D [glusterd-rpc-ops.c:1513:glusterd3_1_friend_add] 0-glusterd: Returning -1
[2012-06-22 05:57:04.746615] D [glusterd-sm.c:302:glusterd_ac_friend_add] 0-: Returning with -1
[2012-06-22 05:57:04.746627] E [glusterd-sm.c:1022:glusterd_friend_sm] 0-glusterd: handler returned: -1
[2012-06-22 05:57:04.746656] I [glusterd-rpc-ops.c:286:glusterd3_1_probe_cbk] 0-glusterd: Received resp to probe req
[2012-06-22 05:59:04.729093] D [socket.c:184:__socket_rwv] 0-socket.management: EOF from peer 127.0.0.1:1022
[2012-06-22 05:59:04.729138] D [socket.c:1798:socket_event_handler] 0-transport: disconnecting now
[2012-06-22 06:00:17.190236] I [glusterd-handler.c:813:glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2012-06-22 06:00:17.191502] D [socket.c:184:__socket_rwv] 0-socket.management: EOF from peer 127.0.0.1:1022
[2012-06-22 06:00:17.191530] D [socket.c:1798:socket_event_handler] 0-transport: disconnecting now
[2012-06-22 06:01:24.167230] W [glusterfsd.c:831:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3c33ae5ccd] (-->/lib64/libpthread.so.0() [0x3c342077f1] (-->glusterd(glusterfs_sigwaiter+0xdd) [0x405cfd]))) 0-: received signum (15), shutting down

--- Additional comment from kaushal on 2012-06-22 03:12:31 EDT ---

Okay, I tracked this down to the changes made for username/password authentication in 3.3. This problem should also occur for other pre-3.3 to 3.3 migrations done using similar steps.
The steps, to my knowledge, are as follows:
1) Bring down the 3.2 cluster.
2) Copy the config dir (/etc/glusterd/*) of each peer to a safe place.
3) Install 3.3.
4) Copy the saved config back to /var/lib/glusterd.

Peer probe fails because glusterd cannot build the volume dictionary that has to be sent to the new peer when volinfo.auth.{username,password} are missing. An easy fix would be to not fail when these are absent and just continue building the rest of the dictionary, but this wouldn't be correct.

The main problem is that when the volinfos are created at glusterd startup, volinfo.auth.{username,password} are filled in only if they are present in the info file. Since these two values do not exist in pre-3.3 versions of gluster, they are not in the info files, so the fields are empty for volumes migrated from pre-3.3 to 3.3. The solution is to generate these values when they are not found, similar to the backward-compatibility measures used for other volinfo changes.

--- Additional comment from kaushal on 2012-07-04 02:25:15 EDT ---

Review http://review.gluster.com/3619 (glusterd: Fix peer probe when username/password is missing) fixes this issue on master.
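A minimal C sketch of the failure mode and of the backward-compatibility idea from the 2012-06-22 analysis. This is not glusterd code: struct vol_auth, parse_info, load_vol_auth, build_auth_dict, the "username="/"password=" key=value parsing and the random hex stand-in for generated credentials are all hypothetical. It only illustrates why building the dictionary fails for a volume whose info file predates 3.3, and how filling in missing values at load time avoids that failure.

```c
/* HYPOTHETICAL sketch, not glusterd source. All names, the info-file
 * key format and the credential generator are illustrative assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CRED_LEN 36

struct vol_auth {
    char username[CRED_LEN + 1];
    char password[CRED_LEN + 1];
};

/* Fill buf with CRED_LEN random hex characters. Stand-in only: rand()
 * is not suitable for real credentials. */
void gen_credential(char *buf) {
    static const char hex[] = "0123456789abcdef";
    for (int i = 0; i < CRED_LEN; i++)
        buf[i] = hex[rand() % 16];
    buf[CRED_LEN] = '\0';
}

/* Parse "key=value" lines from an info file's contents. A pre-3.3 file
 * has no auth keys, so both fields stay empty after parsing. */
void parse_info(const char *contents, struct vol_auth *auth) {
    memset(auth, 0, sizeof(*auth));
    const char *line = contents;
    while (line && *line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        if (len > 9 && strncmp(line, "username=", 9) == 0 && len - 9 <= CRED_LEN)
            memcpy(auth->username, line + 9, len - 9);
        else if (len > 9 && strncmp(line, "password=", 9) == 0 && len - 9 <= CRED_LEN)
            memcpy(auth->password, line + 9, len - 9);
        line = end ? end + 1 : NULL;
    }
}

/* Backward-compatibility step: parse, then generate any missing value.
 * Returns 1 if something was generated (a real caller would then
 * persist it back to the info file), 0 otherwise. */
int load_vol_auth(const char *contents, struct vol_auth *auth) {
    int generated = 0;
    parse_info(contents, auth);
    if (auth->username[0] == '\0') { gen_credential(auth->username); generated = 1; }
    if (auth->password[0] == '\0') { gen_credential(auth->password); generated = 1; }
    return generated;
}

/* Mimics the failing dictionary build: refuses to export a volume whose
 * credentials are empty, returning -1 like the log above. */
int build_auth_dict(const struct vol_auth *auth, char *out, size_t outlen) {
    if (auth->username[0] == '\0' || auth->password[0] == '\0')
        return -1;
    snprintf(out, outlen, "username=%s;password=%s", auth->username, auth->password);
    return 0;
}
```

With an info file that lacks the auth keys (for example "type=2\ncount=4\n"), build_auth_dict after plain parse_info returns -1, mirroring the "Returning with -1" chain in the log; after load_vol_auth it returns 0. Values already present in the file are left untouched, which is what keeps existing 3.3 volumes stable.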
Assigning the bug to Kaushal as he has fixed this on master.
The fix has already been accepted upstream in the master branch. The commit ID is b583363 (glusterd: Fix peer probe when username/password is missing), reviewed at http://review.gluster.com/3619
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html