Description of problem: Unable to upgrade a Gluster cluster from version 3.8 to 3.12
Version-Release number of selected component (if applicable): old version: 3.8, new version: 3.12
How reproducible: Always
Steps to Reproduce:
1. Install a 3-node Gluster cluster using version 3.8. I used the binaries shipped with node-ng version 4.1.6-20170921.
2. Upgrade one of those nodes to 3.12. I upgraded my node-ng to the latest tested 4.2 beta1.
3. The newly upgraded node is rejected from the Gluster cluster.
Actual results: The node is rejected from the cluster
Expected results: The node should be accepted
Additional info: Here is the log from the new node:
[2017-11-09 05:15:08.481680] I [MSGID: 106163] [glusterd-handshake.c:1271:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-11-09 05:15:08.489219] I [MSGID: 106490] [glusterd-handler.c:2608:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 6b2193c1-63bd-408b-961e-51e01de486b7
[2017-11-09 05:15:11.165116] E [MSGID: 106010] [glusterd-utils.c:2938:glusterd_compare_friend_volume] 0-management: Version of Cksums data differ. local cksum = 1799370953, remote cksum = 3144964316 on peer 172.19.11.7
[2017-11-09 05:15:11.165328] I [MSGID: 106493] [glusterd-handler.c:3852:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 172.19.11.7 (0), ret: 0, op_ret: -1
[2017-11-09 05:15:11.175332] I [MSGID: 106493] [glusterd-rpc-ops.c:476:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 6b2193c1-63bd-408b-961e-51e01de486b7, host: 1
After upgrading, have you ensured that you bumped up the op-version? If not, please do so and then restart the glusterd service on all nodes to see if they are accepted back into the cluster.
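For reference, the commands involved would be roughly the following (a sketch, assuming the op-version for 3.12 is 31200; please check the release notes for the exact value):

gluster volume get all cluster.op-version          # show the current cluster op-version
gluster volume set all cluster.op-version 31200    # bump it

Note that the bump only succeeds once every peer in the cluster has been upgraded.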
If the above is ensured and the issue still persists, can you please share the following file from all the nodes?
No, I didn't, as I hadn't finished upgrading my cluster. All nodes were still on 3.8, while just a single node had been moved to 3.12.
Please provide the following:
Output of cat /var/lib/glusterd/vols/remote/info from 172.19.11.7 and from the node where the peer got rejected, i.e. the new node from which you attached the log.
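If I read the glusterd code correctly, the volume checksum is computed over exactly this info file, so a plain diff of the two copies (one collected from each node) should point directly at the field that differs.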
Sorry, I can't reproduce this on a clean environment, and the one where we found this bug has already been rebuilt.
Basically the steps I did were:
1) Set up a hyperconverged oVirt 4.1 environment with 3 NGN hosts, created some VMs, and let them run for a few weeks.
2) Upgraded one host to 4.2 beta.
Not sure how we can proceed here without a stable reproducer, but just to clarify: what needs to mismatch to get the "Version of Cksums data differ" error? What file is checksummed, and how is this checksum computed? Here's the code that seems to be responsible:
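(Paraphrased from glusterd-utils.c; an approximate sketch of the check in glusterd_compare_friend_volume(), not a verbatim copy of the 3.12 tree:)

/* During the friend handshake, each peer sends its view of every
 * volume, including a checksum; if the checksums differ, the peer
 * is rejected, which is the RJT seen in the log above. */
if (volinfo->cksum != cksum) {
        gf_msg (this->name, GF_LOG_ERROR, 0,
                GD_MSG_CKSUM_VERS_MISMATCH,
                "Version of Cksums %s differ. local cksum = %u, "
                "remote cksum = %u on peer %s",
                volinfo->volname, volinfo->cksum, cksum, hostname);
        *status = GLUSTERD_VOL_COMP_RJT;
        goto out;
}

As far as I can tell, the local cksum comes from glusterd_volume_compute_cksum(), which checksums the on-disk /var/lib/glusterd/vols/<volname>/info file, so any difference in that file between two peers (e.g. a volume option present on one side but not the other) would produce this rejection.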
Without the info file from the two nodes where the mismatch happens (as mentioned in comment 3), unfortunately we won't be able to debug this further. If you happen to reproduce this again, please reopen this bug.