+++ This bug was initially created as a clone of Bug #1544600 +++
+++ This bug was initially created as a clone of Bug #1544461 +++

Description of problem:

Unable to upgrade a Gluster cluster from 3.8.15 to 3.10.10 (the same happens with 3.12 and 3.13; I think it is related to https://bugzilla.redhat.com/show_bug.cgi?id=1511903).

Version-Release number of selected component (if applicable):
old one 3.8.15, new one 3.10.10

How reproducible:
Always (also tried with 3.12 and 3.13)

Steps to Reproduce:
1. Install 3.8.15 on Ubuntu 14 from the PPA.
2. Upgrade one of those nodes to the latest 3.10 (now 3.10.10).
3. The newly upgraded node will be rejected from the gluster cluster.

Actual results:
Node is rejected from the cluster.

Expected results:
Node must be accepted.

Additional info:

I have a 5 x replicated volume on Ubuntu 14 and am trying to update GlusterFS. I started from version 3.7, from which I tried multiple scenarios, and all failed when going directly to the newer GlusterFS versions (3.10, 3.12, 3.13). I then noticed that 3.8 works fine, so I updated from 3.7.20 to 3.8.15 as an intermediate version. While updating to the next LTM release, 3.10 (I only updated 1 of the 5 servers to 3.10.10 while the rest stayed at 3.8.15), the updated node throws the following error:

"Version of Cksums gluster_volume differ. local cksum = 3272345312, remote cksum = 469010668 on peer 1-gls-dus21-ci-efood-real-de.openstacklocal"

Also, all peers are now in "Peer Rejected (Connected)" state after the update.

Volume Name: gluster_volume
Type: Replicate
Volume ID: 2e6bd6ba-37c8-4808-9156-08545cea3e3e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 5 = 5
Transport-type: tcp
Bricks:
Brick1: 2-gls-dus10-ci-efood-real-de.openstack.local:/export_vdb
Brick2: 1-gls-dus10-ci-efood-real-de.openstack.local:/export_vdb
Brick3: 1-gls-dus21-ci-efood-real-de:/export_vdb
Brick4: 3-gls-dus10-ci-efood-real-de.openstack.local:/export_vdb
Brick5: 2-gls-dus21-ci-efood-real-de.openstacklocal:/export_vdb
Options Reconfigured:
features.barrier: off
performance.readdir-ahead: on
auth.allow: 10.96.213.245,10.96.214.101,10.97.177.132,10.97.177.127,10.96.214.93,10.97.177.139,10.96.214.119,10.97.177.106,10.96.210.69,10.96.214.94,10.97.177.118,10.97.177.128,10.96.214.98
nfs.disable: on
performance.cache-size: 2GB
performance.cache-max-file-size: 1MB
cluster.self-heal-window-size: 64
performance.io-thread-count: 32

root@1-gls-dus21-ci-efood-real-de:/home/ubuntu# gluster peer status
Number of Peers: 4

Hostname: 3-gls-dus10-ci-efood-real-de.openstack.local
Uuid: 3d141235-9b93-4798-8e03-82a758216b0b
State: Peer in Cluster (Connected)

Hostname: 1-gls-dus10-ci-efood-real-de.openstack.local
Uuid: 00839049-2ade-48f8-b5f3-66db0e2b9377
State: Peer in Cluster (Connected)

Hostname: 2-gls-dus10-ci-efood-real-de.openstack.local
Uuid: 1617cd54-9b2a-439e-9aa6-30d4ecf303f8
State: Peer in Cluster (Connected)

Hostname: 2-gls-dus21-ci-efood-real-de.openstacklocal
Uuid: 0c698b11-9078-441a-9e7f-442befeef7a9
State: Peer Rejected (Connected)

Volume status from one of the nodes which was not updated:

root@1-gls-dus21-ci-efood-real-de:/home/ubuntu# gluster volume status
Status of volume: gluster_volume
Gluster process                                                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 2-gls-dus10-ci-efood-real-de.openstack.local:/export_vdb  49153     0          Y       30521
Brick 1-gls-dus10-ci-efood-real-de.openstack.local:/export_vdb  49152     0          Y       23166
Brick 1-gls-dus21-ci-efood-real-de:/export_vdb                  49153     0          Y       2322
Brick 3-gls-dus10-ci-efood-real-de.openstack.local:/export_vdb  49153     0          Y       10854
Self-heal Daemon on localhost                                   N/A       N/A        Y       4931
Self-heal Daemon on 3-gls-dus10-ci-efood-real-de.openstack.local  N/A     N/A        Y       16591
Self-heal Daemon on 2-gls-dus10-ci-efood-real-de.openstack.local  N/A     N/A        Y       4621
Self-heal Daemon on 1-gls-dus10-ci-efood-real-de.openstack.local  N/A     N/A        Y       3487

Task Status of Volume gluster_volume
------------------------------------------------------------------------------
There are no active volume tasks

And from the updated one:

root@2-gls-dus21-ci-efood-real-de:/var/log/glusterfs# gluster volume status
Status of volume: gluster_volume
Gluster process                                                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 2-gls-dus21-ci-efood-real-de.openstacklocal:/export_vdb   N/A       N/A        N       N/A
NFS Server on localhost                                         N/A       N/A        N       N/A

Task Status of Volume gluster_volume
------------------------------------------------------------------------------
There are no active volume tasks

[2018-02-12 13:35:53.400122] E [MSGID: 106010] [glusterd-utils.c:3043:glusterd_compare_friend_volume] 0-management: Version of Cksums gluster_volume differ. local cksum = 3272345312, remote cksum = 469010668 on peer 1-gls-dus10-ci-efood-real-de.openstack.local
[2018-02-12 13:35:53.400211] I [MSGID: 106493] [glusterd-handler.c:3866:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 1-gls-dus10-ci-efood-real-de.openstack.local (0), ret: 0, op_ret: -1
[2018-02-12 13:35:53.417588] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2018-02-12 13:35:53.430748] I [MSGID: 106490] [glusterd-handler.c:2606:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 3d141235-9b93-4798-8e03-82a758216b0b
[2018-02-12 13:35:53.431024] E [MSGID: 106010] [glusterd-utils.c:3043:glusterd_compare_friend_volume] 0-management: Version of Cksums gluster_volume differ. local cksum = 3272345312, remote cksum = 469010668 on peer 3-gls-dus10-ci-efood-real-de.openstack.local
[2018-02-12 13:35:53.431121] I [MSGID: 106493] [glusterd-handler.c:3866:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 3-gls-dus10-ci-efood-real-de.openstack.local (0), ret: 0, op_ret: -1
[2018-02-12 13:35:53.473344] I [MSGID: 106493] [glusterd-rpc-ops.c:485:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 7488286f-6bfa-46f8-bc50-9ee815e96c66, host: 1-gls-dus21-ci-efood-real-de.openstacklocal, port: 0

I do not have the file `/var/lib/glusterd/vols/remote/info` on any of the servers, but I attached `/var/lib/glusterd/vols/gluster_volume/info` from the upgraded node and from a server which was not upgraded.

The 3.7 version was running fine for quite some time, so we can exclude network issues, SELinux, etc.

--- Additional comment from Marc on 2018-02-12 09:51:24 EST ---

I see that on the new node I have the new "tier-enabled=0" line; could it also be related to this: https://www.spinics.net/lists/gluster-users/msg33329.html?

--- Additional comment from Atin Mukherjee on 2018-02-12 10:07:17 EST ---

This is indeed a bug and we managed to root-cause it a couple of days back. I am assigning it to one of my colleagues, Hari, who is aware of this issue and the fix required. For the time being, please remove tier-enabled=0 from all the info files on the node which has been upgraded, and then, once all nodes are upgraded, bump up the cluster.op-version.

@Hari - we need to send this fix to the 3.10, 3.12 and 4.0 branches by changing the op-version check to 3.11 instead of 3.7.6.
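[A minimal sketch of the workaround described above, not part of the original comment. It assumes the store paths from this report, an Ubuntu service named glusterfs-server, and 31000 as the target op-version for a 3.10 cluster; back up /var/lib/glusterd before editing anything.]

# On the upgraded node only: stop the management daemon, drop the
# tier-enabled line from every volume's stored info file, then start it
# again so the volume checksum is recomputed from the edited file.
service glusterfs-server stop      # service is named "glusterd" on RPM distros
sed -i '/^tier-enabled=/d' /var/lib/glusterd/vols/*/info
service glusterfs-server start

# Only after ALL nodes run the new version, raise the cluster op-version
# (31000 corresponds to 3.10):
gluster volume set all cluster.op-version 31000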
--- Additional comment from Worker Ant on 2018-02-12 21:39:11 EST ---

REVIEW: https://review.gluster.org/19552 (glusterd: fix tier-enabled flag op-version check) posted (#1) for review on master by Atin Mukherjee
REVIEW: https://review.gluster.org/19555 (glusterd: fix tier-enabled flag op-version check) posted (#1) for review on release-3.12 by hari gowtham
COMMIT: https://review.gluster.org/19555 committed in release-3.12 by "hari gowtham" <hari.gowtham005> with a commit message:

glusterd: fix tier-enabled flag op-version check

The tier-enabled flag in the volinfo structure was introduced in 3.10; however, writing this value to the glusterd store was guarded by a wrong op-version check, which results in a volume checksum failure during upgrades.

>Change-Id: I4330d0c4594eee19cba42e2cdf49a63f106627d4
>BUG: 1544600
>Signed-off-by: Atin Mukherjee <amukherj>

Change-Id: I4330d0c4594eee19cba42e2cdf49a63f106627d4
BUG: 1544637
Signed-off-by: hari gowtham <hgowtham>
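[For context, an illustrative sketch that is not part of the Bugzilla record: the mismatch the commit message describes can be inspected directly in glusterd's on-disk store. The paths assume the volume name gluster_volume from this report; run the same commands on an upgraded and a non-upgraded node and compare.]

# The per-volume checksum that peers compare during the handshake; it must
# be identical on every node, and on the rejected peer it differs.
cat /var/lib/glusterd/vols/gluster_volume/cksum

# The field the wrong op-version check persisted too early: present in the
# upgraded node's info file, absent on the 3.8.15 peers, which is exactly
# what skews the checksum.
grep tier-enabled /var/lib/glusterd/vols/gluster_volume/info

# The operating version glusterd is currently running with (30800 in the
# logs above, i.e. 3.8); with the fix, tier-enabled is only written after
# this has been bumped past the corrected threshold.
grep operating-version /var/lib/glusterd/glusterd.info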
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.6, please open a new bug report.

glusterfs-3.12.6 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2018-February/033552.html
[2] https://www.gluster.org/pipermail/gluster-users/