+++ This bug was initially created as a clone of Bug #1684029 +++

Description of problem:
While trying to upgrade from older versions such as 3.12, 4.1 and 5 to gluster 6 RC, the upgrade ends with peers going into the rejected state on one node after another.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Create a replica 3 volume on an older version (3, 4, or 5).
2. Kill the gluster process on one node and install gluster 6.
3. Start glusterd.

Actual results:
The node running the new version gets peer rejected, and the brick processes are not started by glusterd.

Expected results:
Peer rejection should not happen. The cluster should be healthy.

Additional info:
Status shows only the bricks on that particular node, with N/A as status. Other nodes aren't visible. Looks like a volfile mismatch. The new volfile has "option transport.socket.ssl-enabled off" added, while the old volfile lacks it. The order of quick-read and open-behind differs between the old and new versions. These changes cause the volfile mismatch and break the cluster.

--- Additional comment from Sanju on 2019-02-28 17:25:57 IST ---

The peers are running into the rejected state because there is a mismatch in the volfiles. The differences are:
1. Newer volfiles have "option transport.socket.ssl-enabled off", whereas older volfiles do not have this option.
2. The order of quick-read and open-behind has changed.

Commit 4e0fab4 introduced this issue. Previously we didn't have any default value for the option transport.socket.ssl-enabled, so the option was not captured in the volfile. With the above commit we add a default value, so it is now captured in the volfile.

Commit 4e0fab4 has a fix for https://bugzilla.redhat.com/show_bug.cgi?id=1651059. I feel this commit has little significance, so we can revert it. If we do, the 1st problem goes away. I'm not sure why the order of quick-read and open-behind changed.

Atin, do let me know your thoughts on the proposal to revert commit 4e0fab4.
Thanks,
Sanju

--- Additional comment from Sanju on 2019-03-04 14:58:55 IST ---

Root cause: Commit 5a152a changed the mechanism for computing the checksum. As a result, in a heterogeneous cluster, glusterd on an upgraded node uses the new mechanism to compute the cksum while non-upgraded nodes use the old one. The cksum on the upgraded node therefore doesn't match the cksum on the non-upgraded nodes, which results in the peer rejection issue.

Thanks,
Sanju
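To illustrate the root cause above, here is a toy sketch of how two checksum mechanisms can disagree on identical file content. It is not the glusterd code. It assumes, based only on the review title "pass buffer size for computing the cksum", that one mechanism checksums the whole read buffer while the other checksums only the bytes actually read. zlib.crc32 stands in for Gluster's checksum routine, and BUF_SIZE, the function names and the sample volfile are all hypothetical.

```python
import zlib

BUF_SIZE = 16  # hypothetical read-buffer size, kept small for the demo


def cksum_whole_buffer(data: bytes) -> int:
    """Assumed 'old' style: checksum the full buffer on every read,
    even if the last read filled only part of it (zero padded here)."""
    crc = 0
    for start in range(0, len(data), BUF_SIZE):
        chunk = data[start:start + BUF_SIZE].ljust(BUF_SIZE, b"\0")
        crc = zlib.crc32(chunk, crc)
    return crc


def cksum_bytes_read(data: bytes) -> int:
    """Assumed 'new' style: checksum only the bytes actually read."""
    crc = 0
    for start in range(0, len(data), BUF_SIZE):
        crc = zlib.crc32(data[start:start + BUF_SIZE], crc)
    return crc


def peer_accepts(local_cksum: int, remote_cksum: int) -> bool:
    """A peer is accepted only when both sides report the same checksum."""
    return local_cksum == remote_cksum


# Hypothetical volfile content; its length is not a multiple of BUF_SIZE,
# so the final read only partially fills the buffer.
volfile = b"volume demo-client-0\n    type protocol/client\nend-volume\n"

old_node = cksum_whole_buffer(volfile)   # non-upgraded node
new_node = cksum_bytes_read(volfile)     # upgraded node
print("peer accepted:", peer_accepts(old_node, new_node))
```

Under these assumptions the two nodes hold byte-identical content yet report different checksums, so the comparison fails even though nothing is actually out of sync. The two functions agree only when the content length happens to be a multiple of the buffer size.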
REVIEW: https://review.gluster.org/22297 (core: pass buffer size for computing the cksum) posted (#1) for review on master by Sanju Rakonde
fyi: happens too when upgrading from 5.3 to 5.4
Noticed the same when upgrading from 5.3 to 5.4, as mentioned. I'm confused though. Is actual replication affected? The 5.4 server and the 3x 5.3 servers still show all 4 as connected in heal info, and the files seem to be replicating correctly as well. So what's actually affected, just the status command? Is it fixable by tweaking transport.socket.ssl-enabled? Does upgrading all servers to 5.4 resolve it, or should we revert to 5.3?
Ended up downgrading to 5.3 just in case. Peer status and volume status are OK now.

zypper install --oldpackage glusterfs-5.3-lp150.100.1
Loading repository data...
Reading installed packages...
Resolving package dependencies...

Problem: glusterfs-5.3-lp150.100.1.x86_64 requires libgfapi0 = 5.3, but this requirement cannot be provided
  not installable providers: libgfapi0-5.3-lp150.100.1.x86_64[glusterfs]
 Solution 1: Following actions will be done:
  downgrade of libgfapi0-5.4-lp150.100.1.x86_64 to libgfapi0-5.3-lp150.100.1.x86_64
  downgrade of libgfchangelog0-5.4-lp150.100.1.x86_64 to libgfchangelog0-5.3-lp150.100.1.x86_64
  downgrade of libgfrpc0-5.4-lp150.100.1.x86_64 to libgfrpc0-5.3-lp150.100.1.x86_64
  downgrade of libgfxdr0-5.4-lp150.100.1.x86_64 to libgfxdr0-5.3-lp150.100.1.x86_64
  downgrade of libglusterfs0-5.4-lp150.100.1.x86_64 to libglusterfs0-5.3-lp150.100.1.x86_64
 Solution 2: do not install glusterfs-5.3-lp150.100.1.x86_64
 Solution 3: break glusterfs-5.3-lp150.100.1.x86_64 by ignoring some of its dependencies

Choose from above solutions by number or cancel [1/2/3/c] (c): 1
Resolving dependencies...
Resolving package dependencies...

The following 6 packages are going to be downgraded:
  glusterfs libgfapi0 libgfchangelog0 libgfrpc0 libgfxdr0 libglusterfs0

6 packages to downgrade.
REVIEW: https://review.gluster.org/22297 (core: make compute_cksum function op_version compatible) merged (#4) on master by Amar Tumballi
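The idea behind making the checksum function "op_version compatible" can be sketched as below. This is only a guess at the shape of the approach, not the merged patch. It assumes every node selects the checksum mechanism from a shared cluster op-version rather than from its own installed version. The threshold NEW_CKSUM_OP_VERSION, the buffer size and the use of zlib.crc32 are all hypothetical stand-ins.

```python
import zlib

BUF_SIZE = 16                 # hypothetical read-buffer size
NEW_CKSUM_OP_VERSION = 50400  # hypothetical op-version that enables the new mechanism


def compute_cksum(data: bytes, op_version: int) -> int:
    """Pick the checksum mechanism from the cluster op-version.

    Below the threshold, mimic the assumed old behaviour (checksum the
    whole zero-padded buffer); at or above it, checksum only the bytes read.
    """
    crc = 0
    for start in range(0, len(data), BUF_SIZE):
        chunk = data[start:start + BUF_SIZE]
        if op_version < NEW_CKSUM_OP_VERSION:
            chunk = chunk.ljust(BUF_SIZE, b"\0")
        crc = zlib.crc32(chunk, crc)
    return crc


# Hypothetical volfile content whose length is not a multiple of BUF_SIZE.
volfile = b"volume demo-client-0\n    type protocol/client\nend-volume\n"

# During a rolling upgrade the cluster op-version is still the old one, so an
# upgraded node and a non-upgraded node both take the old code path.
cluster_op_version = 50300
upgraded_node = compute_cksum(volfile, cluster_op_version)
old_node = compute_cksum(volfile, cluster_op_version)
print("checksums match during upgrade:", upgraded_node == old_node)
```

With this kind of gate, the new mechanism only takes effect once the op-version is raised, which in this sketch happens for every node at once, so checksums stay comparable throughout the upgrade.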
Is the next release going to be an imminent hotfix, i.e. something like today/tomorrow, or are we talking weeks?
REVIEW: https://review.gluster.org/22326 (glusterd: change the op-version) posted (#1) for review on master by Sanju Rakonde
REVIEW: https://review.gluster.org/22326 (glusterd: change the op-version) merged (#2) on master by Atin Mukherjee