Steps to reproduce:

Two server nodes. Output of "gluster volume info":

    Volume Name: repl
    Type: Replicate
    Status: Started
    Number of Bricks: 2
    Transport-type: tcp
    Bricks:
    Brick1: 10.1.10.112:/sdb
    Brick2: 10.1.10.113:/sdc

One of the nodes (10.1.10.112) goes down, and we reconfigure ping-timeout to 5 on 10.1.10.113 using the command below:

    gluster volume set repl network.ping-timeout 5

Once node 10.1.10.112 comes back up, its peer status is "Rejected", since the volume configuration no longer matches between the two nodes.

What we do then, on 10.1.10.112, is stop the volume and delete its contents, then initiate a volume sync. NOTE: this volume stop and delete does not take effect on 10.1.10.113, since both peers are in the rejected state.

The interesting part is what happens when you run "gluster volume sync 10.1.10.113 repl" from 10.1.10.112. It syncs even the current volume state from 10.1.10.113, where the volume is actually started, but it fails to start the volume on its own side: only glusterd is running.

    [root@compel2 ~]# gluster volume sync 10.1.10.113
    volume sync: successful
    [root@compel2 ~]# gluster volume info
    Volume Name: repl
    Type: Replicate
    Status: Started
    Number of Bricks: 2
    Transport-type: tcp
    Bricks:
    Brick1: 10.1.10.112:/sdb
    Brick2: 10.1.10.113:/sdc
    Options Reconfigured:
    network.ping-timeout: 5
    [root@compel2 ~]# ps -ef | grep gluster
    root 3713 1 0 10:01 ? 00:00:00 /usr/sbin/glusterd
    root 3989 3740 0 10:08 pts/0 00:00:00 grep gluster
    [root@compel2 ~]#

This puts me in a jinx!!
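The mismatch reported above (the volume shows "Status: Started" while ps lists only glusterd) can be checked mechanically. Below is a minimal sketch in Python that works on captured command output rather than calling the CLI. The function names are hypothetical, and it assumes that brick processes appear in ps as glusterfsd and that ps -ef prints its usual eight columns.

```python
def volume_status(volume_info_text, volume_name):
    """Return the Status value for volume_name from `gluster volume info` text, or None."""
    current = None
    for line in volume_info_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Volume Name":
            current = value
        elif key == "Status" and current == volume_name:
            return value
    return None


def has_brick_process(ps_text):
    """True if any `ps -ef` line runs a glusterfsd binary (assumed brick process name)."""
    for line in ps_text.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) == 8 and "glusterfsd" in fields[7].split()[0]:
            return True
    return False


def started_without_bricks(volume_info_text, ps_text, volume_name):
    """True when the volume claims to be Started but no brick process is running."""
    started = volume_status(volume_info_text, volume_name) == "Started"
    return started and not has_brick_process(ps_text)
```

Fed the "gluster volume info" and "ps -ef" output from this report, started_without_bricks() would flag the volume "repl" on 10.1.10.112.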
Taking this task up after the 3.1.3 release, hence this should mostly be present in 3.2.x (or 3.1.3+ releases).
We found that the glusterd store version is not updated for some operations, such as set, reset, start, and stop. We fixed this while remodeling the glusterd store for the fix to bug 763486. With this fix, the problem should occur only when an operation is performed on a volume with the same name on *BOTH* peer1 and peer2 while they are not connected to each other.
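To illustrate why an unbumped store version ends in a "Rejected" peer, here is a simplified model in Python. It is not glusterd's code: the class, the function, and the exact comparison rule (version first, then a checksum of the options) are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class VolumeStore:
    """Hypothetical stand-in for one peer's stored view of a volume."""
    name: str
    version: int = 1
    options: dict = field(default_factory=dict)

    def checksum(self):
        # Digest of the option set, standing in for a config checksum.
        payload = repr(sorted(self.options.items())).encode()
        return hashlib.sha1(payload).hexdigest()

    def set_option(self, key, value, bump_version=True):
        self.options[key] = value
        if bump_version:
            self.version += 1


def compare(local, remote):
    """Decide what a peer does when it sees the remote copy of the same volume."""
    if remote.version > local.version:
        return "update_from_remote"   # remote is newer: import its config
    if remote.version < local.version:
        return "remote_is_stale"      # local is newer: remote should import
    if remote.checksum() != local.checksum():
        return "reject"               # same version, different content
    return "in_sync"
```

In this model, a "volume set" that changes the options without bumping the version leaves both peers at the same version with different checksums, so compare() returns "reject". With the bump in place, the returning peer sees a higher remote version and updates instead. If both peers modify the volume while disconnected, both versions advance equally and the result is still "reject", which corresponds to the remaining case described above.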
http://review.gluster.org/4188