Description of problem:
Found a situation where the existing geo-replication session was stopped, and each master node was then upgraded one by one. The upgraded nodes needed a reboot, since the upgrade included a new kernel version.
Had 3 master nodes (n1, n2 and n3), all with 3.1.2 bits and geo-replication in the stopped state.
1. Upgrade n1 to 3.1.3 and do not reboot
2. Check geo-rep session, it is in stopped state.
3. Upgrade n2 to 3.1.3 and reboot
4. Check geo-rep session, it is in stopped state.
5. Upgrade n3 to 3.1.3 and reboot
6. Check geo-rep session, it is in stopped state.
7. Reboot n1
8. Check geo-rep session, it is in started state.
Expected: the session should remain in the stopped state.
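The check in the steps above means inspecting the STATUS column of `gluster volume geo-replication <master_vol> <slave>::<slave_vol> status` after each reboot. As a minimal sketch of automating that check, the snippet below parses a hypothetical status table (the node, volume and brick names are made up for illustration) and verifies every brick reports "Stopped":

```python
# Hypothetical sample of `gluster volume geo-replication ... status` output;
# the exact column layout is an assumption for this sketch.
SAMPLE_STATUS = """\
MASTER NODE    MASTER VOL    MASTER BRICK    SLAVE                 STATUS
n1             mastervol     /bricks/b1      slavehost::slavevol   Stopped
n2             mastervol     /bricks/b2      slavehost::slavevol   Stopped
n3             mastervol     /bricks/b3      slavehost::slavevol   Started
"""

def session_states(status_output):
    """Return the per-brick STATUS values from the status table."""
    lines = status_output.strip().splitlines()
    # Skip the header row; STATUS is the last column in this sample layout.
    return [line.split()[-1] for line in lines[1:]]

def all_stopped(status_output):
    """True only if every brick of the session reports 'Stopped'."""
    return all(state == "Stopped" for state in session_states(status_output))
```

With the sample above, `all_stopped` returns False, which is exactly the unexpected condition this bug describes after the final reboot.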
Version-Release number of selected component (if applicable):
I have tested reboot scenarios as part of 3.1.3 and, as mentioned above, the session remains in the stopped state until all nodes have been rebooted once. I will try to narrow the reproducer down to a case without an upgrade.
Patches posted upstream:
Upstream mainline : http://review.gluster.org/14830
Upstream 3.8 : http://review.gluster.org/15196
And the fix is available in rhgs-3.2.0 as part of rebase to GlusterFS 3.8.4.
Upgraded from 3.1.2 to 3.2.0 (glusterfs-geo-replication-3.8.4-18.el7rhgs.x86_64). The geo-rep session remains in the stopped state after rebooting all nodes in the cluster. Moving this bug to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.