Bug 872601 - split-brain caused by %preun% script if server rpm is upgraded during self-heal
Status: CLOSED DEFERRED
Product: GlusterFS
Classification: Community
Component: build
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Pranith Kumar K
Depends On:
Blocks: 878882
Reported: 2012-11-02 10:17 EDT by Joe Julian
Modified: 2014-12-14 14:40 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 878882
Environment:
Last Closed: 2014-12-14 14:40:29 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Joe Julian 2012-11-02 10:17:16 EDT
Description of problem:
During an rpm upgrade of glusterfs-server, the %preun scriptlet runs "/sbin/service glusterfsd condrestart". This, of course, succeeds in killing all glusterfsd instances: status reports that glusterfsd is running, so stop kills them all, and start does nothing since this isn't a legacy configuration. Normally this is the desired effect, as it allows the brick instances to load the new version when glusterd is next restarted.

If, however, we have just upgraded one server in a replica set and the self-heal hasn't completed, upgrading the next server will cause a split-brain: the stale files on the first server will be the only copies available to afr and will be updated, so the files on the second server will also be considered stale.
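
The sequence above can be illustrated with a toy model. This is only a sketch: it assumes each brick keeps a simple counter blaming its peer for missed writes, loosely modelled on afr's pending changelog, and the names Brick, write, heal and upgrade_scenario are hypothetical, not GlusterFS code.

```python
# Toy model of a two-way replica. Each brick keeps a counter of writes that
# its PEER has missed (an assumption that loosely mirrors afr's changelog;
# the real on-disk bookkeeping is more detailed).

class Brick:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.pending_against_peer = 0  # writes the peer has not seen


def write(a, b):
    """Apply one write to whichever bricks are up and record blame."""
    if not a.up and not b.up:
        raise RuntimeError("no brick available")
    if a.up and not b.up:
        a.pending_against_peer += 1
    elif b.up and not a.up:
        b.pending_against_peer += 1
    # Both up: the write lands on both bricks, nothing is left pending.


def state(a, b):
    """Classify the file: clean, needs-heal, or split-brain."""
    if a.pending_against_peer and b.pending_against_peer:
        return "split-brain"  # each brick blames the other
    if a.pending_against_peer or b.pending_against_peer:
        return "needs-heal"   # exactly one good copy exists
    return "clean"


def heal(a, b):
    """Finish a self-heal; only possible with one good copy and both bricks up."""
    if a.up and b.up and state(a, b) == "needs-heal":
        a.pending_against_peer = 0
        b.pending_against_peer = 0


def upgrade_scenario(wait_for_heal):
    """Upgrade server1, then server2, with a write during each brick outage."""
    s1, s2 = Brick("server1"), Brick("server2")

    s1.up = False          # condrestart kills server1's bricks
    write(s1, s2)          # server2 now blames server1
    s1.up = True           # bricks come back with the new version

    if wait_for_heal:
        heal(s1, s2)       # self-heal is allowed to finish first

    s2.up = False          # condrestart kills server2's bricks
    write(s1, s2)          # server1 (possibly stale) now blames server2
    s2.up = True
    return state(s1, s2)
```

In this model, upgrade_scenario(False) returns "split-brain", while upgrade_scenario(True) returns "needs-heal", which an ordinary self-heal can still resolve.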

How reproducible:
Always, as long as there's disk activity on the same file during both brick restarts and that file is large enough that the self-heal does not complete in time.

Expected results:
Perhaps a check should be done to ensure a clean self-heal state before doing the condrestart in preun or in the init/systemctl scripts.
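
A minimal sketch of such a check follows. It assumes the output of "gluster volume heal <VOLNAME> info" is a usable indicator and that it contains one "Number of entries: N" line per brick. Neither assumption is confirmed in this report, and the output format may differ between releases. The function names are hypothetical.

```python
import re
import subprocess

# One "Number of entries: N" line per brick is an ASSUMED output format.
_ENTRIES = re.compile(r"^Number of entries:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def pending_heal_entries(heal_info_output):
    """Return the total number of entries awaiting heal.

    Returns None when no count lines are found or when any count is not a
    plain number, so callers can treat unknown state as unsafe.
    """
    values = _ENTRIES.findall(heal_info_output)
    if not values or not all(v.isdigit() for v in values):
        return None
    return sum(int(v) for v in values)


def safe_to_restart_bricks(heal_info_output):
    """True only when every brick reports zero pending entries."""
    return pending_heal_entries(heal_info_output) == 0


def heal_info(volume):
    """Run the gluster CLI for one volume (needs a running glusterd)."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

A scriptlet or init script could call safe_to_restart_bricks(heal_info(volume)) for each volume and skip the condrestart unless every call returns True. Unparseable output is deliberately treated as "not safe".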
Comment 1 Amar Tumballi 2012-11-22 05:58:45 EST
Needs some action from the replicate part too.
Comment 2 Pranith Kumar K 2013-04-04 04:30:14 EDT
We need to implement a command which can tell if any of the files need self-heal or not.
Comment 3 Anand Avati 2013-10-25 07:34:38 EDT
REVIEW: http://review.gluster.org/6145 (cluster/afr: Provide setxattr interface for triggering heal) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)
Comment 4 Anand Avati 2013-10-30 04:30:20 EDT
REVIEW: http://review.gluster.org/6145 (cluster/afr: Provide setxattr interface for triggering heal) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)
Comment 5 Anand Avati 2013-10-30 04:30:29 EDT
REVIEW: http://review.gluster.org/6195 (extras/scripts: Script to self-heal in a synchronous way) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)
Comment 6 Niels de Vos 2014-11-27 09:54:03 EST
The version that this bug has been reported against does not get any updates from the Gluster Community anymore. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug.

If there has been no update before 9 December 2014, this bug will be closed automatically.
