872601 – split-brain caused by %preun% script if server rpm is upgraded during self-heal

Bug 872601 - split-brain caused by %preun% script if server rpm is upgraded during self-heal

Keywords:
Status:	CLOSED DEFERRED
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	build
Sub Component:
Version:	3.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Pranith Kumar K
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	878882

Reported:	2012-11-02 14:17 UTC by Joe Julian
Modified:	2014-12-14 19:40 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Clones:	878882
Environment:
Last Closed:	2014-12-14 19:40:29 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Joe Julian 2012-11-02 14:17:16 UTC

Description of problem:
During an rpm upgrade of glusterfs-server, the preun script runs "/sbin/service glusterfsd condrestart". This kills every glusterfsd instance: status reports that glusterfsd is running, so stop kills them all, and start then does nothing because this isn't a legacy configuration. Normally this is the desired effect, as it lets the brick instances load the new version when glusterd is next restarted.

If, however, we have just upgraded one server in a replica set and the self-heal hasn't completed, upgrading the next server will cause a split-brain: the stale files on the first server will be the only copies available to afr, and they will be updated, which makes the files on the second server be considered stale as well.
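
The sequence above can be illustrated with a toy model. This is only a sketch in Python, not how afr is implemented: it assumes each brick simply counts the writes its peer missed, and it calls the file split-brained when both bricks hold such a count against each other. The names Brick, write, state and upgrade_sequence are made up for the illustration.

```python
class Brick:
    """Toy replica brick (hypothetical model, not the real afr changelog)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        # Number of writes this brick accepted while its peer was down.
        self.pending_for_peer = 0


def write(a, b):
    """Apply one client write and record which brick, if any, missed it."""
    if a.up and not b.up:
        a.pending_for_peer += 1
    elif b.up and not a.up:
        b.pending_for_peer += 1
    # When both bricks are up, neither one accuses the other.


def state(a, b):
    """Classify the file from the two accusation counters."""
    if a.pending_for_peer and b.pending_for_peer:
        return "split-brain"
    if a.pending_for_peer or b.pending_for_peer:
        return "needs-heal"
    return "clean"


def upgrade_sequence(heal_completes):
    """Upgrade server1, then server2, with a write during each restart."""
    first, second = Brick("server1"), Brick("server2")

    # server1 is upgraded: its brick is killed and a write lands on server2.
    first.up = False
    write(first, second)
    first.up = True

    # Self-heal copies the missed write to server1 only if it gets to finish.
    if heal_completes:
        second.pending_for_peer = 0

    # server2 is upgraded: its brick is killed and a write lands on server1.
    second.up = False
    write(first, second)
    second.up = True

    return state(first, second)


print(upgrade_sequence(heal_completes=True))
print(upgrade_sequence(heal_completes=False))
```

When the heal finishes before the second upgrade, only server1 accuses server2 and an ordinary heal is enough. When it does not finish, each brick accuses the other, which is the split-brain described here.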

How reproducible:
Always, as long as there is disk activity on the same file during both brick restarts and that file is large enough that the self-heal does not complete in time.

Expected results:
Perhaps preun, or the init/systemd scripts, should check for a clean self-heal state before doing the condrestart.
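
A minimal sketch of what such a check might look like, in Python. It assumes that "gluster volume heal VOLNAME info" prints one "Number of entries: N" line per brick; that output format is an assumption and may differ between releases. The helper names pending_heal_entries and safe_to_restart are hypothetical, and nothing like this ships in the packages.

```python
import re
import subprocess

# Assumed per-brick line in the heal info output, e.g. "Number of entries: 3".
ENTRY_RE = re.compile(r"^Number of entries:\s*(\d+)\s*$", re.MULTILINE)


def pending_heal_entries(heal_info_output):
    """Return the total pending entries, or None if no count lines were found."""
    counts = ENTRY_RE.findall(heal_info_output)
    if not counts:
        return None
    return sum(int(n) for n in counts)


def safe_to_restart(volume, run=subprocess.check_output):
    """True only if every brick reported a count and all counts are zero."""
    out = run(["gluster", "volume", "heal", volume, "info"])
    if isinstance(out, bytes):
        out = out.decode()
    # Unparseable output yields None, which is treated as "not safe".
    return pending_heal_entries(out) == 0
```

The run parameter exists so the parsing can be exercised without a live cluster. A preun or init script could skip the condrestart, or warn, whenever safe_to_restart returns False for any volume hosted on the server.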

Comment 1 Amar Tumballi 2012-11-22 10:58:45 UTC

This needs some action from the replicate part too.

Comment 2 Pranith Kumar K 2013-04-04 08:30:14 UTC

We need to implement a command which can tell if any of the files need self-heal or not.

Comment 3 Anand Avati 2013-10-25 11:34:38 UTC

REVIEW: http://review.gluster.org/6145 (cluster/afr: Provide setxattr interface for triggering heal) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 4 Anand Avati 2013-10-30 08:30:20 UTC

REVIEW: http://review.gluster.org/6145 (cluster/afr: Provide setxattr interface for triggering heal) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 5 Anand Avati 2013-10-30 08:30:29 UTC

REVIEW: http://review.gluster.org/6195 (extras/scripts: Script to self-heal in a synchronous way) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 6 Niels de Vos 2014-11-27 14:54:03 UTC

The version that this bug has been reported against does not get any updates from the Gluster Community anymore. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug.

If there has been no update before 9 December 2014, this bug will be closed automatically.