Steps to reproduce:

(1) Set up a volume with replica 3. For this example, the bricks are as follows:
    rep3-client-0 is on gfs1
    rep3-client-1 is on gfs2
    rep3-client-2 is on gfs4

(2) Start the volume and mount a client.

(3) Kill all GlusterFS processes on two of the nodes (e.g. "killall -r -9 gluster" on gfs1 and gfs2).

(4) Create a subdirectory and write a file within it. The changelogs for the file itself on the surviving node (gfs4) are as follows:
    trusted.afr.rep3-client-0=0x000000010000000000000000
    trusted.afr.rep3-client-1=0x000000010000000000000000
    trusted.afr.rep3-client-2=0x000000000000000000000000
For the directory, they are:
    trusted.afr.rep3-client-0=0x000000000000000000000001
    trusted.afr.rep3-client-1=0x000000000000000000000001
    trusted.afr.rep3-client-2=0x000000000000000000000000

(5) Bring glusterd back up on *one* of the other nodes (gfs1). Now the changelogs for the file and directory on gfs4 look like this:
    # file: export/sdc/dir
    trusted.afr.rep3-client-0=0x000000000000000000000000
    trusted.afr.rep3-client-1=0x000000000000000000000000
    trusted.afr.rep3-client-2=0x000000000000000000000000
    trusted.gfid=0x93778dc8eabd4e5fbcde9f116bb0452e

    # file: export/sdc/dir/sub
    trusted.afr.rep3-client-0=0x000000000000000000000000
    trusted.afr.rep3-client-1=0x000000000000000000000000
    trusted.afr.rep3-client-2=0x000000000000000000000000
The changelogs on gfs1 are also zero. The file contents are correct, so gfs1 has been fully healed, but nothing has happened on gfs2 and there is no longer any indication that anything should.

(6) Bring glusterd back up on the other node (gfs2). Nothing happens.

(7) Run "find . | stat" on the client. At this point the directory is created on gfs2 (entry self-heal on the volume root) and the file is created within it (entry self-heal on the directory), but the file is zero length and has no AFR xattrs. Somewhat surprisingly, we find this on gfs1:
    # file: export/sdc/dir/sub
    trusted.afr.rep3-client-1=0x000000010000000100000000

(8) Run the "find . | stat" on gfs4 a second time. No effect.

(9) Unmount and remount on gfs4, then run the "find . | stat" a third time. Now the contents are correct.

This is the same "no data self-heal on client without remount" behavior as observed in bug 862838, but that is not the issue here. Our problems started back at step 5, when the changelogs for afr-client-1 (gfs2) were cleared even though no self-heal had actually occurred there. We need to fix that first, so that proactive self-heal can work with replica 3, before we worry about client-side self-heal.
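For readers following the xattr dumps above: each trusted.afr.<vol>-client-N value is a 12-byte array of three big-endian 32-bit counters recording pending data, metadata, and entry operations against that brick. A minimal decoding sketch (the function name is illustrative, not part of GlusterFS):

```python
import struct

def decode_afr_changelog(hexval: str) -> dict:
    """Split a 12-byte AFR changelog xattr value into its three
    big-endian 32-bit pending-operation counters."""
    raw = bytes.fromhex(hexval.removeprefix("0x"))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}

# The file's changelog for rep3-client-0 in step 4: one pending data op.
print(decode_afr_changelog("0x000000010000000000000000"))
# -> {'data': 1, 'metadata': 0, 'entry': 0}

# The directory's changelog in step 4: one pending entry op.
print(decode_afr_changelog("0x000000000000000000000001"))
# -> {'data': 0, 'metadata': 0, 'entry': 1}
```

So the all-zero values seen in step 5 mean "nothing pending," which is exactly why clearing the client-1 counters before gfs2 was healed leaves no record that gfs2 is stale.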
*** This bug has been marked as a duplicate of bug 802417 ***
No, this is not a duplicate of 802417. That bug has to do with GFID mismatches on re-created files leading to EIO. This bug has to do with data/changelog discrepancies even on existing files leading to stale data. The fix for this bug does not address the other, and I doubt that the converse would be true either.
http://review.gluster.org/#change,4034
I still don't believe this is a true duplicate of bug 802417, but it's close enough. Marking the duplicate in this direction because preserving the history and follower list on the user-reported bug is more important than doing the same for this one.

*** This bug has been marked as a duplicate of bug 802417 ***