When quick-read and replicate are both loaded, the following sequence of events can happen: 1) Server2 goes down. 2) A bunch of files are created. 3) Server2 comes back up. 4) Entry self-heal is triggered on the directory and the missing files are created on Server2 as empty files. 5) "cat" is run on one of the files, and replicate triggers self-heal but unwinds the lookup before the self-heal is complete (background self-heal). The xattr that this lookup returns to quick-read is, of course, empty. quick-read caches it indefinitely and keeps serving the empty file.
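The sequence above can be sketched as a toy model. This is only an illustration under stated assumptions, not GlusterFS code: `struct replica`, `struct qr_cache`, `lookup_first`, `lookup_from_source` and `cached_read` are all hypothetical names, and the model assumes the stale answer came from a replica that had not been healed yet.

```c
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in GlusterFS. */
struct replica {
    const char *content;   /* file data as it would be carried in the lookup xattr */
    int         is_source; /* 1 if this copy is up to date (a self-heal source) */
};

typedef const char *(*lookup_fn)(const struct replica *reps, size_t count);

/* Assumed buggy behaviour: answer from whichever replica comes first,
 * even if it has not been healed yet. */
const char *lookup_first(const struct replica *reps, size_t count)
{
    return count > 0 ? reps[0].content : NULL;
}

/* Behaviour after the fix: prefer a replica that is marked as a source. */
const char *lookup_from_source(const struct replica *reps, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (reps[i].is_source)
            return reps[i].content;
    }
    return count > 0 ? reps[0].content : NULL;
}

/* Toy quick-read cache: filled on the first lookup and never invalidated. */
struct qr_cache {
    const char *data;
    int         valid;
};

const char *cached_read(struct qr_cache *cache, lookup_fn lookup,
                        const struct replica *reps, size_t count)
{
    if (!cache->valid) {
        cache->data  = lookup(reps, count);
        cache->valid = 1;
    }
    return cache->data;
}
```

With replica 0 empty and replica 1 holding the data, `cached_read` with `lookup_first` keeps returning the empty string even after replica 0 is later healed, while `lookup_from_source` caches the real contents.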
Please please (pretty please :-) just add all the caching types - lookup, quick-read, stat-prefetch - into io-cache and make them selectable by options, or this sort of thing is going to crop up all the time...
Ian, having a unified caching translator is an option we are currently exploring.
PATCH: http://patches.gluster.com/patch/3138 in master (cluster/afr: Send xattr in lookup from the source subvolume.)
PATCH: http://patches.gluster.com/patch/3137 in release-3.0 (cluster/afr: Send xattr in lookup from the source subvolume.)
This patch has been reverted in the release-3.0 branch. It needs a proper NULL check.
*** Bug 869 has been marked as a duplicate of this bug. ***
Created attachment 206: Makefile showing overrides needed
A patch has been sent for review. It was tested with the attached program (test.c), which serves as a unit test and can be used in automation. The same test can also be used to trigger an entry self-heal plus a data self-heal to verify that case as well.
PATCH: http://patches.gluster.com/patch/3296 in release-3.0 (cluster/afr: Check before accessing xattrs in data self heal.)
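The kind of guard this patch title describes might look roughly like the following. This is a minimal sketch, not the actual afr change: `struct toy_xattr` and `toy_pending_count` are hypothetical, and the sketch simply assumes that a replica which never answered the lookup leaves a NULL entry in the reply array.

```c
#include <stddef.h>

/* Hypothetical stand-in for one replica's lookup xattr reply. */
struct toy_xattr {
    int pending;
};

/*
 * Return the pending count recorded for replica `i`, or 0 when the
 * reply array or that replica's entry is missing (for example because
 * the server was down and never answered the lookup).
 */
int toy_pending_count(struct toy_xattr *const *xattrs, size_t count, size_t i)
{
    if (xattrs == NULL || i >= count || xattrs[i] == NULL)
        return 0;
    return xattrs[i]->pending;
}
```

Treating a missing reply as 0 is a choice made for the sketch; real code may need to distinguish "no answer" from "nothing pending".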
Need to check whether triggering entry self-heal results in quick-read storing the file as a zero-byte file, because we do not set xattrs in sh_missing_entries_lookup_cbk and self-heal runs in the background.
Entry self-heal is not a problem because entry self-heal is done on directories. Quick-read will have no clue about files inside that directory when entry self-heal happens. The missing entries lookup is triggered inside replicate and quick-read will never see it.
PATCH: http://patches.gluster.com/patch/3358 in master (cluster/afr: Check before accessing xattrs in data self heal.)
Checked with glusterfs-3.0.5rc6 and glusterfs-3.0.4 clients, both mounted. Brought the 2nd server down. Ran the attached unit test for some time (it keeps writing "hello, world" to a file). Stopped the test and brought server2 back up. Then did "cat <file written by the test>" on both clients. On 3.0.4 the file was empty, whereas on 3.0.5rc6 it worked properly and the contents of the file were visible. Hence moving to the verified state.
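For anyone repeating the verification without the attachment, a writer along these lines should be enough to produce the file. It is not the attached test.c, whose contents are not shown in this report: `write_hello` is a hypothetical helper, and appending one line per iteration is an assumption.

```c
#include <stdio.h>

/*
 * Append "hello, world\n" to `path` once per iteration, reopening the
 * file each time. Returns 0 on success and -1 on any I/O error.
 */
int write_hello(const char *path, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        FILE *f = fopen(path, "a");
        if (f == NULL)
            return -1;
        if (fputs("hello, world\n", f) == EOF) {
            fclose(f);
            return -1;
        }
        if (fclose(f) != 0)
            return -1;
    }
    return 0;
}
```

Calling `write_hello` with a path on the mount while server2 is down, then running "cat" on that file after server2 returns, follows the steps described above.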
*** Bug 1056 has been marked as a duplicate of this bug. ***