Bug 762547 - (GLUSTER-815) quick-read and replicate self-heal interaction result in empty reads
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: All  OS: All
Priority: low
Severity: medium
Assigned To: Pavan Vilas Sondur
Duplicates: GLUSTER-869 GLUSTER-1056
Reported: 2010-04-08 19:35 EDT by Vikas Gorur
Modified: 2015-12-01 11:45 EST
Doc Type: Bug Fix
Regression: RTP

Attachments
unit test for automation (547 bytes, text/x-csrc)
2010-05-20 23:54 EDT, Pavan Vilas Sondur

Description Vikas Gorur 2010-04-08 19:35:00 EDT
When quick-read and replicate are both loaded, the following sequence of events can happen:

1) Server2 goes down.
2) A bunch of files are created.
3) Server2 comes back up.
4) Entry self-heal is triggered on the directory and all the files are created,
still empty, on Server2.
5) "cat" is run on one of the files. Replicate triggers self-heal but unwinds
the lookup before the self-heal is complete (background self-heal). This lookup
returns the xattr to quick-read, which is of course empty. quick-read caches this
for eternity and keeps serving the empty file.
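
The sequence above can be sketched with a toy model. This is not GlusterFS code: the classes Brick, Replicate and QuickRead, the function simulate and the flag reply_from_source are all hypothetical names made up for illustration. The flag only mimics the idea named in the later patch title ("Send xattr in lookup from the source subvolume"); the real translators work differently in detail.

```python
"""Toy model of the quick-read / background self-heal race (illustrative only)."""


class Brick:
    """One server's copy of the files: name -> content bytes."""

    def __init__(self):
        self.files = {}


class Replicate:
    """Hypothetical stand-in for replicate with background data self-heal."""

    def __init__(self, source, sink, reply_from_source):
        self.source = source
        self.sink = sink
        self.reply_from_source = reply_from_source  # assumption: models the fix
        self.pending_heals = []

    def lookup(self, name):
        # Queue a data self-heal, but answer the lookup before it has run.
        if self.sink.files.get(name) != self.source.files[name]:
            self.pending_heals.append(name)
        replier = self.source if self.reply_from_source else self.sink
        return replier.files[name]

    def finish_background_heals(self):
        for name in self.pending_heals:
            self.sink.files[name] = self.source.files[name]
        self.pending_heals.clear()


class QuickRead:
    """Hypothetical stand-in for quick-read: caches content seen in lookup."""

    def __init__(self, child):
        self.child = child
        self.cache = {}

    def read(self, name):
        if name not in self.cache:
            self.cache[name] = self.child.lookup(name)
        return self.cache[name]


def simulate(reply_from_source):
    server1, server2 = Brick(), Brick()
    server1.files["a.txt"] = b"hello, world"  # steps 1-2: written while Server2 is down
    server2.files["a.txt"] = b""              # steps 3-4: entry self-heal creates it empty
    afr = Replicate(server1, server2, reply_from_source)
    qr = QuickRead(afr)
    first = qr.read("a.txt")                  # step 5: "cat" before the heal completes
    afr.finish_background_heals()
    second = qr.read("a.txt")                 # served from the quick-read cache
    return first, second, server2.files["a.txt"]


if __name__ == "__main__":
    print("reply from sink:  ", simulate(False))
    print("reply from source:", simulate(True))
```

In this model, answering from the sink leaves the cache holding empty content even after Server2's copy has been healed, while answering from the source does not.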
Comment 1 Ian Rogers 2010-04-15 16:36:05 EDT
Please please (pretty please :-) just add all the caching types - lookup, quick-read, stat-prefetch - into io-cache and selectable by options, or this sort of thing is going to crop up all the time...
Comment 2 Vikas Gorur 2010-04-15 16:39:12 EDT
Ian,

Having a unified caching translator is an option we are currently exploring.
Comment 3 Anand Avati 2010-04-20 01:51:05 EDT
PATCH: http://patches.gluster.com/patch/3138 in master (cluster/afr: Send xattr in lookup from the source subvolume.)
Comment 4 Anand Avati 2010-04-20 01:51:22 EDT
PATCH: http://patches.gluster.com/patch/3137 in release-3.0 (cluster/afr: Send xattr in lookup from the source subvolume.)
Comment 5 Amar Tumballi 2010-05-04 04:06:13 EDT
This patch has been reverted in the release-3.0 branch. It needs a proper NULL check.
Comment 6 Amar Tumballi 2010-05-04 04:07:00 EDT
*** Bug 869 has been marked as a duplicate of this bug. ***
Comment 7 Pavan Vilas Sondur 2010-05-20 23:54:32 EDT
Created attachment 206
Makefile showing overrides needed
Comment 8 Pavan Vilas Sondur 2010-05-20 23:55:15 EDT
A patch has been sent for review. It was tested with the attached program (test.c), which serves as a unit test and can be used in automation.
The same test can also be used to trigger an entry self-heal plus a data self-heal, to verify that case as well.
Comment 9 Anand Avati 2010-05-21 00:32:34 EDT
PATCH: http://patches.gluster.com/patch/3296 in release-3.0 (cluster/afr: Check before accessing xattrs in data self heal.)
Comment 10 Pavan Vilas Sondur 2010-05-21 14:36:20 EDT
Need to check whether triggering entry self-heal results in quick-read storing the file as a zero-byte file, because we do not set xattrs in sh_missing_entries_lookup_cbk and self-heal runs in the background.
Comment 11 Vikas Gorur 2010-05-21 14:54:19 EDT
Entry self-heal is not a problem because entry self-heal is done on directories. Quick-read will have no clue about files inside that directory when entry self-heal happens. The missing entries lookup is triggered inside replicate and quick-read will never see it.
Comment 12 Anand Avati 2010-06-01 00:24:09 EDT
PATCH: http://patches.gluster.com/patch/3358 in master (cluster/afr: Check before accessing xattrs in data self heal.)
Comment 13 Raghavendra Bhat 2010-06-15 02:04:03 EDT
Checked with glusterfs-3.0.5rc6 and glusterfs-3.0.4 clients, both mounted. Brought the second server down. Ran the attached unit test (which keeps writing "hello, world" to a file) for some time. Stopped the test, brought server2 back up, and ran "cat <file written by the test>" on both clients. On 3.0.4 the file was empty, whereas on 3.0.5rc6 it worked properly and the contents of the file were visible. Hence moving to the verified state.
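
For readers without the attachment, a writer along the lines described above might look like the sketch below. It is not the attached test.c, whose contents are not reproduced here. The names keep_writing and looks_healthy, the appended newline, the append mode and the per-write fsync are all assumptions.

```python
"""Hypothetical stand-in for the attached writer test (not the original test.c)."""
import os
import sys
import time

MESSAGE = b"hello, world\n"  # assumption: one line per write


def keep_writing(path, iterations, delay=0.1):
    """Append MESSAGE to path `iterations` times; return the bytes written."""
    with open(path, "ab") as f:
        for _ in range(iterations):
            f.write(MESSAGE)
            f.flush()
            os.fsync(f.fileno())  # push each write out to the mount
            time.sleep(delay)
    return iterations * len(MESSAGE)


def looks_healthy(path):
    """True if the file is non-empty and consists only of whole MESSAGE lines."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data) > 0 and data == MESSAGE * (len(data) // len(MESSAGE))


if __name__ == "__main__":
    # Usage: python writer.py <file on the mount> [iterations]
    target = sys.argv[1]
    count = int(sys.argv[2]) if len(sys.argv) > 2 else 100
    keep_writing(target, count)
    print("healthy" if looks_healthy(target) else "empty or corrupt")
```

Run the writer while the second server is down, bring the server back up, then call looks_healthy (or simply "cat" the file) from each client.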
Comment 14 shishir gowda 2010-07-27 02:38:05 EDT
*** Bug 1056 has been marked as a duplicate of this bug. ***
