+++ This bug was initially created as a clone of Bug #865825 +++

The afr_sh_has_*_pending functions, which were clearly copied from one another, all share the same problem: if they get an error from dict_get_ptr, they return immediately. This means they never check the pending counts for the remaining peers, and a non-zero count there should trigger self-heal. Failing to trigger self-heal when we should makes this a high-priority issue.
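To make the failure mode concrete, here is a minimal sketch of the control-flow problem described above. It is not the GlusterFS source: `struct peer_changelog`, `has_pending_buggy` and `has_pending_fixed` are hypothetical names, and the `lookup_ok` flag merely stands in for the result of a dict_get_ptr call. The "fixed" variant shows one plausible shape of a fix (skip the peer and keep scanning), not necessarily what the actual patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the result of looking up one peer's
 * changelog; lookup_ok == 0 models an error from dict_get_ptr. */
struct peer_changelog {
    int      lookup_ok;
    uint32_t pending;   /* pending count recorded against this peer */
};

/* Buggy shape: a failed lookup aborts the whole scan, so pending
 * counts for later peers are never examined. */
static int has_pending_buggy(const struct peer_changelog *peers, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!peers[i].lookup_ok)
            return 0;               /* early return hides later peers */
        if (peers[i].pending != 0)
            return 1;
    }
    return 0;
}

/* Assumed fix: skip the peer whose lookup failed and keep checking
 * the remaining ones. */
static int has_pending_fixed(const struct peer_changelog *peers, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!peers[i].lookup_ok)
            continue;
        if (peers[i].pending != 0)
            return 1;
    }
    return 0;
}
```

With a peer list where the first lookup fails and the second peer has a non-zero count, the buggy variant reports nothing pending while the fixed variant reports that self-heal is needed.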
*** Bug 867364 has been marked as a duplicate of this bug. ***
Verified the fix on the build:
==============================
glusterfs 3.4.0.19rhs built on Aug 14 2013 00:11:42

Test Case: http://review.gluster.org/#/c/4070/4/tests/bugs/bug-865825.t
========================================================================
1. Create a 1 x 3 replicate volume.
2. Set the following volume options:
   "write-behind" "off"
   "io-cache" "off"
   "self-heal-daemon" "off"
   "background-self-heal-count" "0"
3. Start the volume.
4. Create a fuse/nfs mount.
5. Create a file from the mount point:
   echo "Testing_bug_867360" > test_file
6. Unmount the mount.
7. Mess with the flags as though brick-0 accuses brick-2 and brick-1 is missing its brick-2 changelog altogether:
   setfattr -n trusted.afr.<volume_name>-client-2 -v "0x000000010000000000000000" <brick0_abs_path>/test_file
   setfattr -x trusted.afr.<volume_name>-client-2 <brick1_abs_path>/test_file
   echo "wrong_data" > <brick2_abs_path>/test_file
8. Create a fuse/nfs mount.
9. From the fuse/nfs mount, execute: "stat test_file"
10. Check the contents of the file "test_file" on all the bricks. { expect : "Testing_bug_867360" }

The case was executed on both fuse and nfs mounts. The bug is fixed. Moving the bug to assigned state.
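For readers unfamiliar with the value written in step 7, the sketch below decodes it. It assumes the changelog xattr is 12 bytes holding three 32-bit big-endian counters in the order data, metadata, entry; that layout is an assumption stated here, not something this report spells out. `struct afr_pending`, `be32` and `decode_pending` are hypothetical helper names, not GlusterFS functions.

```c
#include <stdint.h>

/* Assumed layout of a trusted.afr.* changelog value:
 * three 32-bit big-endian counters (data, metadata, entry). */
struct afr_pending {
    uint32_t data;
    uint32_t metadata;
    uint32_t entry;
};

/* Read one big-endian 32-bit value from a byte buffer. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Split the 12 raw xattr bytes into the three counters. */
static struct afr_pending decode_pending(const unsigned char raw[12])
{
    struct afr_pending out;
    out.data     = be32(raw);
    out.metadata = be32(raw + 4);
    out.entry    = be32(raw + 8);
    return out;
}
```

Under that assumption, 0x000000010000000000000000 decodes to one pending data operation and no pending metadata or entry operations, which is how brick-0 "accuses" brick-2 in the test.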
(In reply to spandura from comment #4)
> Verified the fix on the build:
> ==============================
> glusterfs 3.4.0.19rhs built on Aug 14 2013 00:11:42
>
> The case is executed on both fuse and nfs mount. The bug is fixed. Moving
> the bug to assigned state.

Moving the bug to verified state. { Typo error: assigned -> verified }
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html