867360 – Self-heal checks skip pending counts that they shouldn't

Summary: Self-heal checks skip pending counts that they shouldn't

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterfs
Sub Component:
Version:	2.0
Hardware:	Unspecified
OS:	All
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Jeff Darcy
QA Contact:	spandura
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	867364
Depends On:	865825
Blocks:

Reported:	2012-10-17 11:30 UTC by Vidya Sakar
Modified:	2013-09-23 22:33 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:	865825
Environment:
Last Closed:	2013-09-23 22:33:33 UTC
Embargoed:
Dependent Products:

Description Vidya Sakar 2012-10-17 11:30:48 UTC

+++ This bug was initially created as a clone of Bug #865825 +++

The afr_sh_has_*_pending functions, which were clearly copied from one another, all share the same problem: if dict_get_ptr returns an error, they return immediately. This means they never check the pending counts for the remaining peers, which should trigger self-heal if they are non-zero. Failing to trigger self-heal when we should makes this a high-priority issue.

Comment 1 Amar Tumballi 2012-10-19 04:24:31 UTC

*** Bug 867364 has been marked as a duplicate of this bug. ***

Comment 4 spandura 2013-08-19 10:07:16 UTC

Verified the fix on the build:
==============================
glusterfs 3.4.0.19rhs built on Aug 14 2013 00:11:42

Test Case: http://review.gluster.org/#/c/4070/4/tests/bugs/bug-865825.t
========================================================================
1. Create a 1 x 3 replicate volume

2. Set the following volume options:
"write-behind" "off"
"io-cache" "off"
"self-heal-daemon" "off"
"background-self-heal-count" "0"

3. Start the volume

4. Create a fuse/nfs mount

5. Create a file from one of the mount points:
echo "Testing_bug_867360" > test_file

6. Unmount the mount

7. Mess with the changelog flags so that brick-0 accuses brick-2, while brick-1 is missing its brick-2 changelog altogether, and overwrite brick-2's copy with wrong data:

setfattr -n trusted.afr.<volume_name>-client-2 -v "0x000000010000000000000000" <brick0_abs_path>/test_file

setfattr -x trusted.afr.<volume_name>-client-2 <brick1_abs_path>/test_file

echo "wrong_data" > <brick2_abs_path>/test_file

8. Create a fuse/nfs mount again

9. From the fuse/nfs mount, execute: "stat test_file"

10. Check the contents of "test_file" on all the bricks. { expect : "Testing_bug_867360" }


The case is executed on both fuse and nfs mount. The bug is fixed. Moving the bug to assigned state.

Comment 5 spandura 2013-08-19 10:25:46 UTC

(In reply to spandura from comment #4)
> Verified the fix on the build:
> ==============================
> glusterfs 3.4.0.19rhs built on Aug 14 2013 00:11:42
> 

> The case is executed on both fuse and nfs mount. The bug is fixed. Moving
> the bug to assigned state.

Moving the bug to verified state. { Typo Error : assigned -> verified}

Comment 6 Scott Haines 2013-09-23 22:33:33 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

Note You need to log in before you can comment on or make changes to this bug.