Bug 1028663

Summary: AFR: self-heal daemon crawler not obeying the 'cluster.heal-timeout' time interval
Product: [Community] GlusterFS Reporter: Ravishankar N <ravishankar>
Component: core Assignee: Ravishankar N <ravishankar>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: mainline CC: gluster-bugs, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.5.1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1027559 Environment:
Last Closed: 2014-07-11 19:17:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1027559    
Bug Blocks:    

Description Ravishankar N 2013-11-09 08:15:50 UTC
+++ This bug was initially created as a clone of Bug #1027559 +++

Description of problem:
The afr_start_crawl() function is called before the "cluster.heal-timeout" interval (600 seconds by default) has expired.

Version-Release number of selected component (if applicable):
RHS 2.1

How reproducible:
Always

Steps to Reproduce:
1. Create and start a 1x2 replica volume using 2 different nodes.
2. gluster v set <VOLNAME> diagnostics.client-log-level DEBUG
3. gluster v set <VOLNAME> cluster.heal-timeout 300
4. tailf /var/log/glusterfs/glustershd.log (on either of the nodes)

[2013-11-07 10:32:06.058154] D [afr-self-heald.c:1233:afr_start_crawl] 0-testvol-replicate-0: starting crawl 1 for testvol-client-0
.
.
.
[2013-11-07 10:32:07.059428] D [afr-self-heald.c:1233:afr_start_crawl] 0-testvol-replicate-0: starting crawl 1 for testvol-client-0

Actual results:

The interval between two successive invocations of afr_start_crawl() is just one second, even though cluster.heal-timeout was set to 300 seconds.

Expected results:
The crawler should start only once every "cluster.heal-timeout" seconds.

Additional info:

--- Additional comment from RHEL Product and Program Management on 2013-11-07 00:24:39 EST ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

Comment 1 Anand Avati 2013-11-09 08:27:48 UTC
REVIEW: http://review.gluster.org/6243 (libglusterfs: fix bug in timespec adjustment) posted (#2) for review on master by Ravishankar N (ravishankar)

Comment 2 Anand Avati 2013-11-11 03:42:05 UTC
COMMIT: http://review.gluster.org/6243 committed in master by Anand Avati (avati) 
------
commit d5335f9e40f6e9533f7812d153b9727bcc04aa4e
Author: Ravishankar N <ravishankar>
Date:   Sat Nov 9 11:56:34 2013 +0000

    libglusterfs: fix bug in timespec adjustment
    
    The argument to the timespec_adjust_delta() function introduced in
    commit 6836118b21 needs to be passed by reference rather than by value
    for the function to do its job.
    
    BUG: 1028663
    Change-Id: I62a3636906e67ed35b7786e9553f6819b48f3626
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/6243
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>
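
The pass-by-value problem described in the commit message can be illustrated with a minimal, standalone C sketch. It is not the GlusterFS code: the function names adjust_by_value() and adjust_by_pointer() and the NSEC_PER_SEC macro are hypothetical and chosen only for illustration. The sketch assumes the helper's job is to add a delta to a struct timespec; when the struct is passed by value the callee modifies a private copy, so the caller's deadline presumably never moves forward.

```c
#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L /* hypothetical macro for this sketch */

/* Buggy variant (hypothetical name): ts is a copy, so the caller's
 * struct timespec is left unchanged when this function returns. */
static void adjust_by_value(struct timespec ts, struct timespec delta)
{
    ts.tv_sec += delta.tv_sec;
    ts.tv_nsec += delta.tv_nsec;
    if (ts.tv_nsec >= NSEC_PER_SEC) {
        ts.tv_sec += 1;
        ts.tv_nsec -= NSEC_PER_SEC;
    }
}

/* Fixed variant (hypothetical name): ts points at the caller's struct,
 * so the adjusted value is visible after the call. */
static void adjust_by_pointer(struct timespec *ts, struct timespec delta)
{
    ts->tv_sec += delta.tv_sec;
    ts->tv_nsec += delta.tv_nsec;
    if (ts->tv_nsec >= NSEC_PER_SEC) {
        ts->tv_sec += 1;
        ts->tv_nsec -= NSEC_PER_SEC;
    }
}

/* Print a timespec with a label, for inspecting the result by hand. */
static void print_timespec(const char *label, struct timespec ts)
{
    printf("%s: %ld s, %ld ns\n", label, (long)ts.tv_sec, (long)ts.tv_nsec);
}
```

Calling adjust_by_value(deadline, delta) and then print_timespec("deadline", deadline) shows the original value, while adjust_by_pointer(&deadline, delta) shows the deadline moved forward by delta. Passing a pointer is the usual C idiom for an in-place update; returning the adjusted struct would be an equally valid design.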