Description of problem:
The afr_start_crawl() function is called before the "cluster.heal-timeout" interval (600 seconds by default) has expired.

Version-Release number of selected component (if applicable):
RHS 2.1

How reproducible:
Always

Steps to Reproduce:
1. Create and start a 1x2 replica volume using 2 different nodes.
2. gluster v set <VOLNAME> diagnostics.client-log-level DEBUG
3. gluster v set <VOLNAME> cluster.heal-timeout 300
4. tailf /var/log/glusterfs/glustershd.log (on either of the nodes)

[2013-11-07 10:32:06.058154] D [afr-self-heald.c:1233:afr_start_crawl] 0-testvol-replicate-0: starting crawl 1 for testvol-client-0
.
.
.
[2013-11-07 10:32:07.059428] D [afr-self-heald.c:1233:afr_start_crawl] 0-testvol-replicate-0: starting crawl 1 for testvol-client-0

Actual results:
The time interval between two successive invocations of afr_start_crawl() is just one second.

Expected results:
The crawler must start only once every "cluster.heal-timeout" seconds.

Additional info:
Downstream review URL: https://code.engineering.redhat.com/gerrit/#/c/15473/
Verified the bug on the build "glusterfs 3.4.0.52rhs built on Dec 19 2013 12:20:16". Bug is fixed. Moving the bug to Verified state.

Cases verified on a 1 x 3 replicate volume:
===========================================
1. Set the heal-timeout to 120. Observed the crawl happening every 2 minutes.
2. Set the heal-timeout to 60. Observed the crawl happening every 1 minute.
3. Killed a brick process and created a lot of files and directories. Observed the crawl happening every 1 minute. Brought the brick back online; self-heal was also successful.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html