1084653 – tests/bugs/bug-865825.t needs to wait longer for self-heal daemon to start. It's failing in Rackspace due to this.


Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	tests
Sub Component:
Version:	3.5.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Justin Clift
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2014-04-05 02:36 UTC by Justin Clift
Modified:	2015-07-13 04:35 UTC
CC List:	3 users
Fixed In Version:	glusterfs-3.6.0beta1
Clone Of:
Environment:
Last Closed:	2014-11-11 08:29:32 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Justin Clift 2014-04-05 02:36:19 UTC

Description of problem:

  tests/bugs/bug-865825.t is failing fairly consistently in Rackspace.
  This is because the test waits only 3 seconds for the self-heal
  daemon to start, which isn't always long enough.

  With some echo statements added so we can see what's going on,
  this shows the failure case ("not ok 17"):

    # tests/bugs/bug-865825.t
    1..21
    TEST glusterd
    ok 1
    TEST pidof glusterd
    ok 2
    TEST gluster --mode=script --wignore volume info
    No volumes present
    ok 3
    TEST gluster --mode=script --wignore volume create patchy replica 3 jc0.cloud.gluster.org:/d/backends/patchy-0 jc0.cloud.gluster.org:/d/backends/patchy-1 jc0.cloud.gluster.org:/d/backends/patchy-2
    ok 4
    EXPECT patchy volinfo_field patchy Volume Name
    ok 5
    EXPECT Created volinfo_field patchy Status
    ok 6
    TEST gluster --mode=script --wignore volume set patchy cluster.background-self-heal-count 0
    ok 7
    TEST gluster --mode=script --wignore volume set patchy performance.io-cache off
    ok 8
    TEST gluster --mode=script --wignore volume set patchy performance.quick-read off
    ok 9
    TEST gluster --mode=script --wignore volume set patchy performance.write-behind off
    ok 10
    TEST gluster --mode=script --wignore volume set patchy performance.stat-prefetch off
    ok 11
    TEST gluster --mode=script --wignore volume set patchy cluster.self-heal-daemon off
    ok 12
    TEST gluster --mode=script --wignore volume start patchy
    ok 13
    EXPECT Started volinfo_field patchy Status
    ok 14
    TEST glusterfs --volfile-server=jc0.cloud.gluster.org --volfile-id=patchy /mnt/glusterfs/0
    ok 15
    TEST umount /mnt/glusterfs/0
    ok 16
    setfattr: /d/backends/patchy-1/a_file: No such attribute
    volume set: success
    Self-heal daemon is not running. Check self-heal daemon log file. 
    EXPECT_WITHIN 30 test_data cat /d/backends/patchy-2/a_file
    not ok 17
    TEST gluster --mode=script --wignore volume stop patchy
    ok 18
    EXPECT Stopped volinfo_field patchy Status
    ok 19
    TEST gluster --mode=script --wignore volume delete patchy
    ok 20
    TEST ! gluster --mode=script --wignore volume info patchy
    ok 21

  This shows the success case, with all tests passing:

    # tests/bugs/bug-865825.t
    1..21
    TEST glusterd
    ok 1
    TEST pidof glusterd
    ok 2
    TEST gluster --mode=script --wignore volume info
    No volumes present
    ok 3
    TEST gluster --mode=script --wignore volume create patchy replica 3 jc0.cloud.gluster.org:/d/backends/patchy-0 jc0.cloud.gluster.org:/d/backends/patchy-1 jc0.cloud.gluster.org:/d/backends/patchy-2
    ok 4
    EXPECT patchy volinfo_field patchy Volume Name
    ok 5
    EXPECT Created volinfo_field patchy Status
    ok 6
    TEST gluster --mode=script --wignore volume set patchy cluster.background-self-heal-count 0
    ok 7
    TEST gluster --mode=script --wignore volume set patchy performance.io-cache off
    ok 8
    TEST gluster --mode=script --wignore volume set patchy performance.quick-read off
    ok 9
    TEST gluster --mode=script --wignore volume set patchy performance.write-behind off
    ok 10
    TEST gluster --mode=script --wignore volume set patchy performance.stat-prefetch off
    ok 11
    TEST gluster --mode=script --wignore volume set patchy cluster.self-heal-daemon off
    ok 12
    TEST gluster --mode=script --wignore volume start patchy
    ok 13
    EXPECT Started volinfo_field patchy Status
    ok 14
    TEST glusterfs --volfile-server=jc0.cloud.gluster.org --volfile-id=patchy /mnt/glusterfs/0
    ok 15
    TEST umount /mnt/glusterfs/0
    ok 16
    setfattr: /d/backends/patchy-1/a_file: No such attribute
    volume set: success
    Launching heal operation to perform full self heal on volume patchy has been successful 
    Use heal info commands to check status
    EXPECT_WITHIN 30 test_data cat /d/backends/patchy-2/a_file
    ok 17
    TEST gluster --mode=script --wignore volume stop patchy
    ok 18
    EXPECT Stopped volinfo_field patchy Status
    ok 19
    TEST gluster --mode=script --wignore volume delete patchy
    ok 20
    TEST ! gluster --mode=script --wignore volume info patchy
    ok 21

  In the test source code, there is a "sleep 3" to wait for the
  self-heal daemon just before the error.  Increasing the delay
  to longer than 3 seconds seems to work, with the test no longer
  failing in Rackspace.

  I'll submit a patch for review in Gerrit shortly, with a change
  to 10 seconds (just to be safe).
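
  A fixed sleep has to be long enough for the slowest machine. As an
  alternative, here is a minimal sketch of a polling wait that returns as
  soon as a check succeeds and gives up after a timeout, similar in spirit
  to the EXPECT_WITHIN lines in the output above. The name wait_until and
  the check command shd_is_running are hypothetical; neither is part of the
  GlusterFS test framework, and this is not the change submitted for review.

```shell
# Hypothetical helper, not part of the GlusterFS test framework.
# Usage: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Runs COMMAND about once per second. Returns 0 as soon as COMMAND
# succeeds, or 1 if it has still not succeeded after TIMEOUT_SECONDS retries.
wait_until() {
    _wu_timeout=$1
    shift
    _wu_elapsed=0
    while :; do
        # Discard the command's output; only its exit status matters here.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$_wu_elapsed" -ge "$_wu_timeout" ]; then
            return 1
        fi
        sleep 1
        _wu_elapsed=$((_wu_elapsed + 1))
    done
}

# Example (shd_is_running is an assumed placeholder for whatever command
# reports that the self-heal daemon is up):
#   wait_until 10 shd_is_running || echo "self-heal daemon did not start"
```

  With this approach a fast machine does not pay the full delay, and the
  timeout can be generous without slowing down the common case.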


Version-Release number of selected component (if applicable):

  Upstream git master, as of Sat 5th Apr 2014
  commit d8dd4049143c191cea451bade470b906c67dbbe0


How reproducible:

  Frequently failing, but not every time. :/

Comment 1 Anand Avati 2014-04-05 02:40:21 UTC

REVIEW: http://review.gluster.org/7404 (tests: Increase bug-865825.t wait time for self-heal daemon) posted (#1) for review on master by Justin Clift (justin)

Comment 2 Anand Avati 2014-04-08 05:52:18 UTC

COMMIT: http://review.gluster.org/7404 committed in master by Vijay Bellur (vbellur) 
------
commit aef305334c379f6875f0f9ded1e05526c8e36c81
Author: Justin Clift <justin>
Date:   Sat Apr 5 03:38:17 2014 +0100

    tests: Increase bug-865825.t wait time for self-heal daemon
    
    BUG: 1084653
    Change-Id: I057bbd2e50803344552314b32d2d0e6240bf9604
    Signed-off-by: Justin Clift <justin>
    Reviewed-on: http://review.gluster.org/7404
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 3 Niels de Vos 2014-09-22 12:37:36 UTC

A beta release for GlusterFS 3.6.0 has been released [1]. Please verify whether the release solves this bug report for you. If the glusterfs-3.6.0beta1 release does not resolve this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update (possibly an "updates-testing" repository) infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 4 Niels de Vos 2014-11-11 08:29:32 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users
