Bug 1149118 - Spurious failure on disperse tests (bad file size on brick)
Summary: Spurious failure on disperse tests (bad file size on brick)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Xavi Hernandez
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-10-03 09:14 UTC by Xavi Hernandez
Modified: 2014-11-10 15:14 UTC

Fixed In Version: glusterfs-3.6.1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1144108
Environment:
Last Closed: 2014-11-10 15:14:05 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Xavi Hernandez 2014-10-03 09:14:51 UTC
+++ This bug was initially created as a clone of Bug #1144108 +++

Description of problem:

Sometimes, especially on NetBSD, the ec test scripts fail because a file on one of the bricks has an incorrect size.

Version-Release number of selected component (if applicable): master


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

This is caused by a side effect of reading a file through the mount point while a brick is down. The read may update the file's access time, which leaves the brick that is down in an out-of-sync state. Self-heal repairs this correctly, but the script was not giving self-heal enough time to complete.
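
To illustrate the timing issue, here is a minimal sketch of the "wait until the brick has been repaired before inspecting it" idea. It is not the code from the actual fix, and it does not use the GlusterFS test framework. The names wait_until and files_have_size, the timeout values, and the idea of passing per-brick file paths are all assumptions made for the example.

```python
import os
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Hypothetical helper: returns True as soon as the condition holds,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def files_have_size(paths, expected_size):
    """Return True when every path exists and is exactly expected_size bytes.

    `paths` stands in for the copies of one file on each brick, and
    `expected_size` is whatever per-brick size the test expects
    (both are assumptions for this sketch).
    """
    try:
        return all(os.path.getsize(p) == expected_size for p in paths)
    except OSError:
        # A missing or unreadable file counts as "not repaired yet".
        return False
```

A test would then check the bricks only after something like wait_until(lambda: files_have_size(brick_paths, size), timeout=60) has returned True, and treat a False result as a failure instead of reading the brick contents right after the brick restarts.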

--- Additional comment from Anand Avati on 2014-09-18 18:47:42 CEST ---

REVIEW: http://review.gluster.org/8771 (test/ec: Let self-heal repair files before accessing bricks) posted (#1) for review on master by Xavier Hernandez (xhernandez)

--- Additional comment from Anand Avati on 2014-09-30 18:46:39 CEST ---

REVIEW: http://review.gluster.org/8892 (test/ec: Fix spurious failures caused by self-heal) posted (#1) for review on master by Xavier Hernandez (xhernandez)

--- Additional comment from Anand Avati on 2014-10-01 11:27:44 CEST ---

REVIEW: http://review.gluster.org/8892 (test/ec: Fix spurious failures caused by self-heal) posted (#2) for review on master by Xavier Hernandez (xhernandez)

--- Additional comment from Anand Avati on 2014-10-03 11:01:30 CEST ---

COMMIT: http://review.gluster.org/8892 committed in master by Vijay Bellur (vbellur) 
------
commit a97ad9b69bb17f2351c59512fa9c6cb25d82b4da
Author: Xavier Hernandez <xhernandez>
Date:   Thu Sep 18 18:42:34 2014 +0200

    test/ec: Fix spurious failures caused by self-heal
    
    The sha1sum of a file may update the access time of that file.
    If this happens while a brick is down, as it is forced in the
    test, that brick doesn't get the update, getting out of sync.
    
    When the brick is restarted, self-heal repairs the file, but
    the test shouldn't access brick contents until self-heal finishes.
    If this is combined with a kill of another brick before self-heal
    has finished repairing the file, the volume could become inaccessible.
    
    Since the purpose of these tests is only to check ec functionality
    (there is another test that checks self-heal), the test that corrupts
    the file has been removed.
    
    Additional checks to validate the state of the volume have been added
    to avoid some timing issues.
    
    BUG: 1144108
    Change-Id: Ibd9288de519914663998a1fbc4321ec92ed6082c
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/8892
    Reviewed-by: Emmanuel Dreyfus <manu>
    Tested-by: Emmanuel Dreyfus <manu>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Dan Lambright <dlambrig>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 1 Anand Avati 2014-10-03 09:20:36 UTC
REVIEW: http://review.gluster.org/8900 (test/ec: Fix spurious failures caused by self-heal) posted (#1) for review on release-3.6 by Xavier Hernandez (xhernandez)

Comment 2 Anand Avati 2014-10-03 12:30:01 UTC
REVIEW: http://review.gluster.org/8902 (test/ec: Fix spurious failures caused by self-heal) posted (#1) for review on release-3.6 by Xavier Hernandez (xhernandez)

Comment 3 Anand Avati 2014-10-03 16:58:10 UTC
REVIEW: http://review.gluster.org/8902 (test/ec: Fix spurious failures caused by self-heal) posted (#2) for review on release-3.6 by Xavier Hernandez (xhernandez)

Comment 4 Anand Avati 2014-10-06 07:13:02 UTC
REVIEW: http://review.gluster.org/8902 (test/ec: Fix spurious failures caused by self-heal) posted (#3) for review on release-3.6 by Xavier Hernandez (xhernandez)

Comment 5 Anand Avati 2014-10-21 07:46:03 UTC
REVIEW: http://review.gluster.org/8902 (test/ec: Fix spurious failures caused by self-heal) posted (#4) for review on release-3.6 by Xavier Hernandez (xhernandez)

Comment 6 Anand Avati 2014-10-21 18:39:18 UTC
COMMIT: http://review.gluster.org/8902 committed in release-3.6 by Vijay Bellur (vbellur) 
------
commit d01b00ae2b124dfdd6905e463533a715f1cedc5b
Author: Xavier Hernandez <xhernandez>
Date:   Thu Sep 18 18:42:34 2014 +0200

    test/ec: Fix spurious failures caused by self-heal
    
    The sha1sum of a file may update the access time of that file.
    If this happens while a brick is down, as it is forced in the
    test, that brick doesn't get the update, getting out of sync.
    
    When the brick is restarted, self-heal repairs the file, but
    the test shouldn't access brick contents until self-heal finishes.
    If this is combined with a kill of another brick before self-heal
    has finished repairing the file, the volume could become inaccessible.
    
    Since the purpose of these tests is only to check ec functionality
    (there is another test that checks self-heal), the test that corrupts
    the file has been removed.
    
    Additional checks to validate the state of the volume have been added
    to avoid some timing issues.
    
    This is a backport of http://review.gluster.org/8892/
    
    BUG: 1149118
    Change-Id: I8a40b7f07fc8ecd2c721bad1bcdd351dd8504155
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/8902
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Dan Lambright <dlambrig>

Comment 7 Niels de Vos 2014-11-10 15:14:05 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users
