+++ This bug was initially created as a clone of Bug #1144108 +++

Description of problem:

Sometimes, especially on NetBSD, ec test scripts fail because a file on one of the bricks has an incorrect size.

Version-Release number of selected component (if applicable): master

Additional info:

This is caused by a side effect of reading a file through the mount point while a brick is down. The read may update the file's access time, and the stopped brick misses that update, which leaves it in an inconsistent state. Self-heal repairs this correctly, but the script did not give self-heal enough time to complete.

--- Additional comment from Anand Avati on 2014-09-18 18:47:42 CEST ---

REVIEW: http://review.gluster.org/8771 (test/ec: Let self-heal repair files before accessing bricks) posted (#1) for review on master by Xavier Hernandez (xhernandez)

--- Additional comment from Anand Avati on 2014-09-30 18:46:39 CEST ---

REVIEW: http://review.gluster.org/8892 (test/ec: Fix spurious failures caused by self-heal) posted (#1) for review on master by Xavier Hernandez (xhernandez)

--- Additional comment from Anand Avati on 2014-10-01 11:27:44 CEST ---

REVIEW: http://review.gluster.org/8892 (test/ec: Fix spurious failures caused by self-heal) posted (#2) for review on master by Xavier Hernandez (xhernandez)

--- Additional comment from Anand Avati on 2014-10-03 11:01:30 CEST ---

COMMIT: http://review.gluster.org/8892 committed in master by Vijay Bellur (vbellur)
------
commit a97ad9b69bb17f2351c59512fa9c6cb25d82b4da
Author: Xavier Hernandez <xhernandez>
Date: Thu Sep 18 18:42:34 2014 +0200

test/ec: Fix spurious failures caused by self-heal

The sha1sum of a file may update the access time of that file. If this happens while a brick is down, as it is forced in the test, that brick doesn't get the update, getting out of sync.
When the brick is restarted, self-heal repairs the file, but the test shouldn't access brick contents until self-heal finishes. If this is combined with a kill of another brick before self-heal has finished repairing the file, the volume could become inaccessible.

Since the purpose of these tests is only to check ec functionality (there is another test that checks self-heal), the test that corrupts the file has been removed.

Additional checks to validate the state of the volume have been added to avoid some timing issues.

BUG: 1144108
Change-Id: Ibd9288de519914663998a1fbc4321ec92ed6082c
Signed-off-by: Xavier Hernandez <xhernandez>
Reviewed-on: http://review.gluster.org/8892
Reviewed-by: Emmanuel Dreyfus <manu>
Tested-by: Emmanuel Dreyfus <manu>
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Dan Lambright <dlambrig>
Reviewed-by: Vijay Bellur <vbellur>
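The race described above can be illustrated with a small toy model. This is a hypothetical sketch, not code from the GlusterFS tree: the real tests are shell `.t` scripts that use helpers such as `EXPECT_WITHIN` for this kind of waiting. Here, `Brick`, `read_through_mount`, `out_of_sync`, `self_heal` and `wait_until` are invented names, and access time stands in for the metadata that diverges. A read while one brick is down updates only the running bricks; after the restart, the stale brick is healed asynchronously, so a check must poll with a deadline rather than inspect the bricks immediately.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Brick:
    # Hypothetical stand-in for one brick's copy of a file's metadata.
    name: str
    up: bool = True
    atime: int = 0


def read_through_mount(bricks: List[Brick], now: int) -> None:
    # A read such as sha1sum may bump the access time; only running bricks see it.
    for brick in bricks:
        if brick.up:
            brick.atime = now


def out_of_sync(bricks: List[Brick]) -> List[str]:
    # Bricks whose atime differs from the newest copy still need healing.
    newest = max(brick.atime for brick in bricks)
    return [brick.name for brick in bricks if brick.atime != newest]


def self_heal(bricks: List[Brick]) -> None:
    # Hypothetical heal: propagate the newest atime to every running brick.
    newest = max(brick.atime for brick in bricks)
    for brick in bricks:
        if brick.up:
            brick.atime = newest


def wait_until(predicate: Callable[[], bool], timeout: float, interval: float = 0.01) -> bool:
    # Poll until predicate() is true; give up after `timeout` seconds.
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def demo() -> Tuple[List[str], bool]:
    bricks = [Brick(f"brick{i}") for i in range(3)]
    bricks[2].up = False                 # the test kills one brick
    read_through_mount(bricks, now=100)  # only brick0 and brick1 get the new atime
    bricks[2].up = True                  # brick restarted, but its copy is stale
    stale = out_of_sync(bricks)

    # Self-heal runs in the background; inspecting bricks right now would race with it.
    healer = threading.Timer(0.05, self_heal, args=(bricks,))
    healer.start()
    healed = wait_until(lambda: not out_of_sync(bricks), timeout=2.0)
    healer.join()
    return stale, healed


if __name__ == "__main__":
    print(demo())
```

Running it prints `(['brick2'], True)`: the restarted brick is stale at first, and the check passes only once the background heal has run. Polling against a deadline keeps the wait bounded while avoiding a fixed sleep that may be too short on a slow machine.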
REVIEW: http://review.gluster.org/8900 (test/ec: Fix spurious failures caused by self-heal) posted (#1) for review on release-3.6 by Xavier Hernandez (xhernandez)
REVIEW: http://review.gluster.org/8902 (test/ec: Fix spurious failures caused by self-heal) posted (#1) for review on release-3.6 by Xavier Hernandez (xhernandez)
REVIEW: http://review.gluster.org/8902 (test/ec: Fix spurious failures caused by self-heal) posted (#2) for review on release-3.6 by Xavier Hernandez (xhernandez)
REVIEW: http://review.gluster.org/8902 (test/ec: Fix spurious failures caused by self-heal) posted (#3) for review on release-3.6 by Xavier Hernandez (xhernandez)
REVIEW: http://review.gluster.org/8902 (test/ec: Fix spurious failures caused by self-heal) posted (#4) for review on release-3.6 by Xavier Hernandez (xhernandez)
COMMIT: http://review.gluster.org/8902 committed in release-3.6 by Vijay Bellur (vbellur)
------
commit d01b00ae2b124dfdd6905e463533a715f1cedc5b
Author: Xavier Hernandez <xhernandez>
Date: Thu Sep 18 18:42:34 2014 +0200

test/ec: Fix spurious failures caused by self-heal

The sha1sum of a file may update the access time of that file. If this happens while a brick is down, as it is forced in the test, that brick doesn't get the update, getting out of sync.
When the brick is restarted, self-heal repairs the file, but the test shouldn't access brick contents until self-heal finishes. If this is combined with a kill of another brick before self-heal has finished repairing the file, the volume could become inaccessible.

Since the purpose of these tests is only to check ec functionality (there is another test that checks self-heal), the test that corrupts the file has been removed.

Additional checks to validate the state of the volume have been added to avoid some timing issues.

This is a backport of http://review.gluster.org/8892/

BUG: 1149118
Change-Id: I8a40b7f07fc8ecd2c721bad1bcdd351dd8504155
Signed-off-by: Xavier Hernandez <xhernandez>
Reviewed-on: http://review.gluster.org/8902
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Dan Lambright <dlambrig>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users