Bug 1329871 - tests/basic/afr/heal-info.t fails
Summary: tests/basic/afr/heal-info.t fails
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-04-24 01:39 UTC by Pranith Kumar K
Modified: 2016-07-05 11:31 UTC
CC: 2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-07-05 11:31:21 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Pranith Kumar K 2016-04-24 01:39:10 UTC
Description of problem:
#!/bin/bash
#Test that parallel heal-info command execution doesn't result in spurious
#entries with locking-scheme granular

. $(dirname $0)/../../include.rc
. $(dirname $0)/../../volume.rc

cleanup;

function heal_info_to_file {
        # Poll 'volume heal info' for as long as a.txt exists, recording
        # any non-zero "Number of entries" lines in the file named by $1.
        while [ -f $M0/a.txt ]; do
                $CLI volume heal $V0 info | grep -i number | grep -v 0 >> $1
        done
}

function write_and_del_file {
        # Create a 100MB file and delete it immediately, racing the
        # heal-info pollers above.
        dd of=$M0/a.txt if=/dev/zero bs=1024k count=100
        rm -f $M0/a.txt
}

TEST glusterd
TEST pidof glusterd
TEST $CLI volume create $V0 replica 2 $H0:$B0/brick{0,1}
TEST $CLI volume set $V0 locking-scheme granular
TEST $CLI volume start $V0
TEST $GFS --volfile-id=$V0 --volfile-server=$H0 $M0;
TEST touch $M0/a.txt
# Race the write/delete against two parallel heal-info pollers.
write_and_del_file &
touch $B0/f1 $B0/f2
heal_info_to_file $B0/f1 &
heal_info_to_file $B0/f2 &
wait
# Neither poller should have captured a non-zero entry count.
EXPECT "^$" cat $B0/f1
EXPECT "^$" cat $B0/f2

cleanup;
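
For reference, a single run of the test from a glusterfs source tree looks roughly like this (the invocation is assumed from the standard regression harness layout, not stated in this report; adjust paths to your checkout):

# From the root of a glusterfs checkout (assumed layout):
./run-tests.sh tests/basic/afr/heal-info.t
# Or directly via prove, since the tests emit TAP output:
prove -v tests/basic/afr/heal-info.t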

This test failed on NetBSD twice. While debugging it, we found that if an unlink is in progress while the 'dirty' index is being checked for heal, one brick returns ENOENT while the other returns success. heal info then assumes the file needs heal, which produces the spurious entry that makes the test fail.
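
A rough sketch of the window (hypothetical paths and a placeholder <gfid>; the actual check happens inside the heal-info code against the index directory on each brick, not via stat from a shell):

# Hypothetical illustration of the race, not actual heal-info code.
# <gfid> stands for the file's gfid link under the dirty index.
ENTRY=".glusterfs/indices/dirty/<gfid>"
stat $B0/brick0/$ENTRY    # unlink already removed the link here -> ENOENT
stat $B0/brick1/$ENTRY    # unlink not yet processed here -> succeeds
# The mismatched answers are read as "this entry needs heal", producing
# the spurious non-zero count that heal_info_to_file captures.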
Version-Release number of selected component (if applicable):
mainline

How reproducible:
Intermittent; observed twice on NetBSD regression runs.

Steps to Reproduce:
1. Run tests/basic/afr/heal-info.t (the script above) on a NetBSD regression slave.

Actual results:
heal info occasionally reports a non-zero "Number of entries" count for the file being unlinked, so the EXPECT checks at the end of the test fail.

Expected results:
heal info reports no spurious entries; $B0/f1 and $B0/f2 remain empty and the test passes.


Comment 1 Krutika Dhananjay 2016-07-05 11:31:21 UTC
Hi,

So I ran this test continuously in a loop for about 24 hours on Linux and it didn't fail even once.
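
A soak run along these lines is enough to repeat that experiment (a sketch; the exact harness invocation is assumed):

# Re-run the test until it fails, from a glusterfs checkout:
while ./run-tests.sh tests/basic/afr/heal-info.t; do :; done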

In the absence of a NetBSD slave to debug on, I went through the most recent ~120 NetBSD regression runs on the Jenkins slaves:
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/17730/ through https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/17848/

and heal-info.t did not fail in any of these runs.

I am closing this bug for now with the WORKSFORME resolution. Please reopen it if the failure occurs again, with a link to the console output and the patch against which the failure was seen.

-Krutika

