Description of problem:

#!/bin/bash
#Test that parallel heal-info command execution doesn't result in spurious
#entries with locking-scheme granular

. $(dirname $0)/../../include.rc
. $(dirname $0)/../../volume.rc

cleanup;

function heal_info_to_file {
        while [ -f $M0/a.txt ]; do
                $CLI volume heal $V0 info | grep -i number | grep -v 0 >> $1
        done
}

function write_and_del_file {
        dd of=$M0/a.txt if=/dev/zero bs=1024k count=100
        rm -f $M0/a.txt
}

TEST glusterd
TEST pidof glusterd
TEST $CLI volume create $V0 replica 2 $H0:$B0/brick{0,1}
TEST $CLI volume set $V0 locking-scheme granular
TEST $CLI volume start $V0
TEST $GFS --volfile-id=$V0 --volfile-server=$H0 $M0;
TEST touch $M0/a.txt
write_and_del_file &
touch $B0/f1 $B0/f2
heal_info_to_file $B0/f1 &
heal_info_to_file $B0/f2 &
wait
EXPECT "^$" cat $B0/f1
EXPECT "^$" cat $B0/f2
cleanup;

This test failed on NetBSD twice. While debugging, it was found that if an unlink is in progress while the 'dirty' index is being checked for heal, the check gets ENOENT on one of the bricks but succeeds on the other. heal-info then assumes the file needs heal, and that spurious entry is what made the test fail.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
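As a rough illustration of the race, here is a minimal shell sketch: two ordinary directories stand in for the two bricks' index directories, and a stat stands in for the index lookup. The names brick0, brick1, and dirty-entry are invented for the example; the real check happens inside glusterfs's C code, so this is only an analogy.

#!/bin/bash
# Two directories model the per-brick indices of a replica-2 volume.
mkdir -p brick0 brick1
touch brick0/dirty-entry brick1/dirty-entry

# "Unlink in progress": the entry is already gone from brick1 but still
# present on brick0 at the moment heal-info inspects the indices.
rm -f brick1/dirty-entry

# heal-info queries both bricks: one lookup succeeds, the other gets ENOENT.
stat brick0/dirty-entry >/dev/null 2>&1 && echo "brick0: entry present"
stat brick1/dirty-entry >/dev/null 2>&1 || echo "brick1: ENOENT"

# The disagreement is misread as "this file needs heal" -- the spurious
# heal-info entry that the test above catches.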
Hi,

So I ran this continuously in a loop for about 24 hours on Linux and it didn't fail even once. For want of a NetBSD slave, I went through the most recent ~120 NetBSD runs on the Jenkins slaves, https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/17730/ through https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/17848/, and heal-info.t did not fail in any of those runs.

So I am closing this bug for now with the WORKSFORME resolution. Please reopen it if the failure occurs again, with a link to the "console" output and the patch against which the failure was seen.

-Krutika
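(For reference, a soak loop of roughly the following shape is a common way to repeat a single .t file until it fails. This is a sketch, not the exact command used in the comment above; it assumes a glusterfs source tree where the test lives at tests/basic/afr/heal-info.t and where prove is installed.)

# Re-run the test until it fails; prove exits non-zero on a failing run,
# which terminates the loop so the failure output is the last thing printed.
while prove -v tests/basic/afr/heal-info.t; do
    echo "PASS -- running again"
done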