Bug 765518 (GLUSTER-3786) - posix compliance failiure over nfs
Summary: posix compliance failiure over nfs
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-3786
Product: GlusterFS
Classification: Community
Component: nfs
Version: pre-release
Hardware: x86_64
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Krishna Srinivas
QA Contact:
URL:
Whiteboard:
: 770039 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-11-07 12:37 UTC by Saurabh
Modified: 2012-01-09 12:37 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)

Description Saurabh 2011-11-07 09:59:25 UTC
test result for a volume having ext4 as backend,

Test Summary Report
-------------------
/opt/qa/tools/posix-testsuite/tests/chown/00.t   (Wstat: 0 Tests: 171 Failed: 5)
  Failed tests:  77, 141, 145, 149, 153
/opt/qa/tools/posix-testsuite/tests/link/00.t    (Wstat: 0 Tests: 82 Failed: 3)
  Failed tests:  56, 72, 79
/opt/qa/tools/posix-testsuite/tests/open/00.t    (Wstat: 0 Tests: 47 Failed: 2)
  Failed tests:  33-34
/opt/qa/tools/posix-testsuite/tests/rmdir/08.t   (Wstat: 0 Tests: 10 Failed: 3)
  Failed tests:  8-10
/opt/qa/tools/posix-testsuite/tests/unlink/00.t  (Wstat: 0 Tests: 55 Failed: 4)
  Failed tests:  17, 41-42, 53
Files=185, Tests=1979, 74 wallclock secs ( 0.60 usr  0.22 sys +  5.64 cusr  7.11 csys = 13.57 CPU)
Result: FAIL


Machines in consideration are 10.1.11.130/10.1.11.131 as servers and 10.1.11.100 as client.

here with ext4 the above failure are seen usually but with xfs I am finding symlink is also failing.

Comment 1 Saurabh 2011-11-07 10:04:33 UTC
nfs logs,


[2011-11-07 07:59:05.775510] W [dht-rename.c:348:dht_rename_unlink_cbk] 0-dist-rep7-dht: /fstest_87e2755ec7dc9a183e63b3e1c189f268/fstest_1a8fc56e5f0951e56d9439e6e7e57893: unlink on dist-rep7-client-2 failed (No such file or directory)
[2011-11-07 07:59:05.775728] I [client3_1-fops.c:502:client3_1_unlink_cbk] 0-dist-rep7-client-3: remote operation failed: No such file or directory
[2011-11-07 07:59:05.775746] W [dht-rename.c:348:dht_rename_unlink_cbk] 0-dist-rep7-dht: /fstest_87e2755ec7dc9a183e63b3e1c189f268/fstest_1a8fc56e5f0951e56d9439e6e7e57893: unlink on dist-rep7-client-3 failed (No such file or directory)
[2011-11-07 07:59:07.701681] I [client3_1-fops.c:547:client3_1_rmdir_cbk] 0-dist-rep7-client-0: remote operation failed: Permission denied
[2011-11-07 07:59:07.701751] I [client3_1-fops.c:547:client3_1_rmdir_cbk] 0-dist-rep7-client-2: remote operation failed: Permission denied
[2011-11-07 07:59:07.701817] I [client3_1-fops.c:547:client3_1_rmdir_cbk] 0-dist-rep7-client-1: remote operation failed: Permission denied
[2011-11-07 07:59:07.701840] I [client3_1-fops.c:547:client3_1_rmdir_cbk] 0-dist-rep7-client-3: remote operation failed: Permission denied
[2011-11-07 07:59:07.702043] E [client3_1-fops.c:289:client3_1_mkdir_cbk] 0-dist-rep7-client-0: remote operation failed: File exists
[2011-11-07 07:59:07.702112] E [client3_1-fops.c:289:client3_1_mkdir_cbk] 0-dist-rep7-client-2: remote operation failed: File exists
[2011-11-07 07:59:07.702136] E [client3_1-fops.c:289:client3_1_mkdir_cbk] 0-dist-rep7-client-1: remote operation failed: File exists
[2011-11-07 07:59:07.702171] E [client3_1-fops.c:289:client3_1_mkdir_cbk] 0-dist-rep7-client-3: remote operation failed: File exists
[2011-11-07 07:59:07.711342] E [client3_1-fops.c:2132:client3_1_opendir_cbk] 0-dist-rep7-client-0: remote operation failed: Permission denied
[2011-11-07 07:59:07.713577] E [client3_1-fops.c:2132:client3_1_opendir_cbk] 0-dist-rep7-client-1: remote operation failed: Permission denied
[2011-11-07 07:59:07.713608] E [client3_1-fops.c:2132:client3_1_opendir_cbk] 0-dist-rep7-client-2: remote operation failed: Permission denied
[2011-11-07 07:59:07.714387] E [client3_1-fops.c:2132:client3_1_opendir_cbk] 0-dist-rep7-client-3: remote operation failed: Permission denied
[2011-11-07 07:59:09.822619] E [client3_1-fops.c:173:client3_1_symlink_cbk] 0-dist-rep7-client-0: remote operation failed: Permission denied
(END)

Comment 2 Saurabh 2011-11-07 12:37:19 UTC
the failures are intermittent,

with a volume having xfs as backend,

here are the two results,
a)
Test Summary Report
-------------------
/opt/qa/tools/posix-testsuite/tests/chown/00.t   (Wstat: 0 Tests: 171 Failed: 5)
  Failed tests:  77, 141, 145, 149, 153
/opt/qa/tools/posix-testsuite/tests/mkfifo/00.t  (Wstat: 0 Tests: 36 Failed: 5)
  Failed tests:  30-34
/opt/qa/tools/posix-testsuite/tests/rmdir/08.t   (Wstat: 0 Tests: 10 Failed: 3)
  Failed tests:  8-10
/opt/qa/tools/posix-testsuite/tests/symlink/00.t (Wstat: 0 Tests: 14 Failed: 2)
  Failed tests:  11-12
Files=185, Tests=1979, 73 wallclock secs ( 0.58 usr  0.23 sys +  5.75 cusr  7.40 csys = 13.96 CPU)
Result: FAIL

b)
Test Summary Report
-------------------
/opt/qa/tools/posix-testsuite/tests/chown/00.t   (Wstat: 0 Tests: 171 Failed: 5)
  Failed tests:  77, 141, 145, 149, 153
/opt/qa/tools/posix-testsuite/tests/rmdir/08.t   (Wstat: 0 Tests: 10 Failed: 3)
  Failed tests:  8-10
Files=185, Tests=1979, 73 wallclock secs ( 0.61 usr  0.22 sys +  5.76 cusr  7.31 csys = 13.90 CPU)
Result: FAIL

Comment 3 Krishna Srinivas 2011-11-08 09:19:00 UTC
The nfs logs below might be valid as it might have tried to do some op that is not permitted.

While testing what I observed was that certain tests were failing randomly at:
misc.sh:
test_check()
{
        if [ $* ]; then
                echo "ok ${ntest}"
        else
                echo "not ok ${ntest}"
        fi
        ntest=`expr $ntest + 1`
}

test_check was called like this:
"test_check $ctime1 -lt $ctime2" after an operation like unlink, so this means that sometimes ctime was being changed and sometimes not ... after a successful unlink.

Maybe the reason why ctime is not being updated is because the unlinked file is from a different subvolume than the one from which DHT returns the ctime of parent directory.

Why is change in ctime being used to measure success or failure? the exit status of unlink is good enough to check the result right?

Here is the snippet of the related code:
# successful unlink(2) updates ctime.
expect 0 create ${n0} 0644
expect 0 link ${n0} ${n1}
ctime1=`${fstest} stat ${n0} ctime`
sleep 1
expect 0 unlink ${n1}
ctime2=`${fstest} stat ${n0} ctime`
test_check $ctime1 -lt $ctime2
expect 0 unlink ${n0}



(In reply to comment #2)
> nfs logs,
> 
> 
> [2011-11-07 07:59:05.775510] W [dht-rename.c:348:dht_rename_unlink_cbk]
> 0-dist-rep7-dht:
> /fstest_87e2755ec7dc9a183e63b3e1c189f268/fstest_1a8fc56e5f0951e56d9439e6e7e57893:
> unlink on dist-rep7-client-2 failed (No such file or directory)
> [2011-11-07 07:59:05.775728] I [client3_1-fops.c:502:client3_1_unlink_cbk]
> 0-dist-rep7-client-3: remote operation failed: No such file or directory
> [2011-11-07 07:59:05.775746] W [dht-rename.c:348:dht_rename_unlink_cbk]
> 0-dist-rep7-dht:
> /fstest_87e2755ec7dc9a183e63b3e1c189f268/fstest_1a8fc56e5f0951e56d9439e6e7e57893:
> unlink on dist-rep7-client-3 failed (No such file or directory)
> [2011-11-07 07:59:07.701681] I [client3_1-fops.c:547:client3_1_rmdir_cbk]
> 0-dist-rep7-client-0: remote operation failed: Permission denied
> [2011-11-07 07:59:07.701751] I [client3_1-fops.c:547:client3_1_rmdir_cbk]
> 0-dist-rep7-client-2: remote operation failed: Permission denied
> [2011-11-07 07:59:07.701817] I [client3_1-fops.c:547:client3_1_rmdir_cbk]
> 0-dist-rep7-client-1: remote operation failed: Permission denied
> [2011-11-07 07:59:07.701840] I [client3_1-fops.c:547:client3_1_rmdir_cbk]
> 0-dist-rep7-client-3: remote operation failed: Permission denied
> [2011-11-07 07:59:07.702043] E [client3_1-fops.c:289:client3_1_mkdir_cbk]
> 0-dist-rep7-client-0: remote operation failed: File exists
> [2011-11-07 07:59:07.702112] E [client3_1-fops.c:289:client3_1_mkdir_cbk]
> 0-dist-rep7-client-2: remote operation failed: File exists
> [2011-11-07 07:59:07.702136] E [client3_1-fops.c:289:client3_1_mkdir_cbk]
> 0-dist-rep7-client-1: remote operation failed: File exists
> [2011-11-07 07:59:07.702171] E [client3_1-fops.c:289:client3_1_mkdir_cbk]
> 0-dist-rep7-client-3: remote operation failed: File exists
> [2011-11-07 07:59:07.711342] E [client3_1-fops.c:2132:client3_1_opendir_cbk]
> 0-dist-rep7-client-0: remote operation failed: Permission denied
> [2011-11-07 07:59:07.713577] E [client3_1-fops.c:2132:client3_1_opendir_cbk]
> 0-dist-rep7-client-1: remote operation failed: Permission denied
> [2011-11-07 07:59:07.713608] E [client3_1-fops.c:2132:client3_1_opendir_cbk]
> 0-dist-rep7-client-2: remote operation failed: Permission denied
> [2011-11-07 07:59:07.714387] E [client3_1-fops.c:2132:client3_1_opendir_cbk]
> 0-dist-rep7-client-3: remote operation failed: Permission denied
> [2011-11-07 07:59:09.822619] E [client3_1-fops.c:173:client3_1_symlink_cbk]
> 0-dist-rep7-client-0: remote operation failed: Permission denied
> (END)

Comment 4 Krishna Srinivas 2011-11-09 09:57:49 UTC
The symlink/rmdir script tests run fine now, I just sync'ed the time on the servers to fix it. (see my previous comment for the reason)


in src/tools/regression/fstest/tests/rmdir/08.t


expect EACCES -u 65534 -g 65534 rmdir ${n1}/${n2}

this line was causing the dir perm of ${n1}/${n2} to change to this:


[root@RHEL6 nfs-test]# ls -ld fstest_64a00258a3806a43f710828a369f6c1f/fstest_3c0e0f8e660218cf90e09017c9757dee/fstest_d35df3865c95e09365ec2db1c50d1efd/
d---------. 2 root root 24 Dec 31  1969 fstest_64a00258a3806a43f710828a369f6c1f/fstest_3c0e0f8e660218cf90e09017c9757dee/fstest_d35df3865c95e09365ec2db1c50d1efd/
[root@RHEL6 nfs-test]# 


Can you check if the script is working fine?

Comment 5 Saurabh 2011-11-10 04:25:25 UTC
Krishna, as per our discussion these are failures I was referring, these are found with quota enabled and posix tests getting executed over nfs mount.

/opt/qa/tools/posix-testsuite/tests/open/22.t ...... ok     
/opt/qa/tools/posix-testsuite/tests/open/23.t ...... ok   
/opt/qa/tools/posix-testsuite/tests/rename/00.t .... 68/79 stat returned -1
/opt/qa/tools/posix-testsuite/tests/rename/../misc.sh: line 101: [: ENOENT: integer expression expected
/opt/qa/tools/posix-testsuite/tests/rename/00.t .... Failed 6/79 subtests 
/opt/qa/tools/posix-testsuite/tests/rename/01.t .... ok   
/opt/qa/tools/posix-testsuite/tests/rename/02.t .... ok     
/opt/qa/tools/posix-testsuite/tests/rename/03.t .... ok   

Test Summary Report
-------------------
/opt/qa/tools/posix-testsuite/tests/chown/00.t   (Wstat: 0 Tests: 171 Failed: 3)
  Failed tests:  77, 84, 88
/opt/qa/tools/posix-testsuite/tests/link/00.t    (Wstat: 0 Tests: 82 Failed: 2)
  Failed tests:  72, 79
/opt/qa/tools/posix-testsuite/tests/rename/00.t  (Wstat: 0 Tests: 79 Failed: 6)
  Failed tests:  68-70, 72, 76, 79
/opt/qa/tools/posix-testsuite/tests/rename/05.t  (Wstat: 0 Tests: 17 Failed: 4)
  Failed tests:  12, 14-15, 17
/opt/qa/tools/posix-testsuite/tests/rename/ren.t (Wstat: 0 Tests: 22 Failed: 11)
  Failed tests:  7-9, 11, 15-17, 19-22
/opt/qa/tools/posix-testsuite/tests/rmdir/08.t   (Wstat: 0 Tests: 10 Failed: 3)
  Failed tests:  8-10
Files=185, Tests=1979, 78 wallclock secs ( 0.60 usr  0.24 sys +  5.87 cusr  7.28 csys = 13.99 CPU)
Result: FAIL

Comment 6 Anand Avati 2011-11-11 06:57:09 UTC
CHANGE: http://review.gluster.com/715 (A EACCES error also should not trigger a selfheal.) merged in release-3.2 by Vijay Bellur (vijay)

Comment 7 Saurabh 2011-11-11 09:20:59 UTC
Presently after applying Junaid's patch related to bug765540, i.e. resolving rename issue for posix-compliance, the result is that link/00.t test is also failing now

Test Summary Report
-------------------
/opt/qa/tools/posix-testsuite/tests/chown/00.t   (Wstat: 0 Tests: 171 Failed: 3)
  Failed tests:  77, 84, 88
/opt/qa/tools/posix-testsuite/tests/link/00.t    (Wstat: 0 Tests: 82 Failed: 2)
  Failed tests:  72, 79
/opt/qa/tools/posix-testsuite/tests/rmdir/08.t   (Wstat: 0 Tests: 10 Failed: 3)
  Failed tests:  8-10
Files=185, Tests=1979, 76 wallclock secs ( 0.61 usr  0.27 sys +  6.02 cusr  8.50 csys = 15.40 CPU)
Result: FAIL

Comment 8 Krishna Srinivas 2011-11-15 07:28:21 UTC
(In reply to comment #7)
> Presently after applying Junaid's patch related to bug765540, i.e. resolving
> rename issue for posix-compliance, the result is that link/00.t test is also
> failing now
> 
> Test Summary Report
> -------------------
> /opt/qa/tools/posix-testsuite/tests/chown/00.t   (Wstat: 0 Tests: 171 Failed:
> 3)

This happens on fuse also.

>   Failed tests:  77, 84, 88
> /opt/qa/tools/posix-testsuite/tests/link/00.t    (Wstat: 0 Tests: 82 Failed: 2)
>   Failed tests:  72, 79

Known issue, reason explained in the mail pasted below.

> /opt/qa/tools/posix-testsuite/tests/rmdir/08.t   (Wstat: 0 Tests: 10 Failed: 3)

Fixed by http://review.gluster.com/#change,715 (Shishir's fix)

Closing the bug.

>   Failed tests:  8-10
> Files=185, Tests=1979, 76 wallclock secs ( 0.61 usr  0.27 sys +  6.02 cusr 
> 8.50 csys = 15.40 CPU)
> Result: FAIL

Hi Saurabh,

The link failure happens only when replicate is in the configuration. Replicate maintains data about a transaction in extended attributes of the inode. Any extended attribute update results in a ctime change. In this case, the sequence of tests are as under:

1) Record ctime1
2) Have a test case fail due to EACCESS
3) Record ctime2
4) Check if ctime1 and ctime2 are the same

As per POSIX, ctime1 and ctime2 have to be the same. However, because of 2) above, we end up updating/cleaning up extended attributes stored by replicate. Hence ctime1 would not be equal to ctime2, this results in the test-suite reporting a failure.

This will have to be classified as a known failure with NFS. With FUSE, EACCESS is handled by VFS and hence glusterfs doesn't get to see the link call. This results in no extended attribute updates and hence the tests would pass on a FUSE mount.

Thanks for providing your setup. You can take it back for your testing.

Regards,
Vijay

Comment 9 Anand Avati 2011-11-16 07:08:54 UTC
CHANGE: http://review.gluster.com/716 (A EACCES error also should not trigger a selfheal.) merged in master by Vijay Bellur (vijay)

Comment 10 Gaurav 2012-01-09 12:37:31 UTC
*** Bug 770039 has been marked as a duplicate of this bug. ***


Note You need to log in before you can comment on or make changes to this bug.