Bug 806761 - NFS: unable to delete directories after disk failure
Summary: NFS: unable to delete directories after disk failure
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: pre-release
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: shishir gowda
QA Contact: Sachidananda Urs
URL:
Whiteboard:
Depends On:
Blocks: 817967
Reported: 2012-03-26 07:47 UTC by Sachidananda Urs
Modified: 2015-12-01 16:45 UTC
CC List: 3 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:29:15 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Contains fuse log, nfs log, brick log bzipped (8.75 MB, application/x-tar)
2012-03-26 07:51 UTC, Sachidananda Urs

Description Sachidananda Urs 2012-03-26 07:47:23 UTC
Description of problem:

When one of the nodes becomes read-only and then goes back to read-write, some of the directories become inaccessible. Running `rm -rf' on them neither removes the directory nor shows any error on the mount point. The log shows the following warnings:


[2012-03-25 21:02:54.151641] W [nfs3.c:3524:nfs3svc_rmdir_cbk] 0-nfs: 57a38c4d: /foomati/linux-3.2.11 => -1 (No such file or directory)
[2012-03-25 21:03:48.894359] W [client3_1-fops.c:423:client3_1_stat_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:03:48.896717] W [client3_1-fops.c:423:client3_1_stat_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:03:48.897653] W [client3_1-fops.c:423:client3_1_stat_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:03:52.239541] W [client3_1-fops.c:423:client3_1_stat_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:12:23.150555] W [client3_1-fops.c:423:client3_1_stat_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:12:24.178276] W [client3_1-fops.c:423:client3_1_stat_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:12:24.179653] W [client3_1-fops.c:1097:client3_1_access_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:12:24.180271] W [client3_1-fops.c:879:client3_1_getxattr_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory. Path: (null)
[2012-03-25 21:12:24.180521] W [client3_1-fops.c:2157:client3_1_lookup_cbk] 0-nfs-test-2-client-0: remote operation failed: Invalid argument. Path: /linux-1
[2012-03-25 21:12:24.180701] W [client3_1-fops.c:2157:client3_1_lookup_cbk] 0-nfs-test-2-client-1: remote operation failed: Invalid argument. Path: /linux-1
[2012-03-25 21:12:24.180744] W [client3_1-fops.c:2157:client3_1_lookup_cbk] 0-nfs-test-2-client-3: remote operation failed: Invalid argument. Path: /linux-1
[2012-03-25 21:12:24.180773] W [client3_1-fops.c:2157:client3_1_lookup_cbk] 0-nfs-test-2-client-2: remote operation failed: Invalid argument. Path: /linux-1
[2012-03-25 21:12:24.180787] I [dht-layout.c:600:dht_layout_normalize] 0-nfs-test-2-dht: found anomalies in /linux-1. holes=1 overlaps=0
[2012-03-25 21:12:24.180873] W [nfs3.c:1492:nfs3svc_access_cbk] 0-nfs: 8964974d: /linux-1 => -1 (Structure needs cleaning)
[2012-03-25 21:12:24.180905] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 8964974d, ACCESS: NFS: 10006(Error occurred on the server or IO Error), POSIX: 117(Structure needs cleaning)
[2012-03-25 21:12:24.181426] W [client3_1-fops.c:2081:client3_1_opendir_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory. Path: /linux-1
[2012-03-25 21:12:24.182678] W [nfs3.c:3524:nfs3svc_rmdir_cbk] 0-nfs: 8d64974d: /linux-1 => -1 (No such file or directory)
[2012-03-25 21:12:32.189148] W [client3_1-fops.c:423:client3_1_stat_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:12:32.190011] W [client3_1-fops.c:1097:client3_1_access_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:12:32.190687] W [client3_1-fops.c:879:client3_1_getxattr_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory. Path: (null)
[2012-03-25 21:12:32.191005] W [client3_1-fops.c:2157:client3_1_lookup_cbk] 0-nfs-test-2-client-0: remote operation failed: Invalid argument. Path: /linux-1
[2012-03-25 21:12:32.191125] W [client3_1-fops.c:2157:client3_1_lookup_cbk] 0-nfs-test-2-client-1: remote operation failed: Invalid argument. Path: /linux-1
[2012-03-25 21:12:32.191215] W [client3_1-fops.c:2157:client3_1_lookup_cbk] 0-nfs-test-2-client-3: remote operation failed: Invalid argument. Path: /linux-1
[2012-03-25 21:12:32.191243] W [client3_1-fops.c:2157:client3_1_lookup_cbk] 0-nfs-test-2-client-2: remote operation failed: Invalid argument. Path: /linux-1
[2012-03-25 21:12:32.191255] I [dht-layout.c:600:dht_layout_normalize] 0-nfs-test-2-dht: found anomalies in /linux-1. holes=1 overlaps=0
[2012-03-25 21:12:32.191321] W [nfs3.c:1492:nfs3svc_access_cbk] 0-nfs: c88b974d: /linux-1 => -1 (Structure needs cleaning)
[2012-03-25 21:12:32.191354] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: c88b974d, ACCESS: NFS: 10006(Error occurred on the server or IO Error), POSIX: 117(Structure needs cleaning)
[2012-03-25 21:12:32.191863] W [client3_1-fops.c:2081:client3_1_opendir_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory. Path: /linux-1
[2012-03-25 21:12:32.193125] W [nfs3.c:3524:nfs3svc_rmdir_cbk] 0-nfs: cc8b974d: /linux-1 => -1 (No such file or directory)
[2012-03-25 21:13:31.605355] I [dht-layout.c:600:dht_layout_normalize] 0-nfs-test-2-dht: found anomalies in <gfid:e82d0e2c-e8d6-4596-97b7-5d2669a79f2e>. holes=1 overlaps=0
[2012-03-25 21:13:31.605904] E [nfs3-helpers.c:3603:nfs3_fh_resolve_inode_lookup_cbk] 0-nfs-nfsv3: Lookup failed: <gfid:e82d0e2c-e8d6-4596-97b7-5d2669a79f2e>: Invalid argument
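
For triage, a log excerpt like the one above can be tallied by level and reporting function. The following is a minimal sketch that assumes every entry follows the `[timestamp] LEVEL [file:line:function] domain: message` shape seen in these lines; the names `LOG_LINE` and `tally` are made up for this example and are not part of GlusterFS.

```python
import re
from collections import Counter

# Assumed entry shape, taken from the excerpt above:
# [timestamp] LEVEL [file:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)$"
)


def tally(lines):
    """Count log entries by (level, function); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            counts[(m.group("level"), m.group("func"))] += 1
    return counts


# Three entries copied from the excerpt above.
sample = [
    "[2012-03-25 21:12:24.180521] W [client3_1-fops.c:2157:client3_1_lookup_cbk] "
    "0-nfs-test-2-client-0: remote operation failed: Invalid argument. Path: /linux-1",
    "[2012-03-25 21:12:24.180701] W [client3_1-fops.c:2157:client3_1_lookup_cbk] "
    "0-nfs-test-2-client-1: remote operation failed: Invalid argument. Path: /linux-1",
    "[2012-03-25 21:12:24.180787] I [dht-layout.c:600:dht_layout_normalize] "
    "0-nfs-test-2-dht: found anomalies in /linux-1. holes=1 overlaps=0",
]

for (level, func), n in tally(sample).most_common():
    print(level, func, n)
```

Grouping by function makes it easier to see that the failures cluster in the client lookup/stat callbacks and surface in NFS through the DHT layout check.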



Steps to Reproduce:
1. Create a volume, make two or more NFS mounts, and keep some I/O running on the mounts (for example, kernel extraction or fsx tests).
2. Remount the backend FS read-only. The mount starts throwing I/O errors (note: not read-only FS errors). Let the extraction continue, then start an extraction in another directory and ignore the errors for a while.
3. Remount the backend FS read-write and retry the above operations; they still fail. Then try to rm -rf the directories; the behavior described above is seen.
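
The steps above can be written down as a dry-run command list. This is only a sketch: the brick path, mount point, tarball name, and the `repro_commands` helper are hypothetical, and in the real reproduction the extraction in step 1 keeps running while the backend is flipped to read-only, which a sequential list does not capture.

```python
import shlex


def repro_commands(brick_fs, nfs_mount, tarball):
    """Return the commands for the three steps, in order, as argv lists."""
    return [
        # Step 1: generate I/O on the NFS mount (kernel extraction as workload).
        ["tar", "-xf", tarball, "-C", nfs_mount],
        # Step 2: remount the brick's backend filesystem read-only.
        ["mount", "-o", "remount,ro", brick_fs],
        # Step 3: restore read-write, then attempt the delete that fails.
        ["mount", "-o", "remount,rw", brick_fs],
        ["rm", "-rf", nfs_mount + "/linux-3.2.11"],
    ]


# Hypothetical paths; nothing is executed, the commands are only printed.
for cmd in repro_commands("/export/brick1", "/mnt/nfs", "linux-3.2.11.tar"):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```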
  

Additional info:
While the FS is read-only, do a FUSE mount and extract the same tar file: no errors are thrown and the extraction seems to proceed, but no files or directories are created.

================ FUSE BEHAVIOR ======================
[root@gqac009 fuse-0]# ls -l -a foomati
total 156
drwxr-xr-x.  5 root root    129 Mar 25  2012 .
drwxr-xr-x. 21 root root 159744 Mar 25  2012 ..
[root@gqac009 fuse-0]# rm -rf foomati
rm: cannot remove `foomati': Directory not empty
[root@gqac009 fuse-0]#

================ FUSE BEHAVIOR ======================

Re-mounting the NFS/FUSE client does not solve the issue.

Comment 1 Sachidananda Urs 2012-03-26 07:51:20 UTC
Created attachment 572688 [details]
Contains fuse log, nfs log, brick log bzipped

Comment 2 Krishna Srinivas 2012-03-28 12:35:52 UTC
[2012-03-25 21:12:32.191255] I [dht-layout.c:600:dht_layout_normalize]
0-nfs-test-2-dht: found anomalies in /linux-1. holes=1 overlaps=0

Looks like distribute is seeing holes.
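
To illustrate what "holes=1 overlaps=0" means, here is a simplified model of a layout check. It assumes each subvolume owns one inclusive range of a 32-bit hash space and that a subvolume which fails lookup contributes no range; this is not the actual dht_layout_normalize code, and `find_anomalies` is a name made up for the sketch.

```python
HASH_MAX = 0xFFFFFFFF  # 32-bit hash space, assumed for this sketch


def find_anomalies(ranges):
    """Count holes and overlaps in a list of inclusive (start, stop) ranges.

    A hole is a gap in the coverage of [0, HASH_MAX]; an overlap is a range
    that starts before the coverage so far has ended.
    """
    holes = overlaps = 0
    next_expected = 0  # first hash value not yet covered
    for start, stop in sorted(ranges):
        if start > next_expected:
            holes += 1
        elif start < next_expected:
            overlaps += 1
        next_expected = max(next_expected, stop + 1)
    if next_expected <= HASH_MAX:
        holes += 1  # the tail of the hash space is uncovered
    return holes, overlaps


# Four subvolumes splitting the hash space evenly.
full = [
    (0x00000000, 0x3FFFFFFF),
    (0x40000000, 0x7FFFFFFF),
    (0x80000000, 0xBFFFFFFF),
    (0xC0000000, 0xFFFFFFFF),
]
print(find_anomalies(full))      # complete layout
print(find_anomalies(full[1:]))  # first subvolume's range missing
```

In this model, dropping one subvolume's range leaves exactly one hole and no overlaps, which is the same shape of anomaly the log reports for /linux-1.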

Comment 3 Krishna Srinivas 2012-03-29 05:44:01 UTC
Looked at the logs: NFS is getting errors from DHT reporting holes. Reassigning the bug to DHT.

Comment 4 Anand Avati 2012-05-15 00:25:29 UTC
CHANGE: http://review.gluster.com/3327 (cluster/dht: Handle ENOENT failure in dht_rmdir_opendir_cbk) merged in master by Anand Avati (avati)
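
The change title says only that ENOENT is now handled in dht_rmdir_opendir_cbk; the patch itself is not quoted in this report. One plausible reading, stated here as an assumption, is that a directory already missing on one subvolume should not abort its removal on the others. The sketch below models that idea with hypothetical names (`FakeSubvol`, `rmdir_everywhere`) and is not the GlusterFS implementation.

```python
import errno


class FakeSubvol:
    """In-memory stand-in for a brick; holds a set of directory paths."""

    def __init__(self, dirs):
        self.dirs = set(dirs)

    def rmdir(self, path):
        if path not in self.dirs:
            raise OSError(errno.ENOENT, "No such file or directory")
        self.dirs.remove(path)


def rmdir_everywhere(subvols, path):
    """Remove `path` from every subvolume; return 0 or the first real errno."""
    first_error = 0
    for subvol in subvols:
        try:
            subvol.rmdir(path)
        except OSError as exc:
            if exc.errno == errno.ENOENT:
                continue  # assumption: already absent here, nothing to remove
            if first_error == 0:
                first_error = exc.errno
    return first_error


# The directory is missing on the first subvolume, present on the second.
subvols = [FakeSubvol([]), FakeSubvol(["/linux-1"])]
print(rmdir_everywhere(subvols, "/linux-1"))
```

Under this assumption the removal succeeds overall and the directory is gone from the subvolume that still had it, instead of the whole rmdir failing with "No such file or directory".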

