Bug 806761

Summary: NFS: unable to delete directories after disk failure
Product: [Community] GlusterFS Reporter: Sachidananda Urs <sac>
Component: distribute Assignee: shishir gowda <sgowda>
Status: CLOSED CURRENTRELEASE QA Contact: Sachidananda Urs <sac>
Severity: high Docs Contact:
Priority: high    
Version: pre-release CC: gluster-bugs, nsathyan, rfortier
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-24 17:29:15 UTC Type: ---
Bug Depends On:    
Bug Blocks: 817967    
Attachments:
Description Flags
Contains fuse log, nfs log, brick log bzipped none

Description Sachidananda Urs 2012-03-26 07:47:23 UTC
Description of problem:

When one of the nodes becomes read-only and then returns to read-write, some of the directories become inaccessible. A `rm -rf' on them neither removes the directory nor shows any error on the mount point. The log shows the following warnings:


[2012-03-25 21:02:54.151641] W [nfs3.c:3524:nfs3svc_rmdir_cbk] 0-nfs: 57a38c4d: /foomati/linux-3.2.11 => -1 (No such file or directory)
[2012-03-25 21:03:48.894359] W [client3_1-fops.c:423:client3_1_stat_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:03:48.896717] W [client3_1-fops.c:423:client3_1_stat_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:03:48.897653] W [client3_1-fops.c:423:client3_1_stat_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:03:52.239541] W [client3_1-fops.c:423:client3_1_stat_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:12:23.150555] W [client3_1-fops.c:423:client3_1_stat_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:12:24.178276] W [client3_1-fops.c:423:client3_1_stat_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:12:24.179653] W [client3_1-fops.c:1097:client3_1_access_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:12:24.180271] W [client3_1-fops.c:879:client3_1_getxattr_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory. Path: (null)
[2012-03-25 21:12:24.180521] W [client3_1-fops.c:2157:client3_1_lookup_cbk] 0-nfs-test-2-client-0: remote operation failed: Invalid argument. Path: /linux-1
[2012-03-25 21:12:24.180701] W [client3_1-fops.c:2157:client3_1_lookup_cbk] 0-nfs-test-2-client-1: remote operation failed: Invalid argument. Path: /linux-1
[2012-03-25 21:12:24.180744] W [client3_1-fops.c:2157:client3_1_lookup_cbk] 0-nfs-test-2-client-3: remote operation failed: Invalid argument. Path: /linux-1
[2012-03-25 21:12:24.180773] W [client3_1-fops.c:2157:client3_1_lookup_cbk] 0-nfs-test-2-client-2: remote operation failed: Invalid argument. Path: /linux-1
[2012-03-25 21:12:24.180787] I [dht-layout.c:600:dht_layout_normalize] 0-nfs-test-2-dht: found anomalies in /linux-1. holes=1 overlaps=0
[2012-03-25 21:12:24.180873] W [nfs3.c:1492:nfs3svc_access_cbk] 0-nfs: 8964974d: /linux-1 => -1 (Structure needs cleaning)
[2012-03-25 21:12:24.180905] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 8964974d, ACCESS: NFS: 10006(Error occurred on the server or IO Error), POSIX: 117(Structure needs cleaning)
[2012-03-25 21:12:24.181426] W [client3_1-fops.c:2081:client3_1_opendir_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory. Path: /linux-1
[2012-03-25 21:12:24.182678] W [nfs3.c:3524:nfs3svc_rmdir_cbk] 0-nfs: 8d64974d: /linux-1 => -1 (No such file or directory)
[2012-03-25 21:12:32.189148] W [client3_1-fops.c:423:client3_1_stat_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:12:32.190011] W [client3_1-fops.c:1097:client3_1_access_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory
[2012-03-25 21:12:32.190687] W [client3_1-fops.c:879:client3_1_getxattr_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory. Path: (null)
[2012-03-25 21:12:32.191005] W [client3_1-fops.c:2157:client3_1_lookup_cbk] 0-nfs-test-2-client-0: remote operation failed: Invalid argument. Path: /linux-1
[2012-03-25 21:12:32.191125] W [client3_1-fops.c:2157:client3_1_lookup_cbk] 0-nfs-test-2-client-1: remote operation failed: Invalid argument. Path: /linux-1
[2012-03-25 21:12:32.191215] W [client3_1-fops.c:2157:client3_1_lookup_cbk] 0-nfs-test-2-client-3: remote operation failed: Invalid argument. Path: /linux-1
[2012-03-25 21:12:32.191243] W [client3_1-fops.c:2157:client3_1_lookup_cbk] 0-nfs-test-2-client-2: remote operation failed: Invalid argument. Path: /linux-1
[2012-03-25 21:12:32.191255] I [dht-layout.c:600:dht_layout_normalize] 0-nfs-test-2-dht: found anomalies in /linux-1. holes=1 overlaps=0
[2012-03-25 21:12:32.191321] W [nfs3.c:1492:nfs3svc_access_cbk] 0-nfs: c88b974d: /linux-1 => -1 (Structure needs cleaning)
[2012-03-25 21:12:32.191354] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: c88b974d, ACCESS: NFS: 10006(Error occurred on the server or IO Error), POSIX: 117(Structure needs cleaning)
[2012-03-25 21:12:32.191863] W [client3_1-fops.c:2081:client3_1_opendir_cbk] 0-nfs-test-2-client-0: remote operation failed: No such file or directory. Path: /linux-1
[2012-03-25 21:12:32.193125] W [nfs3.c:3524:nfs3svc_rmdir_cbk] 0-nfs: cc8b974d: /linux-1 => -1 (No such file or directory)
[2012-03-25 21:13:31.605355] I [dht-layout.c:600:dht_layout_normalize] 0-nfs-test-2-dht: found anomalies in <gfid:e82d0e2c-e8d6-4596-97b7-5d2669a79f2e>. holes=1 overlaps=0
[2012-03-25 21:13:31.605904] E [nfs3-helpers.c:3603:nfs3_fh_resolve_inode_lookup_cbk] 0-nfs-nfsv3: Lookup failed: <gfid:e82d0e2c-e8d6-4596-97b7-5d2669a79f2e>: Invalid argument
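To see which callbacks account for most of the entries in a log like the one above, the lines can be grouped by level and function name. The following is a minimal sketch, assuming every entry follows the layout visible in the excerpt ([timestamp] LEVEL [file:line:function] domain: message). The names LOG_LINE, parse_line and count_by_function are made up for this illustration and are not part of GlusterFS.

```python
import re
from collections import Counter

# Layout assumed from the excerpt above:
# [timestamp] LEVEL [file:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_LINE.match(line.strip())
    return m.groupdict() if m else None


def count_by_function(lines):
    """Count entries per (level, function); lines that do not parse are skipped."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry:
            counts[(entry["level"], entry["func"])] += 1
    return counts


# Three lines copied from the excerpt above.
SAMPLE = [
    "[2012-03-25 21:12:24.180521] W [client3_1-fops.c:2157:client3_1_lookup_cbk] "
    "0-nfs-test-2-client-0: remote operation failed: Invalid argument. Path: /linux-1",
    "[2012-03-25 21:12:24.180701] W [client3_1-fops.c:2157:client3_1_lookup_cbk] "
    "0-nfs-test-2-client-1: remote operation failed: Invalid argument. Path: /linux-1",
    "[2012-03-25 21:12:24.180787] I [dht-layout.c:600:dht_layout_normalize] "
    "0-nfs-test-2-dht: found anomalies in /linux-1. holes=1 overlaps=0",
]

if __name__ == "__main__":
    for (level, func), n in count_by_function(SAMPLE).most_common():
        print(level, func, n)
```

Run against the full nfs log from the attachment, the same counter would show at a glance whether the failures cluster on one client translator (here client-0) or are spread across all of them.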



Steps to Reproduce:
1. Create a volume, make two or more NFS mounts, and keep some I/O running on the mounts. For example, kernel extraction, fsx tests, etc.
2. Remount the backend FS read-only. The mount starts throwing I/O errors (note: not read-only FS errors). Let the extraction continue. Now try to extract into another directory, and just ignore the errors for a while.
3. Remount the backend FS read-write and retry the above operations; they still fail. Then try to rm -rf the directories; the behavior described above is seen.
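
The steps above can be written down as an ordered command plan. This is only a sketch: it prints the commands rather than running them, it is sequential while the report keeps the first extraction running during the read-only remount, and every path and directory name passed in (brick filesystem, NFS mount point, tarball, linux-2) is a hypothetical placeholder. The function name repro_plan is likewise made up for this illustration.

```python
import shlex


def repro_plan(brick_fs, nfs_mount, tarball, dirs=("linux-1", "linux-2")):
    """Return (host, argv) pairs for the three steps; nothing is executed."""
    first, second = (f"{nfs_mount}/{d}" for d in dirs)
    return [
        # Step 1: generate I/O on the NFS mount (kernel tarball extraction).
        ("client", ["mkdir", "-p", first]),
        ("client", ["tar", "-xf", tarball, "-C", first]),
        # Step 2: backend FS goes read-only; extract into another directory.
        ("server", ["mount", "-o", "remount,ro", brick_fs]),
        ("client", ["mkdir", "-p", second]),
        ("client", ["tar", "-xf", tarball, "-C", second]),
        # Step 3: backend FS back to read-write; retry, then try to clean up.
        ("server", ["mount", "-o", "remount,rw", brick_fs]),
        ("client", ["tar", "-xf", tarball, "-C", second]),
        ("client", ["rm", "-rf", first, second]),
    ]


if __name__ == "__main__":
    # All three arguments are placeholders, not values from this report.
    for host, argv in repro_plan("/bricks/brick1", "/mnt/nfs", "/tmp/linux.tar"):
        print(f"{host}: {shlex.join(argv)}")
```

Keeping the host next to each command makes it explicit that the remounts happen on the brick server while the extraction and rm -rf happen on the NFS client.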
  

Additional info:
While the FS is read-only, do a FUSE mount and extract the same tar file: no errors are thrown and the extraction appears to proceed, but no files or directories are created.

================ FUSE BEHAVIOR ======================
[root@gqac009 fuse-0]# ls -l -a foomati
total 156
drwxr-xr-x.  5 root root    129 Mar 25  2012 .
drwxr-xr-x. 21 root root 159744 Mar 25  2012 ..
[root@gqac009 fuse-0]# rm -rf foomati
rm: cannot remove `foomati': Directory not empty
[root@gqac009 fuse-0]#

================ FUSE BEHAVIOR ======================

Re-mounting the NFS/FUSE client does not solve the issue.

Comment 1 Sachidananda Urs 2012-03-26 07:51:20 UTC
Created attachment 572688
Contains fuse log, nfs log, brick log bzipped

Comment 2 Krishna Srinivas 2012-03-28 12:35:52 UTC
[2012-03-25 21:12:32.191255] I [dht-layout.c:600:dht_layout_normalize]
0-nfs-test-2-dht: found anomalies in /linux-1. holes=1 overlaps=0

Looks like distribute is seeing holes.

Comment 3 Krishna Srinivas 2012-03-29 05:44:01 UTC
Looked at the logs: NFS is getting errors from DHT saying that there are holes. Reassigning the bug to DHT.
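
For readers unfamiliar with the "holes" wording: distribute assigns each subvolume a slice of the hash range for a directory, and a hole is a part of that range that no subvolume covers. The following is a simplified model, not the dht_layout_normalize code. It assumes a 32-bit hash space split evenly across four subvolumes (the log shows client-0 to client-3), and the name find_anomalies is made up for this illustration.

```python
HASH_MAX = 0xFFFFFFFF  # assumed 32-bit hash space


def find_anomalies(ranges):
    """Count holes and overlaps in a list of inclusive (start, stop) ranges."""
    holes = overlaps = 0
    expected = 0  # next hash value that should be covered
    for start, stop in sorted(ranges):
        if start > expected:
            holes += 1      # gap between the previous range and this one
        elif start < expected:
            overlaps += 1   # this range begins inside covered territory
        expected = max(expected, stop + 1)
    if expected <= HASH_MAX:
        holes += 1          # nothing covers the tail of the hash space
    return holes, overlaps


# Hypothetical even split across four subvolumes.
FULL = [
    (0x00000000, 0x3FFFFFFF),
    (0x40000000, 0x7FFFFFFF),
    (0x80000000, 0xBFFFFFFF),
    (0xC0000000, 0xFFFFFFFF),
]

if __name__ == "__main__":
    print(find_anomalies(FULL))      # complete layout
    print(find_anomalies(FULL[1:]))  # first subvolume's range missing
```

In this model, losing the range of a single subvolume, for example because the directory cannot be looked up on it, is enough to produce one hole and no overlaps, which is the combination reported in the log.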

Comment 4 Anand Avati 2012-05-15 00:25:29 UTC
CHANGE: http://review.gluster.com/3327 (cluster/dht: Handle ENOENT failure in dht_rmdir_opendir_cbk) merged in master by Anand Avati (avati)
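
The change title points at treating ENOENT from one subvolume as something to handle rather than as a fatal error during rmdir. The sketch below illustrates that general idea only. It is an assumption about the approach, not the merged patch, and rmdir_on_all and its rmdir callback parameter are hypothetical names.

```python
import errno


def rmdir_on_all(subvolumes, path, rmdir):
    """Remove `path` on every subvolume.

    A subvolume on which the directory is already missing (ENOENT) is
    treated as done; any other error is remembered and raised at the end,
    after the remaining subvolumes have been tried.
    """
    first_error = None
    for subvol in subvolumes:
        try:
            rmdir(subvol, path)
        except OSError as exc:
            if exc.errno == errno.ENOENT:
                continue  # nothing to remove here, keep going
            if first_error is None:
                first_error = exc
    if first_error is not None:
        raise first_error


if __name__ == "__main__":
    # In-memory stand-in for three bricks; the first one lost the directory.
    state = {"brick-0": set(), "brick-1": {"/linux-1"}, "brick-2": {"/linux-1"}}

    def fake_rmdir(subvol, path):
        if path not in state[subvol]:
            raise OSError(errno.ENOENT, "No such file or directory")
        state[subvol].remove(path)

    rmdir_on_all(sorted(state), "/linux-1", fake_rmdir)
    print(state)
```

With this handling, a directory that is absent on one brick no longer prevents its removal from the bricks where it still exists.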