Bug 834543 - DHT: Directory removal failed
Status: CLOSED NOTABUG
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: shishir gowda
Reported: 2012-06-22 06:43 EDT by Sachidananda Urs
Modified: 2013-12-08 20:32 EST

Doc Type: Bug Fix
Last Closed: 2012-07-11 01:00:47 EDT
Type: Bug


Attachments:
SOS report (54.14 KB, application/x-xz), 2012-06-22 06:47 EDT, Sachidananda Urs
Description Sachidananda Urs 2012-06-22 06:43:55 EDT
Description of problem:

When a huge number of directories is removed with rm -rf, the command errors out with ENOENT.

Brick log snippet:

===============
[2012-06-22 09:34:12.588948] I [server3_1-fops.c:907:server_setxattr_cbk] 0-scalability-1-server: 47933778: SETXATTR (null) (--) ==> trusted.glusterfs.dht (No such file or directory)
[2012-06-22 09:42:23.265813] E [posix.c:223:posix_stat] 0-scalability-1-posix: lstat on /home/scalability-2/dir-2/.glusterfs/91/0d/910d3e55-6d23-472f-b95e-bcfc82d8b73d failed: No such file or directory
[2012-06-22 09:42:23.265848] I [server3_1-fops.c:1707:server_stat_cbk] 0-scalability-1-server: 48331226: STAT <gfid:910d3e55-6d23-472f-b95e-bcfc82d8b73d> (910d3e55-6d23-472f-b95e-bcfc82d8b73d) ==> -1 (No such file or directory)
[2012-06-22 09:42:25.814130] E [posix.c:223:posix_stat] 0-scalability-1-posix: lstat on /home/scalability-2/dir-2/.glusterfs/10/2e/102e40ff-4498-44ae-a5cb-8003bff283c8 failed: No such file or directory
[2012-06-22 09:42:25.814173] I [server3_1-fops.c:1707:server_stat_cbk] 0-scalability-1-server: 48334848: STAT <gfid:102e40ff-4498-44ae-a5cb-8003bff283c8> (102e40ff-4498-44ae-a5cb-8003bff283c8) ==> -1 (No such file or directory)
[2012-06-22 09:42:26.098540] E [posix.c:223:posix_stat] 0-scalability-1-posix: lstat on /home/scalability-2/dir-2/.glusterfs/f9/50/f9507285-9225-4170-a507-4eac03b5963c failed: No such file or directory
[2012-06-22 09:42:26.098567] I [server3_1-fops.c:1707:server_stat_cbk] 0-scalability-1-server: 48335279: STAT <gfid:f9507285-9225-4170-a507-4eac03b5963c> (f9507285-9225-4170-a507-4eac03b5963c) ==> -1 (No such file or directory)
[2012-06-22 09:42:26.648949] E [posix.c:223:posix_stat] 0-scalability-1-posix: lstat on /home/scalability-2/dir-2/.glusterfs/f0/a3/f0a3e720-44ba-469a-a6d3-f2994699d3bf failed: No such file or directory
[2012-06-22 09:42:26.648990] I [server3_1-fops.c:1707:server_stat_cbk] 0-scalability-1-server: 48335954: STAT <gfid:ede2772b-ba8c-4a46-90f9-d8265bee6851>/fileop_L1_83/fileop_L1_83_L2_46/fileop_dir_83_46_49 (f0a3e720-44ba-469a-a6d3-f2994699d3bf) ==> -1 (No such file or directory)
[2012-06-22 09:42:30.665662] E [posix.c:223:posix_stat] 0-scalability-1-posix: lstat on /home/scalability-2/dir-2/.glusterfs/5c/09/5c09d329-244d-4030-aa4f-02386b16963d failed: No such file or directory
[2012-06-22 09:42:30.665703] I [server3_1-fops.c:1707:server_stat_cbk] 0-scalability-1-server: 48340941: STAT <gfid:5c09d329-244d-4030-aa4f-02386b16963d> (5c09d329-244d-4030-aa4f-02386b16963d) ==> -1 (No such file or directory)
===========================

Client logs:
==============================
[2012-06-22 10:00:33.046009] E [nfs3-helpers.c:3603:nfs3_fh_resolve_inode_lookup_cbk] 0-nfs-nfsv3: Lookup failed: <gfid:5d693efa-d7c3-4328-8d18-c0bd77b5401b>: Invalid argument
[2012-06-22 10:00:33.046041] E [nfs3.c:1513:nfs3_access_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.16.157.39:901) scalability-1 : 5d693efa-d7c3-4328-8d18-c0bd77b5401b
[2012-06-22 10:00:33.046052] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 7c44ea5f, ACCESS: NFS: 22(Invalid argument for operation), POSIX: 14(Bad address)
[2012-06-22 10:00:33.505755] W [client3_1-fops.c:474:client3_1_stat_cbk] 0-scalability-1-client-1: remote operation failed: No such file or directory
[2012-06-22 10:00:35.913556] W [client3_1-fops.c:474:client3_1_stat_cbk] 0-scalability-1-client-2: remote operation failed: No such file or directory
[2012-06-22 10:00:43.086167] W [client3_1-fops.c:474:client3_1_stat_cbk] 0-scalability-1-client-0: remote operation failed: No such file or directory
[2012-06-22 10:00:43.086200] W [client3_1-fops.c:474:client3_1_stat_cbk] 0-scalability-1-client-2: remote operation failed: No such file or directory
=================================

Steps to Reproduce:
1. Run fileop so that it creates a million directories.
Command line:
# fileop -s 50K -b -w -d `pwd` -t -f 100

2. Give it a day or more, so that it creates a huge number of directories.
3. Run rm -rf on the directories.

Attached sosreport from the server.
Comment 1 Sachidananda Urs 2012-06-22 06:47:36 EDT
Created attachment 593689 [details]
SOS report
Comment 2 Sachidananda Urs 2012-06-22 07:05:38 EDT
There was only one client doing the rm, so it is not possible that some other process deleted the entries.
Comment 3 shishir gowda 2012-06-25 01:11:05 EDT
From the logs it looks like DHT detects holes in the layout and sends setxattr (layout) calls. But by then the rmdir has already succeeded, and hence the setxattr fails with ENOENT.
fileop also does its own clean-ups (rm), and in addition a manual rm -rf was triggered.

I suspect the scenario is:
1. readdir returns entries.
2. The fileop clean-up and the manual rm -rf are both in progress.
3. One of them sends a lookup while the other removes the non-hashed directory; this is when a hole in the layout is detected.
4. A heal (setxattr of layouts) is triggered, which fails because the rmdir has already succeeded by then.

If rm -rf fails, a new rm -rf on the mount should clean up successfully.
Please try to reproduce the bug.
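The race in steps 1-4 above can be modelled outside GlusterFS with a small sketch (a hypothetical illustration, not GlusterFS code): two concurrent removers walk the same directory tree, the loser of each race gets ENOENT, and as with the failed layout heal, that error is harmless as long as the caller tolerates it. This is consistent with the observation that a second rm -rf cleans up successfully.

```python
import errno
import os
import tempfile
import threading

def remove_tree(root, errors):
    """Remove everything under root, bottom-up.

    ENOENT is tolerated: with a concurrent remover, an entry can vanish
    between readdir and the unlink/rmdir, much like the rmdir racing the
    layout heal in the scenario above.
    """
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.isdir(path):
                    os.rmdir(path)
                else:
                    os.unlink(path)
            except OSError as err:
                if err.errno != errno.ENOENT:
                    errors.append(err)  # anything but ENOENT is a real failure

# Build a small tree: base/d0..d49, each holding one file.
base = tempfile.mkdtemp()
for i in range(50):
    d = os.path.join(base, "d%d" % i)
    os.mkdir(d)
    with open(os.path.join(d, "f"), "w") as fh:
        fh.write("x")

# Two removers race over the same tree, like fileop's clean-up
# running concurrently with the manual rm -rf.
errors = []
workers = [threading.Thread(target=remove_tree, args=(base, errors))
           for _ in range(2)]
for t in workers:
    t.start()
for t in workers:
    t.join()

print(errors)            # expected: [] (the ENOENT races were swallowed)
print(os.listdir(base))  # expected: [] (the tree is fully cleaned up)
```

Conversely, a remover that treats ENOENT as fatal (as the layout heal effectively does) would report spurious failures here even though the end state of the tree is exactly what was asked for.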
Comment 4 shishir gowda 2012-07-10 23:55:41 EDT
Can you please try to reproduce the bug with the latest git repo?
Comment 5 Sachidananda Urs 2012-07-11 01:00:47 EDT
Unable to reproduce the issue. Will re-open (with an sosreport) if I hit it again.
