Bug 1414608 - Weird directory appears when rmdir is run on a directory under a disk-full condition
Summary: Weird directory appears when rmdir is run on a directory under a disk-full condition
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-19 01:52 UTC by George
Modified: 2019-08-05 11:11 UTC
CC List: 6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-08-05 11:11:11 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description George 2017-01-19 01:52:53 UTC
Description of problem:
A weird directory appears when rmdir is run on a directory under a disk-full condition: the directory cannot be deleted completely, it can still be entered, but any FOP inside it fails with "No such file or directory", as in the output below:
-----------------------------------------------------
#touch xxx
touch: cannot touch ‘xxx’: No such file or directory
-----------------------------------------------------

Version-Release number of selected component (if applicable):


How reproducible:
Easily, on a GlusterFS mount point, say /mnt/gluster/.

Steps to Reproduce:
1. In one terminal: mkdir /mnt/gluster/test
2. Use a shell script to create many small files of roughly 100 bytes each until the disk is full (a sketch follows this list).
3. Open a new terminal and cd into /mnt/gluster/test (this keeps the directory's refcount non-zero during step 4).
4. In the first terminal: rm -rf /mnt/gluster/test
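A minimal sketch of the file-creation loop in step 2 and the second terminal in step 3, assuming the mount point above (file names and sizes are illustrative):
-----------------------------------------------------
# Terminal 1: create ~100-byte files until the volume returns ENOSPC
i=0
while dd if=/dev/zero of=/mnt/gluster/test/small_$i bs=100 count=1 2>/dev/null
do
    i=$((i+1))
done

# Terminal 2: keep the directory open so its refcount stays non-zero
cd /mnt/gluster/test

# Terminal 1: now try to remove the directory
rm -rf /mnt/gluster/test
-----------------------------------------------------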


Actual results:
/mnt/gluster/test is not deleted successfully; the directory can be re-entered, but it behaves weirdly, as in the output below:
-----------------------------------------------------
[/mnt/gluster/test]#touch xxx
touch: cannot touch ‘xxx’: No such file or directory
-----------------------------------------------------

Expected results:
The directory /mnt/gluster/test should be deleted.

Additional info:

Comment 1 Niels de Vos 2017-01-24 12:36:36 UTC
Could you let us know what kind of volume you are using for this? Can it be reproduced only with certain volume types, or even with a volume consisting of a single brick?

Is it possible for you to provide the logs of this problem, and mention the exact time/date (preferably in UTC) when it happened?

Comment 2 George 2017-01-25 02:10:02 UTC
It is an AFR (replicate) volume; in my environment the volume has two bricks (replica 2). With only one brick (or with one brick down), the issue does not occur.

It can easily be reproduced with a two-brick AFR volume and the steps I shared, so you should be able to collect whatever logs you need.
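For reference, a two-brick replica volume of this kind can be created with the gluster CLI; a minimal sketch, assuming hostnames server1/server2 and brick paths /data/brick1 and /data/brick2 (all hypothetical):
-----------------------------------------------------
# Create and start a replica-2 volume, then mount it on a client
gluster volume create testvol replica 2 server1:/data/brick1 server2:/data/brick2
gluster volume start testvol
mount -t glusterfs server1:/testvol /mnt/gluster
-----------------------------------------------------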

From my investigation over the last few days, the root cause seems clear:
1) Under a disk-full condition on the bricks, writing a lot of small files from the client creates zero-byte entries on the bricks while the write FOPs fail. The sets of entries created on the two bricks are not the same: some entries are created on brick A but not on brick B, and others are created on brick B but not on brick A.
2) Because both bricks are full, self-heal cannot heal the entries.
3) So after the file creation finishes, listing (ls) the directory on each brick shows a different number of entries (the sketch after this list shows one way to observe this).
4) When rm is executed on the client mount point, it issues getdents against the first brick (say brick A) and unlinks every file it finds there. This also unlinks the same files on brick B where they exist, but it does not remove files that exist only on brick B.
5) The current GlusterFS implementation returns success for the rmdir if it succeeds on brick A, even though it fails on brick B because some files were never unlinked there.
6) Because another process still has the directory open when the rmdir succeeds, the kernel keeps the inode with S_DEAD set (meaning "removed, but still an open directory").
7) When cd is executed on the client into the "removed" directory, the GlusterFS lookup returns success because the directory was not removed on brick B, and a heal is triggered. Since most files were unlinked in the previous step, the heal syncs the files remaining on brick B back to brick A, so cd succeeds; from the client, you can enter the "removed" directory.
8) But when ls is run in the "removed" directory, it issues a getdents syscall. From the kernel's point of view the directory is removed, so the syscall is terminated and "No such file or directory" is returned to user space; touch and other write FOPs fail the same way.
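A minimal way to observe the per-brick mismatch from step 3, assuming brick backend paths /data/brick1 and /data/brick2 (hypothetical paths):
-----------------------------------------------------
# Run directly against each brick's backend directory, not the client mount.
# Under the disk-full workload the two counts differ, and some file names
# appear on only one brick.
ls -A /data/brick1/test | wc -l     # on the first server
ls -A /data/brick2/test | wc -l     # on the second server
-----------------------------------------------------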

Two possible solutions for this issue:
1) If the FOP is rmdir, return failure if any brick returns failure (currently it is treated as success if one brick succeeds). Does this change make sense? Would it introduce any risks?
2) Reserve a threshold of disk space on the client side, say 100 MB or a percentage such as 1%, and reject new incoming write FOPs once free space drops below the threshold. I have seen a parameter called "cluster.min-free-disk", but it does not seem to help with this issue (see the example after this list). How should that parameter be used, and does it apply here? If not, could it be enhanced to avoid this issue?
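For reference, cluster.min-free-disk is set per volume; a minimal sketch, assuming a volume named testvol (hypothetical name). This option steers DHT file placement away from bricks that are low on space, which is likely why it does not help on a pure replica volume:
-----------------------------------------------------
# Ask DHT to avoid creating new files on bricks with less than 10% free space
gluster volume set testvol cluster.min-free-disk 10%
-----------------------------------------------------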

Your comments are highly welcome :) I will try solution 1 now and will update you with the result.

Thanks a lot!

Comment 3 George 2017-02-07 08:18:43 UTC
I changed afr_rmdir so that it no longer returns success when only one subvolume succeeds: it now returns failure if one subvolume fails with ENOENT, while other errors still follow the old logic. With this change the weird touch issue seems to be gone, but another issue appeared:
the files left in the old weird directory are not synced by self-heal automatically. That is, ls shows no files in the directory, even though the files still exist on brick B; on brick A they do not exist, since they were removed by the earlier rm.

After some investigation, I suppose some files created under the disk-full condition succeed at mknod but fail when the gfid and changelog xattrs are set, and the heal mechanism needs that information to sync automatically. Since the changelog xattrs do not exist, heal cannot work at all.
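For reference, the gfid and AFR changelog xattrs mentioned above can be inspected directly on a brick with getfattr; a minimal sketch, assuming a brick path of /data/brick2 and a file named somefile (both hypothetical):
-----------------------------------------------------
# Dump all trusted.* xattrs of an entry on the brick backend.
# A complete entry carries trusted.gfid; AFR tracks pending heals in
# trusted.afr.<volname>-client-* changelog xattrs. Entries created while
# the disk was full may be missing these, which blocks self-heal.
getfattr -d -m . -e hex /data/brick2/test/somefile
-----------------------------------------------------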

So I suggest that the posix translator reserve the space needed for these heal-related elements in that phase: if all of them can be reserved, create the entry; otherwise do not leave the entry behind at all.

What do you think of this suggestion?

Thanks in advance!

Comment 4 Pranith Kumar K 2017-05-17 05:45:57 UTC
I think Ravi is working on a similar bug. Assigned the bug to him.

Comment 5 Amar Tumballi 2019-05-10 12:37:49 UTC
Is this still an issue?

Comment 6 Ravishankar N 2019-08-05 11:11:11 UTC
Disk-full scenarios can cause problems ranging from ENOENT during creates, to ENOTEMPTY during rmdir, to heals not progressing due to missing gluster xattrs. Recent versions of gluster have a 'storage.reserve' volume option in the posix xlator to reserve space for rebalance, heals, etc. That should mitigate this to some extent, but even that is not entirely race free, as it checks and updates free space only once every 5 seconds. I'm going ahead and closing this bug as CURRENTRELEASE.
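For reference, storage.reserve takes a percentage of brick size and is set per volume; a minimal sketch, assuming a volume named testvol (hypothetical name):
-----------------------------------------------------
# Reserve 2% of each brick for internal use (rebalance, self-heal, etc.);
# once free space on a brick falls below this, client writes get ENOSPC.
gluster volume set testvol storage.reserve 2
-----------------------------------------------------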

George, please feel free to re-open the bug if storage.reserve doesn't solve your use case or if you have other ideas to solve this in a more robust way.

