Bug 1066751
Summary: | tmpfs: creates files with inode number 0, rendering parent directory unremovable | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Roni Gordon / Casale Media <roni.gordon> | ||||
Component: | kernel | Assignee: | Carlos Maiolino <cmaiolin> | ||||
kernel sub component: | Other | QA Contact: | Murphy Zhou <xzhou> | ||||
Status: | CLOSED ERRATA | Docs Contact: | |||||
Severity: | medium | ||||||
Priority: | medium | CC: | ajb, cmaiolin, eguan, esandeen, kdudka, kfujii, manuel.wolfshant, myamazak, pasteur, riel, roni.gordon, rwheeler, salmy, tejaswinipoluri3, toracat, trajaraman, yanwang, yohmura | ||||
Version: | 6.4 | ||||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | kernel-2.6.32-582.el6 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 1241665 (view as bug list) | Environment: | |||||
Last Closed: | 2016-05-10 21:52:20 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1075802, 1159933, 1172231, 1241665, 1268411 | ||||||
Attachments: |
Description
Roni Gordon / Casale Media
2014-02-19 03:22:31 UTC
We modified https://raw.github.com/aidenbell/getdents/master/src/getdents.c, which was originally designed as a faster alternative to ls, so that it would list only files with inode number 0:

-        if( d->d_ino != 0 && d_type == DT_REG ) {
-            printf("%s\n", (char *)d->d_name );
+        if( d->d_ino == 0 && d_type == DT_REG ) {
+            printf("Inode number %ld: %s\n", d->d_ino, (char *)d->d_name );

And much to our horror/delight, the mystery filename that neither ls nor rm could locate appeared out of thin air:

# gcc getdents.c -o getdents
# getdents 1381276560
Inode number 0: 71A800181400

This file was completely intact (i.e. contained the correct contents and typical file size for a file in this directory), and could be trivially deleted by name:

# cat 71A800181400 | wc -c
776
# rm 71A800181400
rm: remove regular file `71A800181400'? y

At which point removing its parent directory was no longer an issue (directory block size was restored, etc.), and our problem went away.

It's possible that it has remained unknown because all of the following need to occur for this unlikely situation to recur:

1) have a server with sufficient uptime to generate ~4.3G files on a device without a reboot; and
2) have the file that would be allocated inode 0 for that device created on the TMPFS partition; and
3) trigger a process which deletes these TMPFS files without knowledge of their name; and finally
4) try to delete the parent directory

Nonetheless, we consider this a bug in TMPFS -- there's no reason to hand out a reserved inode number when starting again at 1 would be just fine and would never encounter this issue.

Further details (strace, etc.) are all cross-posted in the original CentOS bug (see link in external tracker section).

The filesystem component has nothing to do with the file system implementation in kernel. I am switching the component...

Sounds like the VFS should not be generating inode number 0.

Rik, I agree.
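For anyone who wants to check a directory for the same condition, here is a minimal sketch of the inode-0 scan described in the original report. It is not the reporter's modified getdents.c: it assumes Linux, calls the getdents64 system call directly so that glibc's readdir() is not involved, and the names dirent64_rec and list_inode_zero are made up for this illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>       /* DT_REG */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by the getdents64 system call.
 * The struct name is hypothetical; only the layout matters. */
struct dirent64_rec {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Print every regular file in `path` whose inode number is 0.
 * Returns the number of such entries, or -1 on error. */
int list_inode_zero(const char *path)
{
    _Alignas(struct dirent64_rec) char buf[32768];
    int found = 0;

    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)             /* end of directory */
            break;

        for (long pos = 0; pos < n; ) {
            struct dirent64_rec *d = (struct dirent64_rec *)(buf + pos);

            assert(d->d_reclen > 0);   /* guard against looping forever */
            if (d->d_ino == 0 && d->d_type == DT_REG) {
                printf("Inode number %llu: %s\n",
                       (unsigned long long)d->d_ino, d->d_name);
                found++;
            }
            pos += d->d_reclen;
        }
    }

    close(fd);
    return found;
}
```

Calling list_inode_zero() on the affected directory from a small main() should print the hidden name, which can then be removed by name with rm as shown in the report. On a healthy directory the function is expected to return 0.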
Roni, have you contacted your RHEL support team for this bug?

@Eric: this bug was logged here at the request of a CentOS developer (http://bugs.centos.org/view.php?id=6992#c19301) -- the kernel bug was discovered on CentOS, which is running the same kernels as the RHEL releases mentioned above.

Ok, but RHEL staff doesn't support CentOS, so this won't have a high priority compared to other customer bugs. If you can reproduce it on an upstream kernel, sending the problem to LKML or the fs-devel mailing list may get some traction.

@Eric, I would rather see this as 'fixing the EL kernel' than 'supporting CentOS'. From that point of view, as you suggested, if this can be reproduced in the mainline kernel, reporting the problem to LKML would be the right thing to do. I suppose RHEL kernel would not be fixed anyway unless the patch is in the upstream kernel. But once this issue is brought to LKML, aren't you the one who would be in charge? :)

Not necessarily me, no ;)

I do appreciate bug reports from RHEL clones; it's free testing and all that, and sometimes exposes serious issues that we need to get right on top of. But like just about everyone, there's more work to do than there are hours in the day. If I have customers waiting on me, they have to come first. And I always have customers waiting on me. ;)

So I'm asking you to do a little legwork to help me out; if you can still duplicate the behavior upstream, that's a great datapoint. If you *can't* then knowing which kernel release fixed it would be very valuable as well.

If you wanted to go so far as to take the issue upstream, if it persists, that'd be super helpful too. Maybe someone who is intimately familiar w/ the generic inode counters will know what the obvious one-liner fix is.

IOW: Backporting an upstream patch is a lot less work than testing, triaging, reading code, patching, testing again, etc. You can help me get there.
;) -Eric

(In reply to Eric Sandeen from comment #10)
> So I'm asking you to do a little legwork to help me out; if you can still
> duplicate the behavior upstream, that's a great datapoint. If you *can't*
> then knowing which kernel release fixed it would be very valuable as well.

Do you have any preference as to which upstream kernel is tested? I'll see if I can do some of that "legwork" on my end. thanks.

Usually, if I want to know if something is fixed upstream, I test the latest kernel available in git. If that's out of scope for you, see if your distro has something prebuilt which is almost that new. -Eric

Yes, indeed there is such a thing. The latest mainline kernel is available for RHEL from The ELRepo Project [1] and is named kernel-ml [2].

[1] http://elrepo.org
[2] http://elrepo.org/tiki/kernel-ml

Recreated on kernel 3.13.5-1.el6.elrepo.x86_64 (latest stable kernel):

[root@centos6-5 inode0Test]# rm -rf test
rm: cannot remove `test': Device or resource busy
[root@centos6-5 inode0Test]# cd test
[root@centos6-5 test]# ls -i 3946307038
? 3946307038
[root@centos6-5 test]# stat 3946307038
  File: `3946307038'
  Size: 10   Blocks: 8   IO Block: 4096   regular file
Device: 14h/20d   Inode: 0   Links: 1
Access: (0644/-rw-r--r--)   Uid: ( 0/ root)   Gid: ( 0/ root)
Access: 2014-03-01 11:00:11.561272342 -0500
Modify: 2014-03-01 11:00:11.561272342 -0500
Change: 2014-03-01 11:00:11.561272342 -0500

Ok, I'll dig into this one - but while an i_ino of "0" is a problem, I agree with Rik that the potential for duplicate inodes is another serious problem. Perhaps the use case is such that no file is long-lived enough for the counter to wrap and obtain a duplicate inode number, but otherwise I think it'd be a concern - and one that is unlikely to be addressed in a simple filesystem like tmpfs. -Eric

Looks like there is no problem with the kernel at all; actually, there is no problem in having a file with inode 0.
The problem removing files with inode zero appears to be caused by userspace tools (coreutils executables and/or glibc). I'm still investigating in more depth where exactly the problem is, but, afaik, the readdir() function provided by glibc used to ignore files with inode 0, due to prehistoric problems. AFAICT, this readdir() behavior was fixed a while ago, but I'm still investigating the readdir() behavior to get more information about it. -Carlos

(In reply to Carlos Maiolino from comment #22)
> I'm still investigating more in depth where the problem exactly is, but,
> afaik, the readdir() function provided by glibc used to ignore files with
> the inode 0, due pre-historic problems.

This link (http://stackoverflow.com/questions/2099121/why-do-inode-numbers-start-from-1-and-not-0) seems to suggest some "historic" issues with EXT2, and possibly MacOS, related to deleted-but-not-removed files.

Hi, I confirmed that this is not a kernel bug; the problem described here is caused by the behavior of the glibc library. I'm now looking into how we should proceed with this bug. -Carlos

Hi, just an update about the bug. Although the problem with removing the files isn't a problem with the kernel, but with the way glibc treats files with inode 0, the kernel development team decided that a better solution here is to prevent the VFS from allocating inode 0 to a file (when the inode number is generated by the VFS, as in the tmpfs case). The problem, as already known, is still visible in upstream kernels, and a discussion to fix it is already happening, so we should have a resolution soon. -Carlos

Hi, We are facing the same problem in the 3.13 kernel as well. It would be great if you could share the upstream kernel discussion on this, and any details of the fix. - Tejaswini

For 2.6.32 kernel, we have tried the following fix and it worked. It would be great if you can review the same and confirm it.
@@ -682,6 +682,8 @@ struct inode *new_inode(struct super_block *sb)
 	if (inode) {
 		spin_lock(&inode_lock);
 		__inode_add_to_lists(sb, NULL, inode);
+		if (unlikely(!(last_ino + 1)))
+			last_ino = 0;
 		inode->i_ino = ++last_ino;
 		inode->i_state = 0;
 		spin_unlock(&inode_lock);

- Tejaswini

For 2.6.32 kernel, we have tried the following fix in fs/inode.c and it worked. It would be great if you can review the same and confirm it.

@@ -682,6 +682,8 @@ struct inode *new_inode(struct super_block *sb)
 	if (inode) {
 		spin_lock(&inode_lock);
 		__inode_add_to_lists(sb, NULL, inode);
+		if (unlikely(!(last_ino + 1)))
+			last_ino = 0;
 		inode->i_ino = ++last_ino;
 		inode->i_state = 0;
 		spin_unlock(&inode_lock);

- Tejaswini

Hello Tejaswini, A similar solution was proposed upstream previously, but it was not accepted at first glance. I'm talking to other developers to see what the best solution would be. I'm looking for the discussion thread; as soon as I find it I'll let you know.

Thanks Carlos. Looking forward to the updates. Also, there are repercussions of wrapping around, like the possibility of two files having the same inode number, right? Did you find any fix for that?

Hi, The following are test logs observed on Ubuntu 14.04. Since the issue exists in the latest kernel and the "test" directory cannot be removed, I have done this test on Ubuntu 14.04.

Test:1
======
root@cavium-desktop:~/temp# ls
test test-logs
root@cavium-desktop:~/temp# rm -rf test
rm: cannot remove ‘test’: Directory not empty
root@cavium-desktop:~/temp#
root@cavium-desktop:~/temp# ls -la test
total 0
d--x--x--x 2 root root 60 Jun 24 15:00 .
dr--r--r-- 3 root root 80 Jun 24 15:28 ..
root@cavium-desktop:~/temp#
root@cavium-desktop:~/temp# stat test
  File: ‘test’
  Size: 60   Blocks: 0   IO Block: 4096   directory   --------> [size 60 ]
Device: 1bh/27d   Inode: 149834664   Links: 2
Access: (0111/d--x--x--x)   Uid: ( 0/ root)   Gid: ( 0/ root)
Access: 2015-06-24 15:00:29.658154496 +0530
Modify: 2015-06-24 15:00:28.434154518 +0530
Change: 2015-06-24 15:00:28.434154518 +0530
 Birth: -
root@cavium-desktop:~/temp#
root@cavium-desktop:~/temp#

Test:2
======
The directory "test1" created in the same tmpfs mount path

root@cavium-desktop:~/temp# mkdir test1
root@cavium-desktop:~/temp# touch test1/test-file
root@cavium-desktop:~/temp#
root@cavium-desktop:~/temp# stat test1
  File: ‘test1’
  Size: 60   Blocks: 0   IO Block: 4096   directory
Device: 1bh/27d   Inode: 1024301708   Links: 2
Access: (0755/drwxr-xr-x)   Uid: ( 0/ root)   Gid: ( 0/ root)
Access: 2015-06-24 15:25:50.902127297 +0530
Modify: 2015-06-24 15:26:01.166127114 +0530
Change: 2015-06-24 15:26:01.166127114 +0530
 Birth: -
root@cavium-desktop:~/temp#
root@cavium-desktop:~/temp# ls -la test1/
total 0
drwxr-xr-x 2 root root 60 Jun 24 15:26 .
dr--r--r-- 4 root root 80 Jun 24 15:25 ..
-rw-r--r-- 1 root root 0 Jun 24 15:26 test-file
root@cavium-desktop:~/temp#
root@cavium-desktop:~/temp# rm -rf test1/*
root@cavium-desktop:~/temp#
root@cavium-desktop:~/temp# stat test1
  File: ‘test1’
  Size: 40   Blocks: 0   IO Block: 4096   directory   --------> [ size: 40 ]
Device: 1bh/27d   Inode: 1024301708   Links: 2
Access: (0755/drwxr-xr-x)   Uid: ( 0/ root)   Gid: ( 0/ root)
Access: 2015-06-24 15:26:16.902126832 +0530
Modify: 2015-06-24 15:26:27.382126645 +0530
Change: 2015-06-24 15:26:27.382126645 +0530
 Birth: -
root@cavium-desktop:~/temp#
root@cavium-desktop:~/temp#

As per my observation from the Test:1 and Test:2 logs, if the directory is empty, the i_node_size value is "40", and if the directory is not empty, the i_node_size value is "60" or greater. Although all the visible files inside the "test" directory could be removed, the size was not updated to "40" during the deletion and remained "60", because one file has inode "0". No files are listed inside the directory, yet the i_node_size value is "60"; I suspect this is the cause of the removal failure (tmpfs).

Would force-updating the i_node_size value to "40" when the directory is empty make the removal possible?

Please correct me if my observation is wrong, and share your comments.

Thanks, Thiruvadi rajaraman

Hello Thiruvadi, The only reason I can give for the bigger size you are seeing in a supposedly empty directory is that it is not empty at all, which is the reason for this BZ. Files with inode 0 are not listed by the userspace tools, so, although you are not seeing the files, they are inside the directory, and this is why you could not delete the directory. Whether the directory and the files inside it can be removed has nothing to do with the size of the directory, but with the inodes inside it.
Any file with inode 0 will not be processed by userspace tools based on glibc, which ignores files with this inode number.

Hi Carlos, We are trying to work on another workaround for inode zero. As glibc does not recognize inode 0, I thought the kernel could check whether inode 0 is present while doing rmdir, and delete it if present. The following are the code changes. It would be great if you could review them:

static int shmem_rmdir(struct inode *dir, struct dentry *dentry)
{
	if (!simple_empty(dentry)) {
		/*Check if it has a zero inode*/
		struct dentry *child;
		int is_inode_zero = 0;
		list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child)
			if (child->d_inode->i_ino == 0) {
				printk("XXX %s:inode is zero \n",__func__);
				iput(child->d_inode);
				is_inode_zero = 1;
			}
		if(!is_inode_zero)
			return -ENOTEMPTY;
	}
	drop_nlink(dentry->d_inode);
	drop_nlink(dir);
	return shmem_unlink(dir, dentry);
}

Hi tejaswini. I don't believe this is the correct approach, because you are limiting the handling of inode 0 to the rmdir call and to tmpfs only, but as you said, this is a workaround and might work for you.

FYI, I sent a patch last week to the linux-fsdevel list, with changes I discussed with some other filesystem developers.

http://marc.info/?l=linux-fsdevel&m=143526593507774&w=2

The main idea is to prevent the VFS from creating an inode with number 0 when using get_next_ino(), so, with this approach, no filesystem relying on the VFS for inode number allocation will be able to create a file with inode 0.

I'm waiting for comments on the patch I sent, or for it to be picked up by the vfs maintainer; let's see what happens. If the patch is accepted, it should hit upstream code soon. Cheers

Hi Carlos, Yeah. I took that approach on the assumption that creating inode 0 shouldn't be a problem in itself. And the workaround was done for tmpfs alone because all the mature filesystems like ext4 have their own mechanism for allocating inode numbers.
Regarding the patch, I am just wondering if it shouldn't be while(res) instead of while(!res). Correct me if I am wrong. I might be missing something. -Tejaswini

I am sorry about the previous comment on the patch. I was a little confused. What is the reason for moving from unlikely() to a do-while loop?

Hi tejaswini, it's more for readability.

Patch(es) available on kernel-2.6.32-582.el6

TEST PASS

Reproduced with inodeOverflow.pl[1] on -573.el6 kernel.

...
Tue Apr 12 01:09:23 2016 Current inode number:4294000000
File '/tmpfs//test/4233754690' created with inode number 0
Finished
# uname -r
2.6.32-573.el6.x86_64
# stat /tmpfs//test/4233754690
  File: `/tmpfs//test/4233754690'
  Size: 10   Blocks: 8   IO Block: 4096   regular file
Device: 15h/21d   Inode: 0   Links: 1
Access: (0644/-rw-r--r--)   Uid: ( 0/ root)   Gid: ( 0/ root)
Access: 2016-04-12 01:11:08.573329775 +0800
Modify: 2016-04-12 01:11:08.573329775 +0800
Change: 2016-04-12 01:11:08.573329775 +0800

Verified on -639.el6 kernel. After running multiple instances for days:

...
Thu Apr 14 14:26:38 2016 Current inode number:4291000000
Thu Apr 14 14:27:10 2016 Current inode number:4292000000
Thu Apr 14 14:28:13 2016 Current inode number:4294000000
Thu Apr 14 14:29:47 2016 Current inode number:2000000 # no overflow
Thu Apr 14 14:32:27 2016 Current inode number:7000000
...
# ps -ef | grep erl
root 16171 16083 99 Apr12 pts/1 2-23:34:40 perl inodeOverflow.pl /tmpfs/
root 24087 24063 99 Apr13 pts/0 1-16:02:17 perl inodeOverflow.pl zxm/
root 24126 24103 99 Apr13 pts/3 1-15:57:49 perl inodeOverflow.pl zxm1
root 31685 31642 0 10:21 pts/2 00:00:00 grep erl
# uname -r
2.6.32-639.el6.x86_64

[1] https://bugzilla.redhat.com/attachment.cgi?bugid=1241665&action=enter
[2] set testcoverage to - due to long long test time.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-0855.html
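As a footnote to the while(res) versus while(!res) question raised in the thread, here is a small userspace sketch of a 32-bit counter that never hands out 0. It is only an illustration of the idea discussed above, not the patch that was posted or merged: the real get_next_ino() lives in the kernel, and the names ino_counter and next_ino_skip_zero are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a VFS-style inode number counter.
 * This is plain userspace C, not kernel code. */
struct ino_counter {
    uint32_t last;    /* last inode number handed out */
};

/* Return the next inode number, skipping 0.
 *
 * The loop condition is (!res): keep going only while the candidate
 * is 0. That happens at most once per 2^32 allocations, right after
 * the unsigned counter wraps from UINT32_MAX. With while (res) the
 * loop would instead spin until it wrapped around to 0, which is the
 * opposite of what is wanted. */
uint32_t next_ino_skip_zero(struct ino_counter *c)
{
    uint32_t res;

    assert(c != NULL);
    do {
        res = ++c->last;   /* unsigned arithmetic wraps to 0 */
    } while (!res);

    return res;
}
```

Starting the counter at UINT32_MAX - 1 and calling the function three times yields UINT32_MAX, then 1, then 2, so 0 is never returned. As noted earlier in the thread, this does nothing about duplicate inode numbers after a wrap; it only avoids the one value that glibc-based tools skip.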