Bug 533569

Summary: nfs4: two directories may have identical st_dev and st_ino
Product: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: high
Reporter: Jim Meyering <meyering>
Assignee: Jeff Layton <jlayton>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: dougsland, eblake, gansalmon, itamar, jlayton, kdudka, kernel-maint, meyering, nfs-maint, rwheeler, steved, tjn
Doc Type: Bug Fix
Last Closed: 2012-10-18 15:29:02 UTC

Description Jim Meyering 2009-11-07 11:52:48 UTC
Description of problem: near a mount point directory, two distinct directories may report the same device and inode numbers.  This is a serious problem because many tools take matching numbers on a directory and one of its ancestors to mean a hard-linked directory cycle, which usually indicates file system corruption.
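
For readers unfamiliar with the check, here is a minimal sketch of how a traversal tool might flag such a "cycle". It is a simplified illustration, not the actual code used by du or fts; the function name find_cycle_dirs and its lstat parameter are hypothetical and exist only so the check can be exercised without a real NFS mount.

```python
import os


def find_cycle_dirs(root, lstat=os.lstat):
    """Walk `root` without following symlinks and return every directory
    whose (st_dev, st_ino) pair equals that of one of its own ancestors.

    `lstat` is injectable (hypothetical parameter) so the check can be
    tested with fabricated device/inode numbers.
    """
    suspects = []

    def walk(path, ancestors):
        st = lstat(path)
        key = (st.st_dev, st.st_ino)
        if key in ancestors:
            # Same identity as an ancestor: a tool would report a cycle here.
            suspects.append(path)
            return
        try:
            entries = list(os.scandir(path))
        except OSError:
            return  # unreadable directory: skip it
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                walk(entry.path, ancestors | {key})

    walk(root, frozenset())
    return suspects
```

With the numbers from the reproduction below, both /tmp/mnt and /tmp/mnt/home would yield the key (22, 2), so a check of this kind would report /tmp/mnt/home even though no real cycle exists.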

Version-Release number of selected component (if applicable):
2.6.31.5-122.fc12.x86_64

How reproducible: every time

Steps to Reproduce:
Based on the set-up from Kamil Dudka in https://bugzilla.redhat.com/show_bug.cgi?id=501848#c45

# mount | grep ^/
...
/dev/sda8 on /home type ext4 (rw,noatime)
...
# top=/home
# cat /etc/exports
# printf "/ *(fsid=0,crossmnt)\n$top *(crossmnt)\n" >> /etc/exports
# service nfs restart
...
# mkdir /tmp/mnt
# mount -t nfs4 localhost:/ /tmp/mnt
# stat --printf "%d %i %n\n" /tmp/mnt{,$top}
22 2 /tmp/mnt
22 2 /tmp/mnt/home

Then, using the very latest du from upstream coreutils.git,
I see this:

    $ du /tmp/mnt > /dev/null
    du: WARNING: Circular directory structure.
    This almost certainly means that you have a corrupted file system.
    NOTIFY YOUR SYSTEM MANAGER.
    The following directory is part of the cycle:
      `/tmp/mnt/home'

Actual results: above


Expected results: different dev and/or inode, no du failure


Additional info:

Comment 1 Steve Dickson 2009-11-10 18:33:05 UTC
> # stat --printf "%d %i %n\n" /tmp/mnt{,$top}
> 22 2 /tmp/mnt
> 22 2 /tmp/mnt/home
I do see this... but 
    
> $ du /tmp/mnt > /dev/null
> du: WARNING: Circular directory structure.
> This almost certainly means that you have a corrupted file system.
> NOTIFY YOUR SYSTEM MANAGER.
> The following directory is part of the cycle:
> `/tmp/mnt/home'

What kernel and nfs-utils versions are you using?

Comment 2 Steve Dickson 2009-11-10 18:35:03 UTC
I meant to say... I don't see the du error... what kernel/nfs-utils are
you using..

Comment 3 Kamil Dudka 2009-11-10 18:37:09 UTC
(In reply to comment #2)
> I meant to say... I don't see the du error... what kernel/nfs-utils are
> you using..  

You need to compile GNU coreutils from git to see the error.

Comment 4 Jim Meyering 2009-11-10 18:39:56 UTC
Hi Steve, kernel version is listed above.
nfs-utils-1.2.0-18.fc12.x86_64

Comment 5 Jeff Layton 2009-11-10 18:45:33 UTC
I think I understand what the issue is here. I just don't think that there's much we can do about it...

The stat program is doing an lstat(), and that doesn't trigger a submount (LOOKUP_FOLLOW isn't set). So we end up doing a GETATTR call that returns info on the root inode of the /home mount. The syscall therefore gets the "real" st_ino of /tmp/mnt/home, but the st_dev is still that of the parent (/tmp/mnt).

This is particularly evident here because the root of any ext3/4 filesystem has an st_ino of 2.
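
A rough way to observe this from user space is to compare lstat() and stat() results for the same directory. The sketch below assumes, as described above, that lstat() does not trigger the submount while a following stat() does; whether stat() really triggers it depends on the kernel, so treat this as illustrative only. The function name is hypothetical.

```python
import os


def st_dev_changes_on_follow(path):
    """Return True if lstat() and stat() report different st_dev for `path`.

    Assumption (from the explanation above): lstat() leaves a pending
    submount untriggered, while stat() may trigger it. The lstat() call
    must come first, because a triggered mount stays mounted afterwards.
    """
    before = os.lstat(path)  # no LOOKUP_FOLLOW: parent's st_dev expected
    after = os.stat(path)    # follows: may cross into the submount
    return before.st_dev != after.st_dev
```

On an ordinary local directory the two calls agree and the function returns False. A True result for something like /tmp/mnt/home would suggest that the earlier lstat() saw the parent's st_dev.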

I think our options are:

1) fix the kernel to trigger a submount even when LOOKUP_FOLLOW isn't set (quite possibly very hard on performance)

2) fix the kernel to return a bit more info when we have a "potential mountpoint" like this. My suggestion on LKML was to co-opt a new st_mode/i_mode bit and use that to indicate that a directory would become a new mountpoint if someone were to walk into it.

So far, my suggestion hasn't received any feedback upstream.

Comment 6 Bug Zapper 2009-11-16 15:17:01 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Jim Meyering 2010-03-16 07:48:36 UTC
AFAIK, nothing has changed, so I've reset "Version:" to rawhide.

Comment 8 Bug Zapper 2010-03-16 12:18:51 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 13 development cycle.
Changing version to '13'.


Comment 9 Jim Meyering 2010-04-08 12:33:19 UTC
Still affects rawhide, too.

Comment 10 Bug Zapper 2010-07-30 10:46:43 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle.
Changing version to '14'.


Comment 11 Jim Meyering 2010-09-02 07:02:44 UTC
Changing version back to 'rawhide'.

Comment 12 Ric Wheeler 2012-10-17 07:45:33 UTC
Is this something that we can change in upstream or should we close this out?

Comment 13 Jeff Layton 2012-10-18 15:29:02 UTC
Not much we can do, I don't think...

If anything, the automount semantics are even less likely to trigger a mount these days. I think the only hope for this problem is the xstat() work that dhowells was working on, but that has sort of died upstream.

I'll go ahead and close this WONTFIX for now. Please reopen it if you want to discuss it further.