Bug 1749390

Summary: VFS: Busy inodes after unmount of loop0 when encountering duplicate directory inodes
Product: Red Hat Enterprise Linux 7 Reporter: Frank Sorenson <fsorenso>
Component: kernelAssignee: Miklos Szeredi <mszeredi>
kernel sub component: VFS QA Contact: Kun Wang <kunwan>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: urgent CC: aviro, bfields, dhoward, dhowells, djeffery, dwysocha, esandeen, jaeshin, jaltman, lvaz, marc.c.dionne, mszeredi, swhiteho, tborcin, xzhou, yoyang, zyan
Version: 7.6Keywords: Regression, Reproducer, ZStream
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: kernel-3.10.0-1106.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1774943 1781158 1781159 1781585 (view as bug list) Environment:
Last Closed: 2020-03-31 19:29:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1774943, 1781158, 1781159, 1781585    
Description Flags
proposed patch
proposed patch to work around buggy iso images none

Description Frank Sorenson 2019-09-05 14:07:25 UTC
Description of problem:

  isofs can contain directories which occur more than once in the filesystem (essentially hard-linked directories).

  Beginning in 7.6.z and 7.7, RHEL 7 now returns EIO when the second/subsequent occurrence is encountered.

  Earlier kernels and upstream still permit duplicates, and handle them as a hardlinked directory, so this is either a 7.6.z/7.7 regression, or all kernels other than RHEL 7.6.z/7.7 are incorrect.

Version-Release number of selected component (if applicable):

  RHEL 7.6.z kernels beginning sometime between 3.10.0-957.10.1.el7 and 3.10.0-957.21.1.el7
  RHEL 7.7 kernels beginning with daily snapshots sometime between 3.10.0-957.el7 and 3.10.0-1006.el7

How reproducible:

    easy, see steps below

Steps to Reproduce:

    # cd /var/tmp
    # mkdir -p iso_root/dir{1,2}
    # touch iso_root/dir1/testfile{1,2,3}
    # mount -obind iso_root/dir1 iso_root/dir2

    # mkisofs -o out.iso -fRJ iso_root >/dev/null 2>&1

    # mount -oloop out.iso /mnt/tmp >/dev/null 2>&1
    # ls -li /mnt/tmp

Actual results:

    # mount -oloop out.iso /mnt/tmp >/dev/null 2>&1; ls -li /mnt/tmp
    ls: cannot access /mnt/tmp/dir2: Input/output error
    total 2
    1856 drwxr-xr-x 2 root root 2048 Sep  4 16:42 dir1
       ? ?????????? ? ?    ?       ?            ? dir2

Expected results:

    # mount -oloop out.iso /mnt/tmp >/dev/null 2>&1 ; ls -li /mnt/tmp
    total 4
    1856 drwxr-xr-x. 2 root root 2048 Sep  4 16:42 dir1
    1856 drwxr-xr-x. 2 root root 2048 Sep  4 16:42 dir2

Additional info:

  This likely only affects isofs

  The first duplicated directory accessed works, and the second/subsequent will return EIO:

    # mount -oloop out.iso /mnt/tmp >/dev/null 2>&1 ; find /mnt/tmp >/dev/null ; umount /mnt/tmp
    find: ‘/mnt/tmp/dir2’: Input/output error
    # mount -oloop out.iso /mnt/tmp >/dev/null 2>&1 ; ls -lid /mnt/tmp/dir1 ; ls -lid /mnt/tmp/dir2 ; umount /mnt/tmp
    1856 drwxr-xr-x 2 root root 2048 Sep  4 16:42 /mnt/tmp/dir1
    ls: cannot access /mnt/tmp/dir2: Input/output error
    # mount -oloop out.iso /mnt/tmp >/dev/null 2>&1 ; ls -lid /mnt/tmp/dir2 ; ls -lid /mnt/tmp/dir1 ; umount /mnt/tmp
    1856 drwxr-xr-x 2 root root 2048 Sep  4 16:42 /mnt/tmp/dir2
    ls: cannot access /mnt/tmp/dir1: Input/output error

  there are no changes in isofs between the kernel versions mentioned.  The change appears likely to have occurred in one of the fs/dcache.c patches

Comment 3 Steve Whitehouse 2019-09-05 14:48:21 UTC
I'm not sure that I understand the issue. You cannot hard link a directory, so I'm trying to figure out whats really the issue here.

Comment 4 Frank Sorenson 2019-09-05 15:37:00 UTC
isofs creation permits the equivalent of a hardlinked directory.  When the iso is mounted, all kernels other than 7.6.z and 7.y (y > 6) behave as if hardlinked directories are allowed.

So either returning EIO is a RHEL 7 regression, or RHEL 8 (and upstream) have a bug, in that they do not return EIO.

Comment 5 Steve Whitehouse 2019-09-05 15:41:48 UTC
The bug is that we shouldn't have hard linked directories in the first place. In your example a bind mount is used before running mkisofs, and if that is what is intended, it could possibly be replicated by some means (effectively an auto-mount) when the fs is later mounted by the kernel. However, a hard linked directory is not allowed, and if there are some previous kernels that might appear to support it, then those are buggy. This sounds like something that needs to be resolved upstream.

Comment 6 Eric Sandeen 2019-09-05 17:40:43 UTC
So after the setup steps, we have this tree structure:

# tree --inodes iso_root/
├── [16853932]  dir1
│   ├── [16853933]  testfile1
│   ├── [16853934]  testfile2
│   └── [16853935]  testfile3
└── [16853932]  dir2
    ├── [16853933]  testfile1
    ├── [16853934]  testfile2
    └── [16853935]  testfile3

and if we list the dirs 1 at a time to not get the clash from teh vfs, we see that mkisofs did create them w/ the same inode nr:

# mount -oloop out.iso mnt/
# ls -id mnt/dir1
1856 mnt/dir1
# umount mnt

# mount -oloop out.iso mnt/
mount: /dev/loop0 is write-protected, mounting read-only
# ls -id mnt/dir2
1856 mnt/dir2

(Interestingly mksquashfs doesn't produce the same result, because it ends up with different inode nrs for the 2 dirs within the fs image)

So it's ... odd that mkisofs chooses to create two dirs w/ the same inode number.  I'm inclined to think that this may be a mkisofs bug, although it would be good to know why this result seems unique to RHEL7.

Comment 7 Frank Sorenson 2019-09-05 18:51:31 UTC
just as background, the customer encountered this issue with essentially the following:

# mkdir -p iso_root/release-{1.0,1.1,1.2,latest}
# mount server:/exports/rpms/rpms-1.0 iso_root/release-1.0
# mount server:/exports/rpms/rpms-1.1 iso_root/release-1.1
# mount server:/exports/rpms/rpms-1.2 iso_root/release-1.2
# mount server:/exports/rpms/rpms-1.2 iso_root/release-latest

# mkisofs -o out.iso -fRJ iso_root 2>&1

# mount -oloop out.iso /mnt/tmp 2>/dev/null

* affected kernels
# tree --inodes /mnt/tmp
├── [   1856]  release-1.0
├── [   1920]  release-1.1
└── [   2048]  release-latest

# ls -lid /mnt/tmp/* 
ls: cannot access /mnt/tmp/release-1.2: Input/output error
1856 drwxr-xr-x 2 root root 2048 Sep  5 13:20 /mnt/tmp/release-1.0
1920 drwxr-xr-x 2 root root 2048 Sep  5 13:20 /mnt/tmp/release-1.1
2048 drwxr-xr-x 2 root root 2048 Sep  5 13:20 /mnt/tmp/release-latest

* upstream kernel
# tree --inodes /mnt/tmp
├── [   1856]  release-1.0
├── [   1920]  release-1.1
├── [   2048]  release-1.2
└── [   2048]  release-latest

Comment 8 Frank Sorenson 2019-09-05 18:53:46 UTC
also, rhel 7.6.z and 7.7+ kernels throw the following message (not seen with other kernels) on unmount:
[1382870.155190] VFS: Busy inodes after unmount of loop0. Self-destruct in 5 seconds.  Have a nice day...


note also the following upstream behavior, when given a true directory loop:

# mkdir -p iso_root/dir3/subdir
# mount -obind iso_root/dir3 iso_root/dir3/subdir

# mkisofs -o out.iso -fRJ iso_root 2>&1 

# tree --inodes /mnt/tmp/
├── [   1856]  dir1
│   ├── [   1870]  testfile1
│   ├── [   1866]  testfile2
│   └── [   1862]  testfile3
├── [   1856]  dir2
│   ├── [   1870]  testfile1
│   ├── [   1866]  testfile2
│   └── [   1862]  testfile3
└── [   1984]  dir3

# ls -lid /mnt/tmp/dir3 /mnt/tmp/dir3/subdir
ls: cannot access /mnt/tmp/dir3/subdir: Too many levels of symbolic links
1984 drwxr-xr-x 3 root root 2048 Sep  5 10:27 /mnt/tmp/dir3

in the kernel messages:
[1378462.410201] VFS: Lookup of 'subdir' in iso9660 loop0 would have caused loop

(no 'busy inodes' message on unmount)

Comment 9 Eric Sandeen 2019-09-05 23:52:01 UTC
Bisecting it lands on

commit 7449bc83052a970f5fc15d6c94369cc1b2a214eb (HEAD)
Author: Zheng Yan <zyan>
Date:   Thu Oct 4 07:02:44 2018 +0200

    [fs] dcache: d_splice_alias mustn't create directory aliases

Comment 10 Eric Sandeen 2019-09-06 14:11:09 UTC
Talking w/ Viro - isofs returning same inode nr for 2 dirs /is/ a bug.  Not clear whether it's in mkisofs, or in the kernel isofs driver, i.e. how isofs_get_ino() generates inodes.

But, upstream silently mitigates the problem:

} else if (!IS_ROOT(new)) {
        struct dentry *old_parent = dget(new->d_parent);
        int err = __d_unalias(inode, dentry, new);  <----- HERE
        if (err) {
                new = ERR_PTR(err);

Essentially, this was done by

b5ae6b15b merge d_materialise_unique() into d_splice_alias()

upstream, which should be backported.

Comment 11 Eric Sandeen 2019-09-06 14:13:01 UTC
[08:56]  <viro> OK, what you want is commit b5ae6b15bd73e35b129408755a0804287a87e041
[08:56]  <viro> Author: Al Viro <viro.org.uk>
[08:56]  <viro> Date:   Sun Oct 12 22:16:02 2014 -0400
[08:56]  <viro>     merge d_materialise_unique() into d_splice_alias()
[08:56]  <viro> backported
[08:57]  <viro> essentially, d_materialize_unique() is d_splice_alias() with opposite order of arguments *and* mitigation of that kind of crap
[08:58]  <viro> in mainline I'd just folded the latter into d_splice_alias() and killed d_materialize_unique() off
[08:58]  <viro> very few callers of the latter to start with
[08:59]  <viro> in RHEL we'd want a wrapper for d_splice_alias() - basically, calling it with the arguments flipped
[08:59]  <viro> again, kABI...

Comment 12 Frank Sorenson 2019-09-06 15:20:11 UTC
I have opened bz1749860 against cdrkit in RHEL 8 to address creation of ISOs with hardlinked directories

Comment 13 Eric Sandeen 2019-09-06 17:19:59 UTC
(In reply to Frank Sorenson from comment #12)
> I have opened bz1749860 against cdrkit in RHEL 8 to address creation of ISOs
> with hardlinked directories

Sounds good, if I use OSX to create an iso image from the directories problematic ISO created above, it assigns new/distinct inode numbers so this does seem like a mkisofs flaw as well.

Comment 14 Eric Sandeen 2019-09-06 18:46:53 UTC
I had a go at al's suggestion but I'm not confident enough messing with dcache.c, would be better to have someone more familiar investigate it I think.

Comment 15 Steve Whitehouse 2019-09-19 13:56:41 UTC
Miklos, would you be able to take a look at this? I know it is outside of your normal realm, but dcache knowledge is required in order to provide the required fix I here it seems.

Comment 16 Miklos Szeredi 2019-09-19 14:16:30 UTC

Comment 17 Steve Whitehouse 2019-09-24 14:03:43 UTC
Miklos, do you have any updates? We need to figure out the correct flags/version that we want to target.

Comment 20 Miklos Szeredi 2019-09-25 08:32:18 UTC
Created attachment 1618935 [details]
proposed patch

So the real bug here is the lack of iput() on error.  Easily fixed with the attached patch (untested).

I wouldn't mess with backporting commit b5ae6b15bd73 ("merge d_materialise_unique() into d_splice_alias()"), as it's basically irrelevant if lookup returns EIO or ELOOP in the broken iso image case.

Comment 23 Miklos Szeredi 2019-09-30 08:58:02 UTC
Created attachment 1621014 [details]
proposed patch to work around buggy iso images

So if we want to mitigate buggy mkisofs, then we do need __d_unalias().  But it's a kludge: it can still sporadically fail with EBUSY if hard linked directories have a different parent directory.  Also regardless of the parent directories results from getcwd(2) will jump around the hard links' location based on which one was accessed last.

I've tried to carefully compare what d_splice_alias() and d_materialise_unique() do, and AFAICS the patch is correct, but this is not a trivial fix, like the previous patch, so obviously it's higher risk.

There's one change that I haven't fully looked into: after the patch security_d_instantiate() isn't called on an extant alias, while previously it was.  I don't know if this was just an oversight or if there was some reason to that security_d_instantiate() call.  A security person needs to look at that.

Comment 32 Jan Stancek 2019-10-26 10:03:01 UTC
Patch(es) committed on kernel-3.10.0-1106.el7

Comment 40 Marc Dionne 2019-11-28 13:48:05 UTC
May I ask which of the 2 attached patches has been included in the 1106 kernel and will be in 7.8, since I don't have access to it?

Also, are there any plans to include a fix in 7.6 and 7.7 kernels?

Comment 41 Steve Whitehouse 2019-11-28 13:53:17 UTC
Just the comment #20 patch, which is the one with the iput additions. Currently I don't see any requests for backport to earlier kernels. The real issue here however is that the images are incorrect and we have a bug open for mkisofs to fix that issue as a separate item.

Comment 46 errata-xmlrpc 2020-03-31 19:29:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Comment 47 Red Hat Bugzilla 2024-01-06 04:26:42 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days