Bug 584484

Summary: When doing r/o bind mounts, the ro flag is improperly propagated back to the source device.
Product: [Fedora] Fedora
Reporter: Lennart Poettering <lpoetter>
Component: util-linux-ng
Assignee: Karel Zak <kzak>
Status: CLOSED UPSTREAM
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 13
CC: anton, dougsland, gansalmon, itamar, jonathan, kernel-maint, kmcmartin, kzak
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2010-06-14 10:49:16 UTC
Attachments:
  robind.c that works (flags: none)

Description Lennart Poettering 2010-04-21 16:51:20 UTC
R/O bind mounts don't work on the F13 kernel the way they supposedly should:

[root@omega ~]# mkdir /tmp/a /tmp/b
[root@omega ~]# mount --bind /tmp/a /tmp/b
[root@omega ~]# mount -o ro,remount /tmp/b
mount: /tmp/b is busy
[root@omega ~]# 

Regardless of what I do, I get EBUSY there, and I don't think I should.

2.6.33.1-24.fc13.x86_64

Comment 1 Lennart Poettering 2010-04-21 16:55:04 UTC
This seems to work on 2.6.33-1.fc13.i686

Comment 2 Lennart Poettering 2010-04-21 21:47:46 UTC
OK, I think I know more now:

The remount request is apparently applied to the whole of /tmp if you type it as shown above, and depending on whether somebody has a file open for writing on /tmp, this will fail with EBUSY or not. If it doesn't fail, the entire /tmp tree is actually made read-only!

So it appears that read-only bind mounts are entirely broken:

20 [root@omega] /tmp# mkdir a b
21 [root@omega] /tmp# mount --bind a b
22 [root@omega] /tmp# touch /tmp/waldo /tmp/a/waldo2 /tmp/b/waldo3
23 [root@omega] /tmp# mount -o ro,remount a b
24 [root@omega] /tmp# touch /tmp/waldo /tmp/a/waldo2 /tmp/b/waldo3
touch: setting times of `/tmp/waldo': Read-only file system
touch: cannot touch `/tmp/a/waldo2': Read-only file system
touch: cannot touch `/tmp/b/waldo3': Read-only file system
25 [root@omega] /tmp# mount -o rw,remount /tmp
26 [root@omega] /tmp# touch /tmp/waldo /tmp/a/waldo2 /tmp/b/waldo3
touch: cannot touch `/tmp/b/waldo3': Read-only file system
27 [root@omega] /tmp# 

That is on a freshly booted 2.6.33.2-57.fc13.x86_64.

Comment 3 Lennart Poettering 2010-04-21 21:57:29 UTC
Hmm, so I played around with --make-private, under the assumption that this weirdness might have something to do with the shared subtree logic, but this didn't change anything: even if both /tmp and /tmp/b are marked as "private", the ro change on /tmp/b is still reflected back to /tmp. I even made /tmp/a a bind mount on itself and also marked it private, with no luck.

In summary, there is something wrong with the MS_RDONLY flag propagation for bind mounts.

Comment 4 Lennart Poettering 2010-04-21 22:00:35 UTC
Hmm, that is on ext3 btw.

Comment 5 Lennart Poettering 2010-04-22 15:01:57 UTC
Btw, just for completeness' sake, the two actual mount syscalls involved above look like this:

17 [root@omega] /tmp# strace -e mount mount --bind a b
mount("/tmp/a", "b", 0x7f15416dddd0, MS_MGC_VAL|MS_BIND, NULL) = 0

and

19 [root@omega] /tmp# strace -e mount mount -o ro,remount a b
mount("/tmp/a", "b", NULL, MS_MGC_VAL|MS_RDONLY|MS_REMOUNT, NULL) = -1 EBUSY (Device or resource busy)

Comment 6 Kyle McMartin 2010-04-22 15:06:34 UTC
I think the problem is a documentation problem.

You're asking mount to:
 mount -o remount,ro [list of mounts]

And /tmp/a is not a mount, it's a subdir of /tmp. So you end up with /tmp mounted readonly.

[root@ihatethathostname tmp]# mkdir a b
[root@ihatethathostname tmp]# mount --bind a b
[root@ihatethathostname tmp]# touch waldo a/waldo2 b/waldo3
[root@ihatethathostname tmp]# ls a
waldo2  waldo3
[root@ihatethathostname tmp]# ls b
waldo2  waldo3
[root@ihatethathostname tmp]# mount -o remount,ro b
[root@ihatethathostname tmp]# touch waldo a/waldo2 b/waldo3
touch: cannot touch `b/waldo3': Read-only file system
[root@ihatethathostname tmp]# mount -o remount,rw b
[root@ihatethathostname tmp]# touch waldo a/waldo2 b/waldo3
[root@ihatethathostname tmp]# mount -o remount,ro a b
[root@ihatethathostname tmp]# touch waldo a/waldo2 b/waldo3
touch: cannot touch `waldo': Read-only file system
touch: cannot touch `a/waldo2': Read-only file system
touch: cannot touch `b/waldo3': Read-only file system

The first case, mount -o remount,ro /tmp/b (which is the bind mount) seems to work as intended.

Comment 7 Lennart Poettering 2010-04-22 20:23:21 UTC
(In reply to comment #6)
> I think the problem is a documentation problem.
> 
> You're asking mount to:
>  mount -o remount,ro [list of mounts]

Actually not. I was only passing the exact same args as with the original mount, which is what the man page suggests.
 
> And /tmp/a is not a mount, it's a subdir of /tmp. So you end up with /tmp
> mounted readonly.

That's not what happens, if you strace things. mount will only issue one mount() syscall, not two.

> [root@ihatethathostname tmp]# mkdir a b
> [root@ihatethathostname tmp]# mount --bind a b
> [root@ihatethathostname tmp]# touch waldo a/waldo2 b/waldo3
> [root@ihatethathostname tmp]# ls a
> waldo2  waldo3
> [root@ihatethathostname tmp]# ls b
> waldo2  waldo3
> [root@ihatethathostname tmp]# mount -o remount,ro b
> [root@ihatethathostname tmp]# touch waldo a/waldo2 b/waldo3
> touch: cannot touch `b/waldo3': Read-only file system
> [root@ihatethathostname tmp]# mount -o remount,rw b
> [root@ihatethathostname tmp]# touch waldo a/waldo2 b/waldo3
> [root@ihatethathostname tmp]# mount -o remount,ro a b
> [root@ihatethathostname tmp]# touch waldo a/waldo2 b/waldo3
> touch: cannot touch `waldo': Read-only file system
> touch: cannot touch `a/waldo2': Read-only file system
> touch: cannot touch `b/waldo3': Read-only file system
> 
> The first case, mount -o remount,ro /tmp/b (which is the bind mount) seems to
> work as intended.    

Hmm, this is certainly interesting.

I have now prepared this C test case:

http://0pointer.de/public/robind.c

Which hopefully shows the problem. I marked the expected outcome with assert()s, and at least on my machine I run into one of two different asserts, depending on whether /tmp can be remounted r/o, which in turn depends on whether some app has a file open for writing.

Comment 8 Lennart Poettering 2010-04-22 20:32:01 UTC
BTW, on my machine the strace for this test case is something like this:

mkdir("/tmp/a", 0777)                   = 0
mkdir("/tmp/b", 0777)                   = 0
mount("/tmp/a", "/tmp/b", NULL, MS_BIND, NULL) = 0
open("/tmp/waldo1", O_WRONLY|O_CREAT, 0777) = 3
close(3)                                = 0
open("/tmp/a/waldo2", O_WRONLY|O_CREAT, 0777) = 3
close(3)                                = 0
open("/tmp/b/waldo3", O_WRONLY|O_CREAT, 0777) = 3
close(3)                                = 0
mount(NULL, "/tmp/b", NULL, MS_RDONLY|MS_REMOUNT, NULL) = 0
open("/tmp/waldo1", O_WRONLY|O_CREAT, 0777) = -1 EROFS (Read-only file system)
brk(0)                                  = 0x996e000
brk(0x998f000)                          = 0x998f000
write(2, "robind: robind.c:33: main: Asser"..., 54robind: robind.c:33: main: Assertion `r >= 0' failed.
) = 54
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
gettid()                                = 8118
tgkill(8118, 8118, SIGABRT)             = 0
--- SIGABRT (Aborted) @ 0 (0) ---
+++ killed by SIGABRT (core dumped) +++
Aborted (core dumped)

The two mounts go through, but we see how the second one changed the r/o bit of /tmp, so that subsequent write accesses fail with EROFS. And that shouldn't happen.

Comment 9 Kyle McMartin 2010-04-23 14:49:27 UTC
kyle@phobos / $ sudo mount -t tmpfs none /kyle
kyle@phobos / $ cd /kyle
kyle@phobos /kyle $ sudo mkdir a b
kyle@phobos /kyle $ sudo touch foo-root a/foo-A b/foo-B
kyle@phobos /kyle $ sudo strace -e mount mount --bind a/ b/
mount("/kyle/a", "b/", 0x7f3b06ee8dd0, MS_MGC_VAL|MS_BIND, NULL) = 0
kyle@phobos /kyle $ cat /proc/mounts | grep kyle
none /kyle tmpfs rw,relatime 0 0
none /kyle/b tmpfs rw,relatime 0 0
kyle@phobos /kyle $ sudo touch foo-root a/foo-A b/foo-B
kyle@phobos /kyle $ sudo strace -e mount mount -o remount,ro b/
mount("/kyle/a", "/kyle/b", 0x7f1581bd3dd0, MS_MGC_VAL|MS_RDONLY|MS_REMOUNT|MS_BIND, NULL) = 0
kyle@phobos /kyle $ sudo touch foo-root a/foo-A b/foo-B
touch: cannot touch `b/foo-B': Read-only file system
kyle@phobos /kyle $ cat /proc/mounts | grep kyle
none /kyle tmpfs rw,relatime 0 0
none /kyle/b tmpfs ro,relatime 0 0

Ok, working up to here.

kyle@phobos /kyle $ sudo strace -e mount mount -o remount,ro a/ b/
mount("/kyle/a", "b/", NULL, MS_MGC_VAL|MS_RDONLY|MS_REMOUNT, NULL) = 0
kyle@phobos /kyle $ sudo touch foo-root a/foo-A b/foo-B
touch: cannot touch `foo-root': Read-only file system
touch: cannot touch `a/foo-A': Read-only file system
touch: cannot touch `b/foo-B': Read-only file system
kyle@phobos /kyle $ cat /proc/mounts | grep kyle
none /kyle tmpfs ro,relatime 0 0
none /kyle/b tmpfs ro,relatime 0 0

Things break with mount -o remount,ro a/ b/

kyle@phobos /kyle $ sudo strace -e mount mount -o remount,rw a/ b/
mount("/kyle/a", "b/", NULL, MS_MGC_VAL|MS_REMOUNT, NULL) = 0
kyle@phobos /kyle $ cat /proc/mounts | grep kyle
none /kyle tmpfs rw,relatime 0 0
none /kyle/b tmpfs rw,relatime 0 0

Things are rw again, let's try making them ro

kyle@phobos /kyle $ sudo strace -e mount mount -o remount,ro b/
mount("/kyle/a", "/kyle/b", 0x7f38874f4900, MS_MGC_VAL|MS_RDONLY|MS_REMOUNT, NULL) = 0
kyle@phobos /kyle $ cat /proc/mounts | grep kyle
none /kyle tmpfs ro,relatime 0 0
none /kyle/b tmpfs ro,relatime 0 0

Wait, now it's propagating it back, even though the command worked before! (Notice that MS_BIND is no longer specified.)

Looks like a util-linux-ng bug in mount to me, and a corner case of the kernel behaviour. :\

Comment 10 Kyle McMartin 2010-04-23 14:53:50 UTC
Created attachment 408635 [details]
robind.c that works

Or-ing in MS_BIND on line 29 makes the test-case succeed. Not sure why util-linux is dropping the bit. (/kyle was a tmpfs freshly created.)
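
For readers without access to the attachment, the change described here amounts to adding MS_BIND to the flags of the remount call. The following is only a minimal sketch, not the attached robind.c: the helper name remount_bind_readonly and the constant BIND_REMOUNT_RO are made up for illustration, and the call needs root and an existing bind mount at the given target.

```c
#include <stdio.h>
#include <sys/mount.h>

/* Flags for a read-only remount that should affect one mountpoint only.
 * The strace in comment 8 shows the same call without MS_BIND. */
static const unsigned long BIND_REMOUNT_RO = MS_BIND | MS_REMOUNT | MS_RDONLY;

/* Hypothetical helper: remount the existing bind mount at `target` read-only.
 * Returns 0 on success, -1 on failure with errno set by mount(2). */
int remount_bind_readonly(const char *target)
{
    int r = mount(NULL, target, NULL, BIND_REMOUNT_RO, NULL);
    if (r < 0)
        perror("mount");
    return r;
}
```

With the paths from the test case, this would be called as remount_bind_readonly("/tmp/b") after the initial MS_BIND mount.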

Comment 11 Chuck Ebbert 2010-04-27 13:42:34 UTC
(In reply to comment #9)
> kyle@phobos /kyle $ sudo strace -e mount mount -o remount,ro a/ b/
> mount("/kyle/a", "b/", NULL, MS_MGC_VAL|MS_RDONLY|MS_REMOUNT, NULL) = 0
> kyle@phobos /kyle $ sudo touch foo-root a/foo-A b/foo-B
> touch: cannot touch `foo-root': Read-only file system
> touch: cannot touch `a/foo-A': Read-only file system
> touch: cannot touch `b/foo-B': Read-only file system
> kyle@phobos /kyle $ cat /proc/mounts | grep kyle
> none /kyle tmpfs ro,relatime 0 0
> none /kyle/b tmpfs ro,relatime 0 0
> 
> Things break with mount -o remount,ro a/ b/

If you specify both paths, mtab will not be read and the only options set will be the ones you provide.
 
> kyle@phobos /kyle $ sudo strace -e mount mount -o remount,rw a/ b/
> mount("/kyle/a", "b/", NULL, MS_MGC_VAL|MS_REMOUNT, NULL) = 0
> kyle@phobos /kyle $ cat /proc/mounts | grep kyle
> none /kyle tmpfs rw,relatime 0 0
> none /kyle/b tmpfs rw,relatime 0 0
> 
> Things are rw again, let's try making them ro
>

But now it's not a bind mount anymore.
 
> kyle@phobos /kyle $ sudo strace -e mount mount -o remount,ro b/
> mount("/kyle/a", "/kyle/b", 0x7f38874f4900, MS_MGC_VAL|MS_RDONLY|MS_REMOUNT,
> NULL) = 0
> kyle@phobos /kyle $ cat /proc/mounts | grep kyle
> none /kyle tmpfs ro,relatime 0 0
> none /kyle/b tmpfs ro,relatime 0 0
> 
> Wait, now it's propagating it back, even though the command worked before!
> (Notice that MS_BIND is no longer specified.)
> 
> Looks like a util-linux-ng bug in mount to me, and a corner case of the kernel
> behaviour. :\

Comment 12 Lennart Poettering 2010-04-28 23:50:13 UTC
Anyway, there is a bug somewhere, but I don't think it is in the kernel. I.e. either mount needs to be fixed to OR in MS_BIND in this case, or the man page needs to be fixed so that it no longer claims that "mount -o remount,ro newdir" does the job, if something like "mount --bind -o remount,ro newdir" is really what is needed to make things work.

Reassigning to util-linux-ng.

(Oh, and I verified that MS_BIND in the remount is indeed the one thing that makes things work. Thanks Kyle, for tracking that down.)

Comment 13 Karel Zak 2010-04-29 20:09:32 UTC
Some notes:

- the same filesystem could be mounted more than once

- from the kernel's point of view there is no difference between
     mount /dev/sda1 /mnt/a
     mount /dev/sda1 /mnt/b
and
     mount /dev/sda1 /mnt/a
     mount --bind /mnt/a /mnt/b

for the kernel the same FS is mounted in two places. The important detail is that
the kernel does not keep track of how the mountpoint was created (bind or non-bind) and it does not store the "bind" option in /proc/mounts. So "cat /proc/mounts" does not help here.

- MS_REMOUNT|MS_RDONLY updates the filesystem superblock, which means the change is
visible in all places in the VFS where the filesystem is mounted

- MS_REMOUNT|MS_BIND|MS_RDONLY updates the mount option only, so the change is visible for that mountpoint only.

- the "bind" option is maintained in /etc/mtab only
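
The distinction in the notes above can be written down as a small classifier. This is a sketch that only restates those notes in code; the enum and the function name classify_remount are hypothetical and not part of the kernel or util-linux-ng.

```c
#include <sys/mount.h>

/* Which object a remount request changes, per the notes above. */
enum remount_scope {
    NOT_A_REMOUNT,     /* MS_REMOUNT is not set */
    SCOPE_SUPERBLOCK,  /* visible everywhere the filesystem is mounted */
    SCOPE_MOUNTPOINT   /* visible for the given mountpoint only */
};

/* Hypothetical helper: classify mount(2) flags as the notes describe them. */
enum remount_scope classify_remount(unsigned long flags)
{
    if (!(flags & MS_REMOUNT))
        return NOT_A_REMOUNT;
    return (flags & MS_BIND) ? SCOPE_MOUNTPOINT : SCOPE_SUPERBLOCK;
}
```

Applied to the straces earlier in this report, MS_MGC_VAL|MS_RDONLY|MS_REMOUNT falls into the superblock case, which matches /tmp itself turning read-only.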

mount(8) behaviour:

 a) "mount -o remount,ro /mnt/a"  reads /etc/{mtab,fstab}

 b) "mount -o remount,ro /mnt/b /mnt/a"  does not read fstab/mtab; only mtab is updated.
    
This mount(8) behaviour is documented in the mount.8 man page.
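
Cases a) and b) can be modelled roughly as follows. This is a deliberately simplified sketch based only on the behaviour described in this report, not the actual mount(8) source; the struct, its field names and model_remount_flags are all hypothetical.

```c
#include <stdbool.h>
#include <sys/mount.h>

/* Inputs to the simplified model; every name here is made up. */
struct remount_request {
    bool source_and_target_given; /* "mount -o ... a b" instead of "mount -o ... b" */
    bool bind_on_command_line;    /* the user passed --bind explicitly */
    bool bind_in_mtab;            /* the mtab entry for the target records "bind" */
    bool read_only;               /* "ro" was requested */
};

/* Compute the flags the model would hand to mount(2) for a remount. */
unsigned long model_remount_flags(const struct remount_request *req)
{
    unsigned long flags = MS_REMOUNT;

    if (req->read_only)
        flags |= MS_RDONLY;
    if (req->bind_on_command_line)
        flags |= MS_BIND;
    /* Case a): one path given, so options recorded in mtab are merged in.
     * Case b): both paths given, mtab is not read and "bind" is lost. */
    if (!req->source_and_target_given && req->bind_in_mtab)
        flags |= MS_BIND;

    return flags;
}
```

In this model, the two-argument remount from comment 9 yields MS_REMOUNT|MS_RDONLY without MS_BIND, which is the call that made the whole filesystem read-only.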

I see only one bug: the man page does not mention that --bind is required for the remount on systems without /etc/mtab, or when mtab is ignored (because both the mount source and target are specified). I'll add this info to the man page.


(In reply to comment #12)
> not to claim that "mount -o remount,ro newdir" would do the job

It does the job if the "bind" option is stored in your mtab:

 # mount /dev/sda6 /mnt/a
 # mount --bind /mnt/a /mnt/b

 # grep -E '/mnt/(a|b)' /proc/mounts
 /dev/sda6 /mnt/a ext4 rw,relatime,barrier=1,data=ordered 0 0
 /dev/sda6 /mnt/b ext4 rw,relatime,barrier=1,data=ordered 0 0

 # mount -o remount,ro /mnt/b   <<<<

 # grep -E '/mnt/(a|b)' /proc/mounts
 /dev/sda6 /mnt/a ext4 rw,relatime,barrier=1,data=ordered 0 0
 /dev/sda6 /mnt/b ext4 ro,relatime,barrier=1,data=ordered 0 0
                       ^^
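
The same check can be done programmatically with the glibc mntent API. A minimal sketch, assuming a made-up helper name mount_is_readonly; it reads any table in the /proc/mounts format and reports whether the entry for a given directory carries the "ro" option.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look up `target` in a mounts table such as
 * /proc/mounts. Returns 1 if it is listed ro, 0 if it is not, and -1 if
 * the table cannot be opened or has no entry for `target`. */
int mount_is_readonly(const char *table, const char *target)
{
    FILE *f = setmntent(table, "r");
    if (!f)
        return -1;

    int result = -1;
    struct mntent *m;
    while ((m = getmntent(f)) != NULL) {
        /* Keep scanning so that the last entry for the directory wins. */
        if (strcmp(m->mnt_dir, target) == 0)
            result = hasmntopt(m, "ro") != NULL;
    }
    endmntent(f);
    return result;
}
```

For the example above, mount_is_readonly("/proc/mounts", "/mnt/b") would be expected to report ro after the remount while /mnt/a stays rw.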

Comment 14 Karel Zak 2010-06-14 10:49:16 UTC
(In reply to comment #13)
> I see only one bug -- in the man page is not information that the --bind is
> required for the remount on systems without /etc/mtab or in case that mtab is
> ignored (because mount source and target are specified). I'll add this info to
> the man page. 

 The upstream version of the man page has been updated. It will be available in F-14.