Bug 745781
| Field | Value |
|---|---|
| Summary | Unable to use indirect mounts |
| Product | [Fedora] Fedora |
| Component | autofs |
| Version | 16 |
| Status | CLOSED ERRATA |
| Severity | unspecified |
| Priority | unspecified |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Daniel Berrangé <berrange> |
| Assignee | Ian Kent <ikent> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | ajayr, dhowells, hobbes1069, ikent, lpoetter, marcus.moeller, sandro |
| Fixed In Version | autofs-5.0.6-3.fc16 |
| Doc Type | Bug Fix |
| Last Closed | 2011-11-10 17:35:46 UTC |
> # ls /net/marrow.gsslab.fab.redhat.com/
> ls: cannot access /net/marrow.gsslab.fab.redhat.com/: No such file or directory
Does the same thing happen if you leave out the trailing slash?
> I will attach the full logs shown after setting /etc/sysconfig/autofs
> to 'debug'
Don't forget to check that syslog is actually logging daemon.*
messages.
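
The two settings mentioned in this exchange might look like the fragment below. This is only a sketch: the LOGGING variable is the one named later in this report, while the log file path is an assumption chosen for illustration and rsyslog would need a restart to pick up the change.

```
# /etc/sysconfig/autofs
LOGGING="debug"

# /etc/rsyslog.conf (the target file name is an assumption)
daemon.*    /var/log/daemon.log
```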
> Does the same thing happen if you leave out the trailing slash?

Yes, no difference

> Don't forget to check that syslog is actually logging daemon.*
> messages.

Setting daemon.* didn't increase the amount of log information, over what I've
already attached to this ticket, so I presume that attachment contains
everything.

(In reply to comment #3)
> > Does the same thing happen if you leave out the trailing slash?
>
> Yes, no difference
>
> > Don't forget to check that syslog is actually logging daemon.*
> > messages.
>
> Setting daemon.* didn't increase the amount of log information, over what I've
> already attached to this ticket, so I presume that attachment contains
> everything.

OK, thanks, I'll update my kernel source and try and duplicate it.

What is the machine that exports these running?
Are these supposed to be NFSv4 or v3 mounts?

The server is F14. If I mounted it manually I end up with the following:

marrow.gsslab.fab.redhat.com:/var/lib/libvirt/images/ /tmp/f nfs rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.33.8.114,mountvers=3,mountport=35386,mountproto=udp,local_lock=none,addr=10.33.8.114 0 0

(In reply to comment #5)
> The server is F14. If I mounted it manually I end up with the following:
>
> marrow.gsslab.fab.redhat.com:/var/lib/libvirt/images/ /tmp/f nfs
> rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.33.8.114,mountvers=3,mountport=35386,mountproto=udp,local_lock=none,addr=10.33.8.114
> 0 0

This is strange, but it's moving the mount (from the construction area)
that fails in autofs, not the mount itself, so being able to mount it
isn't surprising.

I still need to get the source of this kernel and have a look but
I think it has the recent changes that resulted from a spirited
debate upstream. I tested a kernel with those changes and I thought
this case was covered but maybe not.
What's worse is that I hadn't yet returned to finish testing and
I'm pretty sure those subsequent tests include examples of the case above.

So, let me have a look at the source and run some more tests
against it and get back.

Ian

(In reply to comment #6)
>
> I still need to get the source of this kernel and have a look but
> I think it has the recent changes that resulted from a spirited
> debate upstream. I tested a kernel with those changes and I thought
> this case was covered but maybe not. What's worse is that I hadn't
> yet returned to finish testing and I'm pretty sure those subsequent
> tests include examples of the case above.
>
> So, let me have a look at the source and run some more tests
> against it and get back.

So far I've tested with a 3.1.0-rc8 that includes the vfs-automount
changes that would have been included in 3.1.0-rc9 on an F14 install.

The additional test I did was the autofs Connectathon test.
It uses a wide range of valid and invalid mount map syntaxes
including map entries similar to what was being mounted here.

I also exported "/" and a path to another file system and tried
to simulate the symptom here mounting each and then moving the
root mount. That's not exactly what is used here but is quite
close.

I haven't seen a problem yet.

Next thing to do is test against F16.

Ian

> The server is F14. If I mounted it manually I end up with the following:
I should clarify because this is slightly ambiguous. The *NFS* server is F14. The client where I run autofs is F16.
(In reply to comment #7)
>
> So far I've tested with a 3.1.0-rc8 that includes the vfs-automount
> changes that would have been included in 3.1.0-rc9 on an F14 install.
>
> The additional test I did was the autofs Connectathon test.
> It uses a wide range of valid and invalid mount map syntaxes
> including map entries similar to what was being mounted here.
>
> I also exported "/" and a path to another file system and tried
> to simulate the symptom here mounting each and then moving the
> root mount. That's not exactly what is used here but is quite
> close.
>
> I haven't seen a problem yet.
>
> Next thing to do is test against F16.

And with F16 I see the fail.

This isn't dependent on autofs and doesn't appear related to
the kernel changes that went into rc8. Every "mount --move"
I try fails at the mount(2) call and returns -EINVAL.

I can't see why yet.

Ian

(In reply to comment #9)
>
> And with F16 I see the fail.
>
> This isn't dependent on autofs and doesn't appear related to
> the kernel changes that went into rc8. Every "mount --move"
> I try fails at the mount(2) call and returns -EINVAL.
>
> I can't see why yet.

The root file system in f16 is marked as shared which means that
move mount is not permitted anywhere within any filesystems, unless
a filesystem is explicitly marked as not shared, since that will
be propagated to subordinate mounts.

I have no idea why or where this is being done, I just can't find
where it happens.

I could re-write the code which uses move mount, since we have the
new vfs-automount in kernel but that would introduce a restriction
on what kernel version can be expected to work reliably under pressure.

At the moment I'm stumped as to how to find out why and where this
happens, any suggestions of who we should consult?

There was an RFE against SystemD to make the root filesystem shared, instead of
private. The reason for this is so that mounts automatically propagate into
application sandboxes.
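
One way to see which mounts are shared is to read the optional fields in /proc/self/mountinfo, where a shared mount carries a shared:N tag and a slave mount a master:N tag. The following is a minimal sketch and not part of autofs or of any patch discussed here; the function name propagation_of and its optional second argument (an alternative mountinfo file) are assumptions made for illustration.

```shell
# Print the propagation type of a mount point: shared, slave, unbindable
# or private. Hypothetical helper, not an autofs or util-linux command.
#   $1 = mount point (as it appears in mountinfo; spaces are shown as \040)
#   $2 = mountinfo file to read (defaults to /proc/self/mountinfo)
propagation_of() {
    awk -v mp="$1" '
        $5 == mp {
            # Optional fields start at field 7 and end at the "-" separator.
            state = "private"
            for (i = 7; i <= NF && $i != "-"; i++) {
                if ($i ~ /^shared:/) state = "shared"
                else if ($i ~ /^master:/ && state != "shared") state = "slave"
                else if ($i == "unbindable") state = "unbindable"
            }
            # If the path is mounted over several times, the last entry wins.
            last = state
        }
        END { if (last != "") print last; else exit 1 }
    ' "${2:-/proc/self/mountinfo}"
}
```

Running `propagation_of /` on an affected machine would be expected to report the root mount as shared. Where the installed util-linux supports it, `findmnt -o TARGET,PROPAGATION` shows similar information.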
In my testing it appears you can move from a private filesystem, into a shared
filesystem, but not the other direction. So one trick would be to mount a tmpfs
directory private, have autofs use that initially, and then move to the real
location on the shared root FS.

Here's what I did to test this idea

  # mount --make-shared /
  # mount -t tmpfs none /tmp/vroot
  # mount --make-private /tmp/vroot

... setup our two mount points

  # mkdir /tmp/vroot/a
  # mkdir /tmp/e

... mount on the private FS originally

  # mount /dev/loop0 /tmp/vroot/a

...move from private to shared fs:

  # mount --move /tmp/vroot/a /tmp/e

...see move from shared to private fail as described earlier

  # mount --move /tmp/e /tmp/vroot/a
  mount: wrong fs type, bad option, bad superblock on /tmp/e,
         missing codepage or helper program, or other error
         In some cases useful info is found in syslog - try
         dmesg | tail  or so

*** Bug 748888 has been marked as a duplicate of this bug. ***

(In reply to comment #11)
> There was an RFE against SystemD to make the root filesystem shared, instead of
> private. The reason for this is so that mounts automatically propagate into
> application sandboxes.

I'm having difficulty understanding how it is justified to break
the mount move system call system wide without any regard to the
effect it will have on existing applications.

@lpoetter, shouldn't this be fixed in systemd?

Even if systemd did not set the filesystem mount mode to 'shared', any system
administrator or application can come along at any time and run

  mount --make-rshared /

which will result in autofs ceasing to work. Previously the 'sandbox'
application (and others) would include an init script that did just this, and
other apps would directly call mount() to make / shared.

So IMHO you can't really call this a bug in systemd, or require systemd to
change / back to private.
Both private or shared are perfectly valid modes for
the / filesystem on any modern Linux OS, and autofs needs to be made robust for
operation under whichever is configured.

(In reply to comment #15)
>
> So IMHO you can't really call this a bug in systemd, or require systemd to
> change / back to private. Both private or shared are perfectly valid modes for
> the / filesystem on any modern Linux OS, and autofs needs to be made robust for
> operation under whichever is configured.

Maybe, I agree that we can't call it a "bug", but I don't agree
that this is sensible at all. This type of usage wasn't the intent
behind the pnode implementation AFAIR.

It is worth making autofs tolerant of it though and I'm still
thinking about how I should do that.

Ian

mount --make-rshared /

did not really fix the problem.

This also breaks delayed mounts in fstab

I had this line in f15
UUID=110a6adb-db6f-4ddf-b7d3-6d055676ab1c /mnt/playlist xfs noauto,comment=systemd.automount 1 2

This works fine in F15 but not in F16

(In reply to comment #17)
> mount --make-rshared /
>
> did not really fix the problem.

That's right, I think if it isn't private move mount is forbidden.

(In reply to comment #18)
> This also breaks delayed mounts in fstab
>
> I had this line in f15
> UUID=110a6adb-db6f-4ddf-b7d3-6d055676ab1c /mnt/playlist xfs
> noauto,comment=systemd.automount 1 2
>
>
> This works fine in F15 but not in F16

What does this have to do with this bug?

(In reply to comment #20)
> (In reply to comment #18)
> > This also breaks delayed mounts in fstab
> >
> > I had this line in f15
> > UUID=110a6adb-db6f-4ddf-b7d3-6d055676ab1c /mnt/playlist xfs
> > noauto,comment=systemd.automount 1 2
> >
> >
> > This works fine in F15 but not in F16
>
> What does this have to do with this bug?

as per the docs here
https://fedoraproject.org/wiki/User:Johannbg/QA/Systemd/Systemd.mount

systemd will use autofs to mount such fstab entries so the autofs failure means
the system did not boot till I changed the fstab.
Thanks

(In reply to comment #21)
> (In reply to comment #20)
> > (In reply to comment #18)
> > > This also breaks delayed mounts in fstab
> > >
> > > I had this line in f15
> > > UUID=110a6adb-db6f-4ddf-b7d3-6d055676ab1c /mnt/playlist xfs
> > > noauto,comment=systemd.automount 1 2
> > >
> > >
> > > This works fine in F15 but not in F16
> >
> > What does this have to do with this bug?
>
> as per the docs here
> https://fedoraproject.org/wiki/User:Johannbg/QA/Systemd/Systemd.mount
>
> systemd will use autofs to mount such fstab entries so the autofs failure means
> the system did not boot till I changed the fstab.

I still don't know what this has to do with the move mount option
restriction to mount(2) which is the problem being discussed here.

I've been thinking about this and I feel that the move mount
isn't really needed. Certainly with current kernels that include
the new vfs-automount it shouldn't be needed. But also there have
been some recent bug fixes for possibly related issues that I wasn't
aware of at the time I added it. Also, move mount was only ever
needed for a small, not widely used subset of configurations as
well.

So I'm adding a configure option to disable the use of move mount
and, while it won't be the default, it will be set in the spec
file in the autofs distribution tar and in the spec file for Fedora.

Ian

Created attachment 531489
Patch - fix fix map source check in file lookup
Created attachment 531490
Patch - add disable move mount configure option
These two patches appear to resolve the problem with autofs.

Note that I'm not saying anything about autofs in the kernel
(actually called autofs4) since this bug is about the user
space automount daemon which is included in the autofs
package. Also, systemd doesn't use user space autofs at all
so there may be some misunderstanding by the look of some
of the comments above.

For those that do understand this distinction please test
the scratch build found here, which includes the two patches
above:

https://koji.fedoraproject.org/koji/taskinfo?taskID=3482097

The 'move_mount: failed to move mount' error is gone, but the shares are still
not accessible.

This is the only message that is logged:

rpc_get_exports_proto

The same result as if I do an:

mount --make-rprivate /

(In reply to comment #27)
> The 'move_mount: failed to move mount' error is gone, but the shares are still
> not accessible.
>
> This is the only message that is logged:
>
> rpc_get_exports_proto
>
> The same result as if I do an:
>
> mount --make-rprivate /

You'll need to provide more information because I don't see
a problem with getting the exports list from a server here.

What does the exports list from the server look like?
What OS is the server running?
What NFS version, v3 or v4?

Provide a full debug log, set LOGGING="debug" in the autofs
configuration and ensure that daemon debug messages are being
logged to syslog, which can be done by sending daemon.* to a
log file in rsyslog.conf.

(In reply to comment #15)
> Even if systemd did not set the filesystem mount mode to 'shared', any system
> administrator or application can come along at any time and run

BTW, we have discussed the default propagation mode problem with a couple of
folks and always came to the same conclusion: we want the propagation mode to
be a mount option like any other, so that it would be applied atomically to all
mounts as they are created and it can be listed in /etc/fstab.
Not sure when we'll get this from the kernel folks, but given this perspective
we decided not to do this at all in systemd.

And yup, systemd does not use the autofs package, so the bug definitely has no
relation to systemd.

(In reply to comment #29)
>
> And yup, systemd does not use the autofs package, so the bug definitely has no
> relation to systemd.

I'm struggling to call this a bug.

It's a choice that's been made to provide certain functionality
that has side effects and it happens to affect autofs.

Changing autofs isn't really such a big deal, especially when
running against kernel versions where we will be likely to find
systemd running (famous last words), and that's what I'm doing. ;)

(In reply to comment #22)
> (In reply to comment #21)
> > (In reply to comment #20)
> > > (In reply to comment #18)
> > > > This also breaks delayed mounts in fstab
> > > >
> > > > I had this line in f15
> > > > UUID=110a6adb-db6f-4ddf-b7d3-6d055676ab1c /mnt/playlist xfs
> > > > noauto,comment=systemd.automount 1 2
> > > >
> > > >
> > > > This works fine in F15 but not in F16
> > >
> > > What does this have to do with this bug?
> >
> > as per the docs here
> > https://fedoraproject.org/wiki/User:Johannbg/QA/Systemd/Systemd.mount
> >
> > systemd will use autofs to mount such fstab entries so the autofs failure means
> > the system did not boot till I changed the fstab.
>
> I still don't know what this has to do with the move mount option
> restriction to mount(2) which is the problem being discussed here.

You are correct. I reinstalled (not upgrade) F-16 Gold and the fstab works fine

(In reply to comment #26)
> These two patches appear to resolve the problem with autofs.
>
> Note that I'm not saying anything about autofs in the kernel
> (actually called autofs4) since this bug is about the user
> space automount daemon which is included in the autofs
> package.
> Also, systemd doesn't use user space autofs at all
> so there may be some misunderstanding by the look of some
> of the comments above.
>
> For those that do understand this distinction please test
> the scratch build found here, which includes the two patches
> above:
>
> https://koji.fedoraproject.org/koji/taskinfo?taskID=3482097

This works fine on a fresh install of F-16 here.

re-checked. Seems to work here, too. Thanks for the great work Ian.

autofs-5.0.6-3.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/autofs-5.0.6-3.fc16

(In reply to comment #34)
> autofs-5.0.6-3.fc16 has been submitted as an update for Fedora 16.
> https://admin.fedoraproject.org/updates/autofs-5.0.6-3.fc16

Not sure when this will show up in testing due to being in
the release phase but when it does please give it a try.

Created attachment 532208
Proposed F16 update
This package includes a number of fixes. It is worth building
locally and testing (while waiting for the build system
build to reach the testing repo), as it is what I'm proposing
as an update for F16.
*** Bug 751766 has been marked as a duplicate of this bug. ***

autofs-5.0.6-3.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.
Created attachment 527962
autofs debug log of failure

Description of problem:

My /etc/auto.master contains:

  /net -hosts

Attempting to activate an indirect mount fails though:

  # ls /net/marrow.gsslab.fab.redhat.com/
  ls: cannot access /net/marrow.gsslab.fab.redhat.com/: No such file or directory

The host is accessible, and can be successfully mounted manually.

Autofs syslog says:

  Oct 13 13:02:39 dhcp-188 automount[1200]: move_mount: failed to move mount from /tmp/autoRGerI0 to /net/marrow.gsslab.fab.redhat.com: No such file or directory

And /net seems unhappy

  # ls -al /net
  ls: cannot access /net/marrow.gsslab.fab.redhat.com: No such file or directory
  total 4
  drwxr-xr-x.  3 root root    0 Oct 13 13:06 .
  dr-xr-xr-x. 23 root root 4096 Oct 13 13:05 ..
  d??????????  ? ?    ?       ?            ? marrow.gsslab.fab.redhat.com

I will attach the full logs shown after setting /etc/sysconfig/autofs to 'debug'

SELinux is in permissive mode.

Version-Release number of selected component (if applicable):

  autofs-5.0.6-2.fc16.x86_64
  kernel-3.1.0-0.rc9.git0.0.fc16.x86_64

How reproducible:

Seems to be consistent across reboots

Steps to Reproduce:
1. Setup /net as an indirect mount
2. Attempt to mount some servers
3.

Actual results:

Expected results:

Additional info:
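
The move_mount failure in the log above was traced in the comments to mount propagation, and mount(8) notes that moving a mount that resides under a shared mount is invalid. The sketch below checks that condition for a given mount point by looking up its parent in /proc/self/mountinfo. It is an illustration only, not code from autofs; the function name parent_is_shared, its exit codes, and the optional second argument (an alternative mountinfo file) are assumptions.

```shell
# Exit 0 if the parent of the given mount is shared (so "mount --move" of
# that mount would be expected to fail), 1 if it is not, 2 if the mount
# point is not listed. Hypothetical helper for illustration.
#   $1 = mount point to be moved
#   $2 = mountinfo file to read (defaults to /proc/self/mountinfo)
parent_is_shared() {
    awk -v mp="$1" '
        {
            # Field 1 is the mount ID, field 2 the parent mount ID.
            # Record whether this mount has a shared:N optional field.
            shared[$1] = 0
            for (i = 7; i <= NF && $i != "-"; i++)
                if ($i ~ /^shared:/) shared[$1] = 1
            if ($5 == mp) parent = $2
        }
        END {
            if (parent == "") exit 2
            exit (shared[parent] ? 0 : 1)
        }
    ' "${2:-/proc/self/mountinfo}"
}
```

For example, `parent_is_shared /tmp/autoRGerI0 && echo "move would be refused"` could be run while the temporary mount still exists. The check reads only the parent's own tag and does not model every propagation case.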