| Summary: | Automounter does not unmount Kerberized NFS mounts | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Ondrej <ovalousek> |
| Component: | autofs | Assignee: | Ian Kent <ikent> |
| Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 20 | CC: | ikent, ovalousek, rmainz, steved |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2015-06-30 00:46:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Ondrej 2013-12-09 16:14:00 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 19. It is Fedora's policy to close all bug reports from releases that are no longer maintained. Approximately four weeks from now this bug will be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: if you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue; we are sorry that we were not able to fix it before Fedora 19 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed, as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Just a note that I have switched to fully kerberized NFS and I'm not seeing any problems with autofs unmounting kerberized mounts.

Strange, I can still reproduce this on Fedora 20, autofs-5.0.7-41.fc20.x86_64. The NFS server is a NetApp. The automounter unmounts system NFS shares fine; kerberized ones never. Will enable debug.

(In reply to Ondrej from comment #3)
> Strange, I can still reproduce this on Fedora 20,
> autofs-5.0.7-41.fc20.x86_64.
> NFS server is Netapp.
> automounter unmounts system NFS shares fine, kerberized never.
> Will enable debug

A debug log is an essential starting point. We'll want to see whether umount is returning an error, for a start, and whether autofs is even trying the umount at all.
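On Fedora of this era (autofs 5.0.7), debug logging is typically switched on via the sysconfig file. A hedged sketch of how one might enable it and watch for expire/umount activity follows; the LOGGING variable and unit name are assumptions based on the default Fedora packaging, not details taken from this report:

```shell
# Assumed default packaging: /etc/sysconfig/autofs carries a LOGGING variable
# on Fedora 20 (newer autofs releases use "logging = debug" in /etc/autofs.conf).
sed -i 's/^LOGGING=.*/LOGGING="debug"/' /etc/sysconfig/autofs
systemctl restart autofs.service

# Alternatively, run the daemon in the foreground with debug output:
#   automount -f -d

# Watch the journal for expire/umount activity, as done later in this report:
journalctl -f _SYSTEMD_UNIT=autofs.service | grep -Ei 'expire|umount'
```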
The problem for autofs is that it doesn't know (and probably shouldn't need to know, though perhaps that's not entirely true) the effect of mount options; it just mounts using user-supplied options and umounts after some amount of time in which there has been no access. That access can come from anywhere and often causes things to stay mounted after people expect them to be umounted. If autofs isn't trying to umount after some time when you think it should be, then that needs investigation. But if a umount is being tried and failing, then we probably need to look elsewhere for the problem.

Ok, here we are:
1. cd /auto/private; cd /
# attempt to mount Kerberos NFS share
# journalctl _SYSTEMD_UNIT=autofs.service
úno 12 13:22:23 dedek automount[32570]: attempting to mount entry /auto/private
úno 12 13:22:23 dedek automount[32570]: lookup_name_file_source_instance: file map not found
úno 12 13:22:23 dedek automount[32570]: lookup_mount: lookup(sss): looking up private
úno 12 13:22:23 dedek automount[32570]: lookup_mount: lookup(sss): private -> -sec=krb5 czshare.vendavo.com:/vol/Private
úno 12 13:22:23 dedek automount[32570]: parse_mount: parse(sun): expanded entry: -sec=krb5 czshare.vendavo.com:/vol/Private
úno 12 13:22:23 dedek automount[32570]: parse_mount: parse(sun): gathered options: sec=krb5
úno 12 13:22:23 dedek automount[32570]: parse_mount: parse(sun): dequote("czshare.vendavo.com:/vol/Private") -> czshare.vendavo.com:/vol/Private
úno 12 13:22:23 dedek automount[32570]: parse_mount: parse(sun): core of entry: options=sec=krb5, loc=czshare.vendavo.com:/vol/Private
úno 12 13:22:23 dedek automount[32570]: sun_mount: parse(sun): mounting root /auto, mountpoint private, what czshare.vendavo.com:/vol/Private, fstype
úno 12 13:22:23 dedek automount[32570]: mount_mount: mount(nfs): root=/auto name=private what=czshare.vendavo.com:/vol/Private, fstype=nfs, options=se
úno 12 13:22:23 dedek automount[32570]: mount_mount: mount(nfs): nfs options="sec=krb5", nobind=0, nosymlink=0, ro=0
úno 12 13:22:23 dedek automount[32570]: get_nfs_info: called with host czshare.vendavo.com(10.103.4.21) proto 6 version 0x40
úno 12 13:22:23 dedek automount[32570]: get_nfs_info: nfs v4 rpc ping time: 0.001808
úno 12 13:22:23 dedek automount[32570]: get_nfs_info: host czshare.vendavo.com cost 1808 weight 0
úno 12 13:22:23 dedek automount[32570]: prune_host_list: selected subset of hosts that support NFS4 over TCP
úno 12 13:22:23 dedek automount[32570]: mount_mount: mount(nfs): calling mkdir_path /auto/private
úno 12 13:22:23 dedek automount[32570]: mount_mount: mount(nfs): calling mount -t nfs -s -o sec=krb5 czshare.vendavo.com:/vol/Private /auto/private
úno 12 13:22:23 dedek automount[32570]: spawn_mount: mtab link detected, passing -n to mount
úno 12 13:22:23 dedek automount[32570]: mount_mount: mount(nfs): mounted czshare.vendavo.com:/vol/Private on /auto/private
úno 12 13:22:23 dedek automount[32570]: dev_ioctl_send_ready: token = 25
úno 12 13:22:23 dedek automount[32570]: mounted /auto/private
2. [root@dedek ~]# mount | grep krb
czshare.vendavo.com:/vol/Private on /auto/private type nfs4 (rw,relatime,vers=4.0,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=10.102.11.22,local_lock=none,addr=10.103.4.21)
-- we are mounted just fine
3. Wait an hour here, but still no luck. The debug log shows nothing at all:
úno 13 12:09:02 dedek automount[32570]: st_expire: state 1 path /auto
úno 13 12:09:02 dedek automount[32570]: expire_proc: exp_proc = 140218690578176 path /auto
úno 13 12:09:02 dedek automount[32570]: expire_proc_indirect: expire /auto/private
úno 13 12:09:02 dedek automount[32570]: 1 remaining in /auto
úno 13 12:09:02 dedek automount[32570]: expire_cleanup: got thid 140218690578176 path /auto stat 3
úno 13 12:09:02 dedek automount[32570]: expire_cleanup: sigchld: exp 140218690578176 finished, switching from 2 to 1
úno 13 12:09:02 dedek automount[32570]: st_ready: st_ready(): state = 2 path /auto
4. Let's try to unmount manually:
[root@dedek ~]# umount /auto/private
[root@dedek ~]#
-- no problem, share unmounted
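While the share is still mounted and before the expire timeout fires, it can be worth checking for anything holding the path open. A sketch using standard tools (run as root; the path is the one from this report):

```shell
# Open files anywhere under the automounted directory (can be slow on big trees)
lsof +D /auto/private

# Processes with open files or a working directory on that filesystem
fuser -vm /auto/private

# If both come back empty yet autofs still refuses to expire the mount, the
# blocker is more likely a kernel-side reference (e.g. namespace propagation)
# than an open file.
```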
So to me it looks like autofs is trying to unmount, but I have no idea why it does not succeed.
Or perhaps it is not trying to unmount (it's not clear from the logs), in which case I have no idea why. Can we somehow backtrack why autofs assumes the share is still needed?

(In reply to Ondrej from comment #6)
> Or perhaps it is not trying to unmount (no clear from the logs) - in which
> case I have no idea why. Can we somehow backtrack why autofs assumes share
> is still needed?

If autofs was trying to umount the mount, it should be fairly obvious from the log: there would be log entries containing "umount". The most common reason for autofs not trying to umount something is some process periodically accessing the mount, or somewhere within it. System wellness-checking utilities are frequent culprits. The assumption is that if something (anything) accesses the mount before the expire timeout, it's likely to just get mounted again if it is umounted.

The other thing that can happen is utilities that monitor mounts and umounts. They have a characteristic log signature where the mount gets mounted, expires, and is immediately mounted again by the monitoring application's checking; such mounts also appear to never time out. But it sounds like you're seeing the former behaviour.

Ian

I can understand that, but why are only Kerberized mounts affected?
/auto/private (kerberized) never gets unmounted
/auto/proj (system sec) unmounts happily after 10 minutes

How does the automounter detect that the mount was not used for a long time? Can I get this information myself from the command line (using sysctl, /proc or /sys)?

(In reply to Ondrej from comment #8)
> I can understand that, but why only Kerberized mounts are affected?
> /auto/private (kerberized) never get unmounted
> /auto/proj (system sec) unmount happily after 10 minutes
>
> How does automounter detect that the mount was not used for a long time? Can
> I get this information myself from the cmd line (using sysctl, /proc or
> /sys)?

Actually, what I said above isn't correct.
The mounts time out after they are considered not in use any more, and "not in use any more" means there are no open files or working directories in use within the mount. There can be other things in the kernel which increase the reference count, such as when a mount is propagated to another namespace. The namespace example is the only other case I've seen that can prevent expiry.

The last-used time is stored in an autofs private data structure belonging to the autofs kernel dentry and isn't viewable, but then neither is the dentry structure itself. Checking for open files using lsof can be useful but, depending on the autofs mount type, it might not be clear whether an open file owned by autofs should or shouldn't be present. It's not really possible to view the vfs mount reference count either.

Ian

I can unmount the share manually anytime -> I guess that means there are no open files, right? If there were open files, I would not be able to do so. So lsof is meaningless here.

So you say it is actually not possible to determine the reason for this behavior. That's a pity. It is not too big a problem for me right now, but it could be, as it means any change to the automounter maps won't be active until the automounter is restarted.

(In reply to Ondrej from comment #10)
> I can unmount the share manually anytime -> I guess that means there are no
> open files, right? If there were open files, I would not be able to do so.
> So lsof is meaningless here.

Yes, that's about it. But if any namespace has cloned an autofs mount and the reference count has been increased that way, then autofs won't expire it but umount will probably still be able to umount it. It's not clear to me quite how that works, but it has been seen to be a problem in the past.

> So you say it is actually not possible to determine the reason for this
> behavior. That's pity.
> It is not too big problem for me right now, but it
> could be as it means any change to the automounter maps won't be active
> until restart of the automounter.

I'm saying it isn't possible for you to see the last_used value from user space, or the mount reference count for that matter. I am also saying I don't know why it's happening. If we want to go further with this, you would need to apply patches to the kernel, then build and run the patched kernel to get log output. That hasn't worked well for me in the past and shouldn't be needed.

I'll need to check, but I think most of the existing debug logging prints can be enabled from a machine root account on the fly. I'm pretty sure all we'll get from that, though, is what we already know: the mount point dentry isn't being selected for expire because the autofs expire system thinks it's in use.

If there's no namespace usage, then there's no known reason for mounts to not expire, so I'm questioning whether it's actually autofs that's at fault, but I also don't have anything else to offer. The fact that you can umount it makes me think there's some namespace usage somewhere on the system that doesn't take account of autofs mounts; unshare(1) comes to mind. AFAIK there haven't been any changes to the autofs kernel module for some time either, so I am puzzled.

Ian

I remember, quite some time ago, around the time systemd changed to setting the root filesystem as shared, that I started seeing mounts not expire, and I tracked it to an elevated mount reference count. Remounting the root filesystem private would make the problem go away. So it was the mount propagation that was causing it (i.e. propagation to other namespaces), even though I wasn't using namespaces as far as I knew. I spent a long time trying to work out how it was happening, but before I worked it out the problem went away. I'm not certain now whether I used:

mount --make-private /

or the recursive form:

mount --make-rprivate /

for this.
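The propagation setting described here can be inspected without patching anything. A sketch (the awk selector and the findmnt column are my additions for illustration, not commands from this report):

```shell
# In /proc/self/mountinfo, field 5 is the mount point; a "shared:N" optional
# field on the root entry means the mount is shared (systemd's default).
awk '$5 == "/" { print }' /proc/self/mountinfo

# Newer util-linux can report it directly (PROPAGATION column availability
# varies by version):
#   findmnt -o TARGET,PROPAGATION /

# To remount root private recursively (run as root):
#   mount --make-rprivate /
```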
As an experiment, you might want to stop autofs and ensure everything autofs-related is umounted (including all the autofs mounts themselves), then try "mount --make-private /" before starting autofs and see if that makes a difference to the expire.

Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.

Is this issue still present in F22?

This message is a reminder that Fedora 20 is nearing its end of life. Approximately four weeks from now Fedora will stop maintaining and issuing updates for Fedora 20. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '20'.

Package Maintainer: if you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue; we are sorry that we were not able to fix it before Fedora 20 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed, as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.