Bug 1039627 - Automounter does not unmount Kerberized NFS mounts
Summary: Automounter does not unmount Kerberized NFS mounts
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: autofs
Version: 20
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Ian Kent
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-12-09 16:14 UTC by Ondrej
Modified: 2015-06-30 00:46 UTC
CC: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-06-30 00:46:00 UTC
Type: Bug



Description Ondrej 2013-12-09 16:14:00 UTC
Description of problem:

I have two autofs maps (delivered via sssd): one classic NFS map, and a second for a Kerberized NFSv4 mount (defined using the "-sec=krb5" parameter).
After a certain period of inactivity (10 minutes), the first mount gets successfully unmounted, but the second mount stays mounted, even though no process is using the filesystem, as I can unmount it fine manually.

Version-Release number of selected component (if applicable):
System fully updated (Fedora 19 x86_64)

How reproducible:
always

Steps to Reproduce:
1. Define a Kerberized NFS map and use it.
2. Wait 20 minutes.
3. The share should get unmounted, but it does not.
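For concreteness, a setup like the one described might look like the following sketch; the map names, server, and timeout are illustrative examples, not values taken from this report.

```shell
# Hypothetical autofs configuration reproducing the report; all names,
# the server, and the timeout are examples.
#
# /etc/auto.master:
#   /auto  /etc/auto.share  --timeout=600
#
# /etc/auto.share -- one plain NFS entry, one Kerberized entry:
#   proj     -fstype=nfs  server.example.com:/vol/proj      # expires fine
#   private  -sec=krb5    server.example.com:/vol/Private   # reported to never expire
#
# Trigger the mount, wait past the timeout, and check:
#   ls /auto/private
#   sleep 700; mount | grep /auto/private
```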

Actual results:
Share is still mounted

Expected results:
Share gets unmounted after pre-defined inactivity interval

Additional info:

Comment 1 Fedora End Of Life 2015-01-09 20:49:22 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 reached end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 2 Jason Tibbitts 2015-02-11 21:30:33 UTC
Just a note that I have switched to fully kerberized NFS and I'm not seeing any problems with autofs unmounting kerberized mounts.

Comment 3 Ondrej 2015-02-12 12:14:46 UTC
Strange, I can still reproduce this on Fedora 20, autofs-5.0.7-41.fc20.x86_64.
NFS server is Netapp.
automounter unmounts system NFS shares fine, kerberized never.
Will enable debug

Comment 4 Ian Kent 2015-02-13 11:04:42 UTC
(In reply to Ondrej from comment #3)
> Strange, I can still reproduce this on Fedora 20,
> autofs-5.0.7-41.fc20.x86_64.
> NFS server is Netapp.
> automounter unmounts system NFS shares fine, kerberized never.
> Will enable debug

A debug log is an essential starting point. We'll want to
see if umount is returning an error on stdout for a start
and if autofs is even trying the umount at all.

The problem for autofs is that it doesn't know (and probably
shouldn't need to know, though perhaps that's not entirely
true) the effect of mount options; it just mounts using
user-supplied options and umounts after some amount of
time when there has been no access. That access can come
from anywhere and often causes things to stay mounted after
people expect them to be umounted.

If autofs isn't trying to umount after some time when you
think it should be then that needs investigation. But if
a umount is being tried and failing then we probably need
to look elsewhere for the problem.

Comment 5 Ondrej 2015-02-13 12:03:08 UTC
Ok, here we are:

1. cd /auto/private; cd /
# attempt to mount Kerberos NFS share

# journalctl _SYSTEMD_UNIT=autofs.service

úno 12 13:22:23 dedek automount[32570]: attempting to mount entry /auto/private
úno 12 13:22:23 dedek automount[32570]: lookup_name_file_source_instance: file map not found
úno 12 13:22:23 dedek automount[32570]: lookup_mount: lookup(sss): looking up private
úno 12 13:22:23 dedek automount[32570]: lookup_mount: lookup(sss): private -> -sec=krb5 czshare.vendavo.com:/vol/Private
úno 12 13:22:23 dedek automount[32570]: parse_mount: parse(sun): expanded entry: -sec=krb5 czshare.vendavo.com:/vol/Private
úno 12 13:22:23 dedek automount[32570]: parse_mount: parse(sun): gathered options: sec=krb5
úno 12 13:22:23 dedek automount[32570]: parse_mount: parse(sun): dequote("czshare.vendavo.com:/vol/Private") -> czshare.vendavo.com:/vol/Private
úno 12 13:22:23 dedek automount[32570]: parse_mount: parse(sun): core of entry: options=sec=krb5, loc=czshare.vendavo.com:/vol/Private
úno 12 13:22:23 dedek automount[32570]: sun_mount: parse(sun): mounting root /auto, mountpoint private, what czshare.vendavo.com:/vol/Private, fstype 
úno 12 13:22:23 dedek automount[32570]: mount_mount: mount(nfs): root=/auto name=private what=czshare.vendavo.com:/vol/Private, fstype=nfs, options=se
úno 12 13:22:23 dedek automount[32570]: mount_mount: mount(nfs): nfs options="sec=krb5", nobind=0, nosymlink=0, ro=0
úno 12 13:22:23 dedek automount[32570]: get_nfs_info: called with host czshare.vendavo.com(10.103.4.21) proto 6 version 0x40
úno 12 13:22:23 dedek automount[32570]: get_nfs_info: nfs v4 rpc ping time: 0.001808
úno 12 13:22:23 dedek automount[32570]: get_nfs_info: host czshare.vendavo.com cost 1808 weight 0
úno 12 13:22:23 dedek automount[32570]: prune_host_list: selected subset of hosts that support NFS4 over TCP
úno 12 13:22:23 dedek automount[32570]: mount_mount: mount(nfs): calling mkdir_path /auto/private
úno 12 13:22:23 dedek automount[32570]: mount_mount: mount(nfs): calling mount -t nfs -s -o sec=krb5 czshare.vendavo.com:/vol/Private /auto/private
úno 12 13:22:23 dedek automount[32570]: spawn_mount: mtab link detected, passing -n to mount
úno 12 13:22:23 dedek automount[32570]: mount_mount: mount(nfs): mounted czshare.vendavo.com:/vol/Private on /auto/private
úno 12 13:22:23 dedek automount[32570]: dev_ioctl_send_ready: token = 25
úno 12 13:22:23 dedek automount[32570]: mounted /auto/private

2. [root@dedek ~]# mount | grep krb
czshare.vendavo.com:/vol/Private on /auto/private type nfs4 (rw,relatime,vers=4.0,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=10.102.11.22,local_lock=none,addr=10.103.4.21)

-- we are mounted just fine

3. Waited an hour here, but still no luck. The debug log shows nothing useful:

úno 13 12:09:02 dedek automount[32570]: st_expire: state 1 path /auto
úno 13 12:09:02 dedek automount[32570]: expire_proc: exp_proc = 140218690578176 path /auto
úno 13 12:09:02 dedek automount[32570]: expire_proc_indirect: expire /auto/private
úno 13 12:09:02 dedek automount[32570]: 1 remaining in /auto
úno 13 12:09:02 dedek automount[32570]: expire_cleanup: got thid 140218690578176 path /auto stat 3
úno 13 12:09:02 dedek automount[32570]: expire_cleanup: sigchld: exp 140218690578176 finished, switching from 2 to 1
úno 13 12:09:02 dedek automount[32570]: st_ready: st_ready(): state = 2 path /auto

4. Let's try to unmount manually:
[root@dedek ~]# umount /auto/private
[root@dedek ~]# 
-- no problem, share unmounted


So to me it looks like autofs is trying to unmount, but I have no idea why it does not succeed.

Comment 6 Ondrej 2015-02-13 12:20:44 UTC
Or perhaps it is not trying to unmount (it's not clear from the logs) - in which case I have no idea why. Can we somehow backtrack why autofs assumes the share is still needed?

Comment 7 Ian Kent 2015-02-14 00:10:24 UTC
(In reply to Ondrej from comment #6)
> Or perhaps it is not trying to unmount (no clear from the logs) - in which
> case I have no idea why. Can we somehow backtrack why autofs assumes share
> is still needed?

If autofs was trying to umount the mount it should be fairly
obvious from the log. There would be log entries containing
"umount" in the log entry.

The most common reason for autofs not trying to umount something
is some process periodically accessing the mount or somewhere
within it. System wellness checking utilities are frequently
culprits.

The assumption is that if something (anything) accesses the
mount before the expire timeout then it's likely to just get
mounted again if it is umounted.

The other thing that can happen is utilities that monitor
mounts and umounts. They have a characteristic log signature
where the mount gets mounted, it expires and is immediately
mounted again due to the monitoring applications checking.
They also appear to never time out. But it sounds like you're
seeing the former behaviour.

Ian
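The periodic-access check Ian suggests can be sketched from the command line; the path below is an example, and the availability of `lsof` and `fuser` is assumed:

```shell
# Look for processes with open files or working directories under the
# automounted path (example path; adjust to your mount point).
lsof -- /auto/private 2>/dev/null || echo "lsof: nothing holding the mount"
fuser -m /auto/private 2>/dev/null || echo "fuser: nothing holding the mount"
```

If either command lists a process, that process is the likely reason autofs keeps treating the mount as in use and re-arming the expire timer.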

Comment 8 Ondrej 2015-02-15 17:33:48 UTC
I can understand that, but why are only Kerberized mounts affected?
/auto/private (kerberized) never gets unmounted
/auto/proj (system sec) unmounts happily after 10 minutes

How does the automounter detect that the mount has not been used for a long time? Can I get this information myself from the command line (using sysctl, /proc or /sys)?

Comment 9 Ian Kent 2015-02-16 01:10:23 UTC
(In reply to Ondrej from comment #8)
> I can understand that, but why only Kerberized mounts are affected?
> /auto/private (kerberized) never get unmounted
> /auto/proj (system sec) unmount happily after 10 minutes
> 
> How does automounter detect that the mount was not used for a long time? Can
> I get this information myself from the cmd line (using sysctl, /proc or
> /sys)?

Actually, what I said above isn't correct.

The mounts time out after they are considered not in use any more.

And "not in use any more" means when there are no open files or
working directories in use within the mount. There can be other
things in the kernel which increase the reference count like
when a mount is propagated to another namespace. The namespace
example is the only other case I've seen that can prevent expiry.

The last used time is stored in an autofs private data structure
belonging to the autofs fs kernel dentry and isn't viewable but
then neither is the dentry structure.

Checking for open files using lsof can be useful but, depending on
the autofs mount type, it might not be clear if an open file owned
by autofs should or shouldn't be present.

It's not really possible to view the vfs mount reference count either.

Ian

Comment 10 Ondrej 2015-02-16 08:43:02 UTC
I can unmount the share manually at any time -> I guess that means there are no open files, right? If there were open files, I would not be able to do so.
So lsof is meaningless here.

So you say it is actually not possible to determine the reason for this behavior. That's a pity. It is not too big a problem for me right now, but it could be, as it means any change to the automounter maps won't take effect until the automounter is restarted.

Comment 11 Ian Kent 2015-02-16 10:58:36 UTC
(In reply to Ondrej from comment #10)
> I can unmount the share manually anytime -> I guess that means there are no
> open files, right? If there were open files, I would not be able to do so.
> So lsof is meaningless here.

Yes, that's about it.

But if any namespace has cloned an autofs mount and the reference
count has been increased that way then autofs won't expire it but
umount will probably still be able to umount it. It's not clear
to me quite how that works but it has been seen to be a problem
in the past.

> 
> So you say it is actually not possible to determine the reason for this
> behavior. That's pity. It is not too big problem for me right now, but it
> could be as it means any change to the automounter maps won't be active
> until restart of the automounter.

I'm saying it isn't possible for you to see the last_used value
from user space or the mount reference count for that matter.

I am also saying I don't know why it's happening.

If we want to go further with this then you would need to apply
patches to the kernel, build and run the patched kernel to get
log output. That hasn't worked well for me in the past and
shouldn't be needed.

I'll need to check but I think most of the existing debug logging
prints can be enabled from a machine root account on the fly but
I'm pretty sure all we'll get from that is what we already know,
the mount point dentry isn't being selected for expire because
the autofs expire system thinks it is in use.

If there's no namespace usage then there's no known reason for
mounts to not expire so I'm questioning whether it's actually
autofs that's at fault but I also don't have anything else to
offer. The fact that you can umount it makes me think there's
some namespace usage somewhere on the system that doesn't take
account of autofs mounts; unshare(1) comes to mind.

AFAIK there's not been any changes to the autofs kernel module
for some time either so I am puzzled.

Ian
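One way to inspect the propagation/namespace situation discussed here is with `findmnt` from util-linux; the autofs path below is an example:

```shell
# Is the root mount shared? Shared mounts propagate into new mount
# namespaces, which can pin an autofs mount's reference count.
findmnt -no TARGET,PROPAGATION /
# The automounted filesystem itself, if currently mounted (example path):
findmnt -no TARGET,PROPAGATION /auto/private 2>/dev/null || true
```

If `/` reports `shared`, any namespace created after the automount (by a container runtime, systemd service sandboxing, or unshare(1)) may hold a reference to the propagated copy.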

Comment 12 Ian Kent 2015-02-16 11:14:25 UTC
I remember quite some time ago, about the time systemd
changed to setting the root filesystem as shared, that I
started seeing mounts not expire and I tracked it to an
elevated mount reference count.

Remounting the root filesystem private would make the
problem go away. So it was the mount propagation that
was causing it (i.e. propagation to other namespaces)
even though I wasn't using namespaces as far as I knew.

I spent a long time trying to work out how it was happening
but before I worked it out the problem went away.

I'm not certain now if I used:
mount --make-private /
or the recursive form:
mount --make-rprivate /
for this.

As an experiment you might want to stop autofs and ensure
everything autofs is umounted, including all autofs mounts
themselves, and try "mount --make-private /" before starting
autofs and see if that makes a difference to the expire.
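As a sketch, that experiment might look like the following (it must be run as root on the affected machine; failures are tolerated here so the listing is safe to illustrate):

```shell
# Stop the automounter first (needs root; errors tolerated in this sketch).
systemctl stop autofs 2>/dev/null || true

# Anything autofs- or NFS-related still mounted needs a manual umount;
# this lists candidate mount points from /proc/mounts (field 3 is fstype).
awk '$3 ~ /^(autofs|nfs4?)$/ {print $2}' /proc/mounts

# Mark the root mount (recursively) private so autofs mounts stop
# propagating into other namespaces, then restart autofs and re-test expiry.
mount --make-rprivate / 2>/dev/null || true
systemctl start autofs 2>/dev/null || true
```

Whether `--make-private` or `--make-rprivate` is needed is left open in the comment above; the recursive form also affects submounts of `/`.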

Comment 13 Fedora End Of Life 2015-02-17 19:35:32 UTC
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 14 Roland Mainz 2015-02-19 22:24:30 UTC
Is this issue still present in F22 ?

Comment 15 Fedora End Of Life 2015-05-29 09:56:59 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 reached end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 16 Fedora End Of Life 2015-06-30 00:46:00 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

