Bug 804734
Summary: | Home directory mount is getting renamed to "/home/user (deleted)" | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Orion Poplawski <orion> | ||||
Component: | autofs | Assignee: | Ian Kent <ikent> | ||||
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 16 | CC: | bfields, ikent, jlayton, steved | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2013-02-13 14:50:29 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Orion Poplawski
2012-03-19 16:55:26 UTC
(In reply to comment #0) > Description of problem: > > At some point, the home directory mount for one of our user's got renamed to > add " (deleted)": > > earth:/export/home/ruth /home/ruth\040(deleted) nfs4 This looks like an NFS dentry (not hard to work that out). The VFS changes the dentry name to add " (deleted)" when it encounters an unlinked but still in use dentry at the end of a path when calculating the path from the dentry to the root. It should not be reflected in the underlying file system NFS or otherwise. Lets look at the automount debug log to start with. Apr 3 10:00:37 amakihi automount[1333]: handle_packet: type = 3 Apr 3 10:00:37 amakihi automount[1333]: handle_packet_missing_indirect: token 3, name sumsmgr, request pid 1444 Apr 3 10:00:37 amakihi automount[1333]: attempting to mount entry /home/sumsmgr Apr 3 10:00:37 amakihi automount[1333]: lookup_name_file_source_instance: file map not found Apr 3 10:00:37 amakihi automount[1333]: lookup_mount: lookup(ldap): looking up sumsmgr Apr 3 10:00:37 amakihi automount[1333]: do_bind: lookup(ldap): auth_required: 1, sasl_mech (null) Apr 3 10:00:37 amakihi automount[1333]: do_bind: lookup(ldap): ldap simple bind returned 0 Apr 3 10:00:37 amakihi automount[1333]: lookup_one: lookup(ldap): searching for "(&(objectclass=automount)(|(cn=sumsmgr)(cn=/)(cn=\2A)))" under "ou=auto.home,dc=cora,dc=nwra,dc=com" Apr 3 10:00:37 amakihi automount[1333]: lookup_one: lookup(ldap): getting first entry for cn="sumsmgr" Apr 3 10:00:37 amakihi automount[1333]: lookup_one: lookup(ldap): examining first entry Apr 3 10:00:37 amakihi automount[1333]: lookup_mount: lookup(ldap): sumsmgr -> earth.cora.nwra.com:/export/home/sumsmgr Apr 3 10:00:37 amakihi automount[1333]: parse_mount: parse(sun): expanded entry: earth.cora.nwra.com:/export/home/sumsmgr Apr 3 10:00:37 amakihi automount[1333]: parse_mount: parse(sun): gathered options: fstype=nfs4,sec=krb5,intr,rsize=32768,wsize=32768,actimeo=1,noatime Apr 3 10:00:37 amakihi automount[1333]: parse_mount: parse(sun): dequote("earth.cora.nwra.com:/export/home/sumsmgr") -> earth.cora.nwra.com:/export/home/sumsmgr Apr 3 10:00:37 amakihi automount[1333]: parse_mount: parse(sun): core of entry: options=fstype=nfs4,sec=krb5,intr,rsize=32768,wsize=32768,actimeo=1,noatime, loc=earth.cora.nwra.com:/export/home/sumsmgr Apr 3 10:00:37 amakihi automount[1333]: sun_mount: parse(sun): mounting root /home, mountpoint sumsmgr, what earth.cora.nwra.com:/export/home/sumsmgr, fstype nfs4, options sec=krb5,intr,rsize=32768,wsize=32768,actimeo=1,noatime Apr 3 10:00:37 amakihi automount[1333]: mount_mount: mount(nfs): root=/home name=sumsmgr what=earth.cora.nwra.com:/export/home/sumsmgr, fstype=nfs4, options=sec=krb5,intr,rsize=32768,wsize=32768,actimeo=1,noatime Apr 3 10:00:37 amakihi automount[1333]: mount_mount: mount(nfs): nfs options="sec=krb5,intr,rsize=32768,wsize=32768,actimeo=1,noatime", nobind=0, nosymlink=0, ro=0 Apr 3 10:00:37 amakihi automount[1333]: mount_mount: mount(nfs): calling mkdir_path /home/sumsmgr Apr 3 10:00:37 amakihi automount[1333]: mount_mount: mount(nfs): calling mount -t nfs4 -s -o sec=krb5,intr,rsize=32768,wsize=32768,actimeo=1,noatime earth.cora.nwra.com:/export/home/sumsmgr /home/sumsmgr Apr 3 10:00:37 amakihi automount[1333]: spawn_mount: mtab link detected, passing -n to mount Apr 3 10:00:37 amakihi automount[1333]: mount_mount: mount(nfs): mounted earth.cora.nwra.com:/export/home/sumsmgr on /home/sumsmgr Apr 3 10:00:37 amakihi automount[1333]: dev_ioctl_send_ready: token = 3 Apr 3 10:00:37 amakihi automount[1333]: mounted /home/sumsmgr Then a little later: Apr 3 10:01:19 amakihi automount[1333]: handle_packet: type = 3 Apr 3 10:01:19 amakihi automount[1333]: handle_packet_missing_indirect: token 4, name sumsmgr (deleted), request pid 2017 Apr 3 10:01:19 amakihi automount[1333]: attempting to mount entry /home/sumsmgr (deleted) Apr 3 10:01:19 amakihi automount[1333]: lookup_name_file_source_instance: file map not found Apr 3 10:01:19 amakihi automount[1333]: lookup_mount: lookup(ldap): looking up sumsmgr (deleted) Apr 3 10:01:19 amakihi automount[1333]: do_bind: lookup(ldap): auth_required: 1, sasl_mech (null) Apr 3 10:01:19 amakihi automount[1333]: do_bind: lookup(ldap): ldap simple bind returned 0 Apr 3 10:01:19 amakihi automount[1333]: lookup_one: lookup(ldap): searching for "(&(objectclass=automount)(|(cn=sumsmgr (deleted))(cn=/)(cn=\2A)))" under "ou=auto.home,dc=cora,dc=nwra,dc=com" Apr 3 10:01:19 amakihi automount[1333]: lookup_one: lookup(ldap): query failed for (&(objectclass=automount)(|(cn=sumsmgr (deleted))(cn=/)(cn=\2A))) Apr 3 10:01:19 amakihi automount[1333]: key "sumsmgr (deleted)" not found in map source(s). Apr 3 10:01:19 amakihi automount[1333]: dev_ioctl_send_fail: token = 4 Apr 3 10:01:19 amakihi automount[1333]: failed to mount /home/sumsmgr (deleted) I guess this isn't autofs's fault, but I really have no idea what is going one here. Re-assign to kernel/nfs folks? Here's a possibly related trigger - the system was coming up in standard time and then the clock bumped 1hr by ntpdate after automount has started, but before anything was mounted: Apr 3 09:08:20 amakihi automount[1310]: st_ready: st_ready(): state = 0 path /data4 Apr 3 09:08:20 amakihi automount[1310]: ghosting enabled Apr 3 09:08:21 amakihi autofs[1299]: Starting automount: [ OK ] Apr 3 09:08:21 amakihi dbus-daemon[925]: [ OK ] Apr 3 10:08:25 amakihi automount[1310]: st_expire: state 1 path /nfs Apr 3 10:08:25 amakihi ntpdate[1322]: step time server 4.28.99.180 offset 3600.319945 sec Apr 3 10:08:25 amakihi automount[1310]: expire_proc: exp_proc = 140609222506240 path /nfs Apr 3 10:08:25 amakihi automount[1310]: st_expire: state 1 path /home Apr 3 10:08:25 amakihi automount[1310]: expire_proc: exp_proc = 140609112016640 path /home This happened a couple times in a row. I then fixed the hw clock and have not yet seen the problem. (In reply to comment #3) > Here's a possibly related trigger - the system was coming up in standard time > and then the clock bumped 1hr by ntpdate after automount has started, but > before anything was mounted: > > Apr 3 09:08:20 amakihi automount[1310]: st_ready: st_ready(): state = 0 path > /data4 > Apr 3 09:08:20 amakihi automount[1310]: ghosting enabled > Apr 3 09:08:21 amakihi autofs[1299]: Starting automount: [ OK ] > Apr 3 09:08:21 amakihi dbus-daemon[925]: [ OK ] > Apr 3 10:08:25 amakihi automount[1310]: st_expire: state 1 path /nfs > Apr 3 10:08:25 amakihi ntpdate[1322]: step time server 4.28.99.180 offset > 3600.319945 sec > Apr 3 10:08:25 amakihi automount[1310]: expire_proc: exp_proc = > 140609222506240 path /nfs > Apr 3 10:08:25 amakihi automount[1310]: st_expire: state 1 path /home > Apr 3 10:08:25 amakihi automount[1310]: expire_proc: exp_proc = > 140609112016640 path /home > > This happened a couple times in a row. I then fixed the hw clock and have not > yet seen the problem. Right, NFS does a "lot" with times so that does sound like it might be the source of the problem. But I thought that when using NTP the hardware clock was updated from the system time at some point (in the shutdown I think) so you shouldn't be having such a large discrepancy. You could verify if this is the cause by updating the machine hardware clock from the system time time manually (hwclock --systohc). See if the machine retains the setting, perhaps the CMOS battery is flat, but if it does retain the time and the change is much smaller you can see if the problem persists. It may be the case that NTP is refusing to update the hardware clock because of the size of the time difference. This is a systemd change. If /etc/adjtime indicates that the BIOS clock is set to LOCAL time it does not update the BIOS clock. If it is set to UTC (hwclock --utc --systohc) it does. This machine was set to LOCAL. So, do you think there is still a bug to track down in NFS or is this to be expected? What about when the clock jumps during the change? (In reply to comment #5) > This is a systemd change. If /etc/adjtime indicates that the BIOS clock is set > to LOCAL time it does not update the BIOS clock. If it is set to UTC (hwclock > --utc --systohc) it does. This machine was set to LOCAL. > > So, do you think there is still a bug to track down in NFS or is this to be > expected? What about when the clock jumps during the change? Not sure but I've always fixed system time problems before anything else when I've had NFS oddness and always made sure there were no significant changes being made to the system time of NFS client systems. The NFS client uses times a lot for cache coherency so IMHO the system time must be stable and consistent between all network hosts that use or provide NFS services. Perhaps others added to the cc list can give some more qualified information? "The NFS client uses times a lot for cache coherency so IMHO the system time must be stable and consistent between all network hosts that use or provide NFS services." When deciding whether its cache is up to date, the client may compare two server-generated timestamps (e.g., "is the last modified time of this file different from when last I looked at it?"), but it should never compare its own time with the server's. So synchronized clocks should not be necessary for NFS cache consistency. It shouldn't be possible to unlink a directory that has something mounted on it. Is it possible the automounter decided to expire the directory (if so, would it perform an unlink?), and that expiry raced with the mount process somehow? (In reply to comment #7) > > So synchronized clocks should not be necessary for NFS cache consistency. > > It shouldn't be possible to unlink a directory that has something mounted on > it. Is it possible the automounter decided to expire the directory (if so, > would it perform an unlink?), and that expiry raced with the mount process > somehow? Mount to expire often race one another but that has been heavily worked on over a long time so possibly not an autofs problem. Also the log fragments show the automount entry uses the "browse" option (Orion, is that correct for the map here?) so autofs won't remove the mount point directory after expire. Orion, how about posting the entire log so I can see what autofs is doing. (In reply to comment #8) > (In reply to comment #7) > > > > So synchronized clocks should not be necessary for NFS cache consistency. > > > > It shouldn't be possible to unlink a directory that has something mounted on > > it. Is it possible the automounter decided to expire the directory (if so, > > would it perform an unlink?), and that expiry raced with the mount process > > somehow? > > Mount to expire often race one another but that has been heavily > worked on over a long time so possibly not an autofs problem. > > Also the log fragments show the automount entry uses the "browse" > option (Orion, is that correct for the map here?) so autofs won't > remove the mount point directory after expire. > > Orion, how about posting the entire log so I can see what autofs > is doing. I wonder if this could be related to another bug I've been mystified by. I have the situation where mounts suddenly stopped expiring. Investigation shows that the autofs dentry has an extra reference. That's odd for a couple of reasons, first this started happening after I did a yum update against Fedora 16 and I had been using my machine for testing various kernel changes to vfs-automount for months without seeing this problem. Second, when I go into runlevel 3 the problem goes away. Further, this appears to be independent of file system, it happened first when I used an NFS mount but I see the same behaviour when I use a local fs bind mount. AFAICT this isn't an open file or pwd since the vfsmount reference count is not also elevated but I can't find any kernel service (perhaps some sort of notify) that elevates only a dentry reference count. Any thoughts, anyone? (In reply to comment #9) > > I wonder if this could be related to another bug I've been > mystified by. > > I have the situation where mounts suddenly stopped expiring. btw, I have also checked the directory that holds the reference is really empty so there has been no dentry created within the mount point. That might happen if there was a race where walks were not blocked during mount, although not likley. Created attachment 576493 [details]
Apr 3 messages
Here are the full logs from April 3. We do use the browse option.
This message is a reminder that Fedora 16 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '16'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 16's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 16 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" and open it against that version of Fedora. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed. |