Bug 804734

Summary: Home directory mount is getting renamed to "/home/user (deleted)"
Product: Fedora
Component: autofs
Version: 16
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: unspecified
Reporter: Orion Poplawski <orion>
Assignee: Ian Kent <ikent>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: bfields, ikent, jlayton, steved
Doc Type: Bug Fix
Last Closed: 2013-02-13 14:50:29 UTC
Attachments:
Apr 3 messages (flags: none)

Description Orion Poplawski 2012-03-19 16:55:26 UTC
Description of problem:

At some point, the home directory mount for one of our users got renamed to have " (deleted)" appended:

earth:/export/home/ruth /home/ruth\040(deleted) nfs4 rw,noatime,vers=4,rsize=32768,wsize=32768,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=10.10.20.16,minorversion=0,local_lock=none,addr=10.10.10.1 0 0

Also seeing:
Mar 19 03:15:52 pyramid automount[862]: lookup_one: lookup(ldap): query failed for (&(objectclass=automount)(|(cn=ruth (deleted))(cn=/)(cn=\2A)))
Mar 19 03:15:52 pyramid automount[862]: key "ruth (deleted)" not found in map source(s).

This seems to have started last night, but there were no error messages:


Mar 18 22:04:22 pyramid puppet-agent[921]: Finished catalog run in 3.45 seconds
Mar 18 22:21:11 pyramid dbus[754]: [system] Activating service name='org.kde.powerdevil.backlighthelper' (using servicehelper)
Mar 18 22:21:11 pyramid dbus-daemon[754]: dbus[754]: [system] Activating service name='org.kde.powerdevil.backlighthelper' (using servicehelper)
Mar 18 22:21:12 pyramid dbus-daemon[754]: no kernel backlight interface found
Mar 18 22:21:12 pyramid org.kde.powerdevil.backlighthelper: QDBusConnection: system D-Bus connection created before QCoreApplication. Application may misbehave.
Mar 18 22:21:12 pyramid dbus[754]: [system] Successfully activated service 'org.kde.powerdevil.backlighthelper'
Mar 18 22:21:12 pyramid dbus-daemon[754]: dbus[754]: [system] Successfully activated service 'org.kde.powerdevil.backlighthelper'
Mar 18 22:21:12 pyramid dbus-daemon[754]: ** (upowerd:1601): WARNING **: Property get or set does not have an interface string as first arg
Mar 18 22:22:11 pyramid abrtd: Directory 'ccpp-2012-03-18-22:22:11-12328' creation detected
Mar 18 22:22:11 pyramid abrt[12372]: Saved core dump of pid 12328 (/usr/bin/kded4) to /var/spool/abrt/ccpp-2012-03-18-22:22:11-12328 (37441536 bytes)


It causes kded4 to crash because of stat problems.

I'll turn on autofs debugging and see what that turns up next time.
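(For reference, I'm enabling that by flipping the logging setting in the autofs sysconfig file and restarting the service; the path and setting name below assume the stock Fedora autofs package:)

sed -i 's/^LOGGING=.*/LOGGING="debug"/' /etc/sysconfig/autofs   # turn on daemon debug logging
service autofs restart                                          # or: systemctl restart autofs.service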

Version-Release number of selected component (if applicable):
autofs-5.0.6-5.fc16.x86_64

How reproducible:
Happened two times now.

Comment 1 Ian Kent 2012-03-20 02:48:43 UTC
(In reply to comment #0)
> Description of problem:
> 
> At some point, the home directory mount for one of our user's got renamed to
> add " (deleted)":
> 
> earth:/export/home/ruth /home/ruth\040(deleted) nfs4

This looks like an NFS dentry (not hard to work that out).

The VFS changes the dentry name to add " (deleted)" when it
encounters an unlinked but still-in-use dentry at the end of
a path while calculating the path from the dentry to the root.
That change should not be reflected in the underlying file
system, NFS or otherwise.
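
The same renaming can be seen with any unlinked but still-in-use
directory, autofs or not; a minimal illustration (nothing to do
with NFS itself):

mkdir /tmp/demo
cd /tmp/demo
rmdir /tmp/demo         # directory is now unlinked but still in use (it is the shell's cwd)
readlink /proc/$$/cwd   # prints "/tmp/demo (deleted)" - the VFS appends the marker when building the path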

Let's look at the automount debug log to start with.

Comment 2 Orion Poplawski 2012-04-03 17:46:25 UTC
Apr  3 10:00:37 amakihi automount[1333]: handle_packet: type = 3
Apr  3 10:00:37 amakihi automount[1333]: handle_packet_missing_indirect: token 3, name sumsmgr, request pid 1444
Apr  3 10:00:37 amakihi automount[1333]: attempting to mount entry /home/sumsmgr
Apr  3 10:00:37 amakihi automount[1333]: lookup_name_file_source_instance: file map not found
Apr  3 10:00:37 amakihi automount[1333]: lookup_mount: lookup(ldap): looking up sumsmgr
Apr  3 10:00:37 amakihi automount[1333]: do_bind: lookup(ldap): auth_required: 1, sasl_mech (null)
Apr  3 10:00:37 amakihi automount[1333]: do_bind: lookup(ldap): ldap simple bind returned 0
Apr  3 10:00:37 amakihi automount[1333]: lookup_one: lookup(ldap): searching for "(&(objectclass=automount)(|(cn=sumsmgr)(cn=/)(cn=\2A)))" under "ou=auto.home,dc=cora,dc=nwra,dc=com"
Apr  3 10:00:37 amakihi automount[1333]: lookup_one: lookup(ldap): getting first entry for cn="sumsmgr"
Apr  3 10:00:37 amakihi automount[1333]: lookup_one: lookup(ldap): examining first entry
Apr  3 10:00:37 amakihi automount[1333]: lookup_mount: lookup(ldap): sumsmgr -> earth.cora.nwra.com:/export/home/sumsmgr
Apr  3 10:00:37 amakihi automount[1333]: parse_mount: parse(sun): expanded entry: earth.cora.nwra.com:/export/home/sumsmgr
Apr  3 10:00:37 amakihi automount[1333]: parse_mount: parse(sun): gathered options: fstype=nfs4,sec=krb5,intr,rsize=32768,wsize=32768,actimeo=1,noatime
Apr  3 10:00:37 amakihi automount[1333]: parse_mount: parse(sun): dequote("earth.cora.nwra.com:/export/home/sumsmgr") -> earth.cora.nwra.com:/export/home/sumsmgr
Apr  3 10:00:37 amakihi automount[1333]: parse_mount: parse(sun): core of entry: options=fstype=nfs4,sec=krb5,intr,rsize=32768,wsize=32768,actimeo=1,noatime, loc=earth.cora.nwra.com:/export/home/sumsmgr
Apr  3 10:00:37 amakihi automount[1333]: sun_mount: parse(sun): mounting root /home, mountpoint sumsmgr, what earth.cora.nwra.com:/export/home/sumsmgr, fstype nfs4, options sec=krb5,intr,rsize=32768,wsize=32768,actimeo=1,noatime
Apr  3 10:00:37 amakihi automount[1333]: mount_mount: mount(nfs): root=/home name=sumsmgr what=earth.cora.nwra.com:/export/home/sumsmgr, fstype=nfs4, options=sec=krb5,intr,rsize=32768,wsize=32768,actimeo=1,noatime
Apr  3 10:00:37 amakihi automount[1333]: mount_mount: mount(nfs): nfs options="sec=krb5,intr,rsize=32768,wsize=32768,actimeo=1,noatime", nobind=0, nosymlink=0, ro=0
Apr  3 10:00:37 amakihi automount[1333]: mount_mount: mount(nfs): calling mkdir_path /home/sumsmgr
Apr  3 10:00:37 amakihi automount[1333]: mount_mount: mount(nfs): calling mount -t nfs4 -s -o sec=krb5,intr,rsize=32768,wsize=32768,actimeo=1,noatime earth.cora.nwra.com:/export/home/sumsmgr /home/sumsmgr
Apr  3 10:00:37 amakihi automount[1333]: spawn_mount: mtab link detected, passing -n to mount
Apr  3 10:00:37 amakihi automount[1333]: mount_mount: mount(nfs): mounted earth.cora.nwra.com:/export/home/sumsmgr on /home/sumsmgr
Apr  3 10:00:37 amakihi automount[1333]: dev_ioctl_send_ready: token = 3
Apr  3 10:00:37 amakihi automount[1333]: mounted /home/sumsmgr

Then a little later:

Apr  3 10:01:19 amakihi automount[1333]: handle_packet: type = 3
Apr  3 10:01:19 amakihi automount[1333]: handle_packet_missing_indirect: token 4, name sumsmgr (deleted), request pid 2017
Apr  3 10:01:19 amakihi automount[1333]: attempting to mount entry /home/sumsmgr (deleted)
Apr  3 10:01:19 amakihi automount[1333]: lookup_name_file_source_instance: file map not found
Apr  3 10:01:19 amakihi automount[1333]: lookup_mount: lookup(ldap): looking up sumsmgr (deleted)
Apr  3 10:01:19 amakihi automount[1333]: do_bind: lookup(ldap): auth_required: 1, sasl_mech (null)
Apr  3 10:01:19 amakihi automount[1333]: do_bind: lookup(ldap): ldap simple bind returned 0
Apr  3 10:01:19 amakihi automount[1333]: lookup_one: lookup(ldap): searching for "(&(objectclass=automount)(|(cn=sumsmgr (deleted))(cn=/)(cn=\2A)))" under "ou=auto.home,dc=cora,dc=nwra,dc=com"
Apr  3 10:01:19 amakihi automount[1333]: lookup_one: lookup(ldap): query failed for (&(objectclass=automount)(|(cn=sumsmgr (deleted))(cn=/)(cn=\2A)))
Apr  3 10:01:19 amakihi automount[1333]: key "sumsmgr (deleted)" not found in map source(s).
Apr  3 10:01:19 amakihi automount[1333]: dev_ioctl_send_fail: token = 4
Apr  3 10:01:19 amakihi automount[1333]: failed to mount /home/sumsmgr (deleted)
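
(For what it's worth, the map lookup can be reproduced by hand against the directory server; the base DN and filter are taken from the log above, while the bind options and requested attributes are just my assumptions:)

ldapsearch -x -b "ou=auto.home,dc=cora,dc=nwra,dc=com" \
    "(&(objectclass=automount)(cn=sumsmgr))" cn automountInformation   # normal key, succeeds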


I guess this isn't autofs's fault, but I really have no idea what is going on here.  Re-assign to kernel/nfs folks?

Comment 3 Orion Poplawski 2012-04-03 18:30:38 UTC
Here's a possibly related trigger: the system came up in standard time and then the clock was bumped forward an hour by ntpdate after automount had started, but before anything was mounted:

Apr  3 09:08:20 amakihi automount[1310]: st_ready: st_ready(): state = 0 path /data4
Apr  3 09:08:20 amakihi automount[1310]: ghosting enabled
Apr  3 09:08:21 amakihi autofs[1299]: Starting automount: [  OK  ]
Apr  3 09:08:21 amakihi dbus-daemon[925]: [  OK  ]
Apr  3 10:08:25 amakihi automount[1310]: st_expire: state 1 path /nfs
Apr  3 10:08:25 amakihi ntpdate[1322]: step time server 4.28.99.180 offset 3600.319945 sec
Apr  3 10:08:25 amakihi automount[1310]: expire_proc: exp_proc = 140609222506240 path /nfs
Apr  3 10:08:25 amakihi automount[1310]: st_expire: state 1 path /home
Apr  3 10:08:25 amakihi automount[1310]: expire_proc: exp_proc = 140609112016640 path /home

This happened a couple of times in a row.  I then fixed the hardware clock and have not seen the problem since.

Comment 4 Ian Kent 2012-04-04 01:18:44 UTC
(In reply to comment #3)
> Here's a possibly related trigger - the system was coming up in standard time
> and then the clock bumped 1hr by ntpdate after automount has started, but
> before anything was mounted:
> 
> Apr  3 09:08:20 amakihi automount[1310]: st_ready: st_ready(): state = 0 path
> /data4
> Apr  3 09:08:20 amakihi automount[1310]: ghosting enabled
> Apr  3 09:08:21 amakihi autofs[1299]: Starting automount: [  OK  ]
> Apr  3 09:08:21 amakihi dbus-daemon[925]: [  OK  ]
> Apr  3 10:08:25 amakihi automount[1310]: st_expire: state 1 path /nfs
> Apr  3 10:08:25 amakihi ntpdate[1322]: step time server 4.28.99.180 offset
> 3600.319945 sec
> Apr  3 10:08:25 amakihi automount[1310]: expire_proc: exp_proc =
> 140609222506240 path /nfs
> Apr  3 10:08:25 amakihi automount[1310]: st_expire: state 1 path /home
> Apr  3 10:08:25 amakihi automount[1310]: expire_proc: exp_proc =
> 140609112016640 path /home
> 
> This happened a couple times in a row.  I then fixed the hw clock and have not
> yet seen the problem.

Right, NFS does a "lot" with times so that does sound like it
might be the source of the problem.

But I thought that when using NTP the hardware clock was
updated from the system time at some point (at shutdown,
I think), so you shouldn't be seeing such a large discrepancy.

You could verify whether this is the cause by updating the machine's
hardware clock from the system time manually (hwclock --systohc).
See if the machine retains the setting; perhaps the CMOS battery
is flat. If it does retain the time and the offset is much
smaller, you can see whether the problem persists.
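
Something along these lines should do it (standard util-linux hwclock invocations):

hwclock --show ; date   # compare the hardware clock against the system time
hwclock --systohc       # write the system time to the hardware clock
# check again after a reboot (or a few days) to see whether the setting sticks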

It may be the case that NTP is refusing to update the hardware
clock because of the size of the time difference.

Comment 5 Orion Poplawski 2012-04-04 15:40:21 UTC
This is a systemd change.  If /etc/adjtime indicates that the BIOS clock is set to LOCAL time, systemd does not update the BIOS clock; if it is set to UTC (hwclock --utc --systohc), it does.  This machine was set to LOCAL.
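
For the record, the relevant flag is the third line of /etc/adjtime; for example (file contents here are illustrative):

$ cat /etc/adjtime
0.0 0 0.0
0
LOCAL
$ hwclock --utc --systohc   # switch the RTC to UTC and sync it from system time; systemd will then keep it updated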

So, do you think there is still a bug to track down in NFS or is this to be expected?  What about when the clock jumps during the change?

Comment 6 Ian Kent 2012-04-04 23:55:29 UTC
(In reply to comment #5)
> This is a systemd change.  If /etc/adjtime indicates that the BIOS clock is set
> to LOCAL time it does not update the BIOS clock.  If it is set to UTC (hwclock
> --utc --systohc) it does.  This machine was set to LOCAL.
> 
> So, do you think there is still a bug to track down in NFS or is this to be
> expected?  What about when the clock jumps during the change?

Not sure, but I've always fixed system time problems before
anything else when I've had NFS oddness, and I've always made
sure there were no significant changes being made to the system
time of NFS client systems.

The NFS client uses times a lot for cache coherency so IMHO
the system time must be stable and consistent between all
network hosts that use or provide NFS services.

Perhaps others added to the cc list can give some more
qualified information?

Comment 7 J. Bruce Fields 2012-04-09 16:55:09 UTC
"The NFS client uses times a lot for cache coherency so IMHO
the system time must be stable and consistent between all
network hosts that use or provide NFS services."

When deciding whether its cache is up to date, the client may compare two server-generated timestamps (e.g., "is the last modified time of this file different from when last I looked at it?"), but it should never compare its own time with the server's.

So synchronized clocks should not be necessary for NFS cache consistency.

It shouldn't be possible to unlink a directory that has something mounted on it. Is it possible the automounter decided to expire the directory (if so, would it perform an unlink?), and that expiry raced with the mount process somehow?

Comment 8 Ian Kent 2012-04-10 02:47:24 UTC
(In reply to comment #7)
> 
> So synchronized clocks should not be necessary for NFS cache consistency.
> 
> It shouldn't be possible to unlink a directory that has something mounted on
> it. Is it possible the automounter decided to expire the directory (if so,
> would it perform an unlink?), and that expiry raced with the mount process
> somehow?

Mount and expire often race one another, but that has been heavily
worked on over a long time, so this is possibly not an autofs problem.

Also, the log fragments show the automount entry uses the "browse"
option (Orion, is that correct for the map here?), so autofs won't
remove the mount point directory after expiry.
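
(For illustration, a master map entry with browsing enabled looks
something like the following; the map source here is just an example
matching the LDAP base seen in the logs:)

/home    ldap:ou=auto.home,dc=cora,dc=nwra,dc=com    --ghost
# "--ghost" and the "browse" option are equivalent; with it set, autofs
# pre-creates the mount point directories and keeps them after expire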

Orion, how about posting the entire log so I can see what autofs
is doing.

Comment 9 Ian Kent 2012-04-10 03:00:02 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > 
> > So synchronized clocks should not be necessary for NFS cache consistency.
> > 
> > It shouldn't be possible to unlink a directory that has something mounted on
> > it. Is it possible the automounter decided to expire the directory (if so,
> > would it perform an unlink?), and that expiry raced with the mount process
> > somehow?
> 
> Mount to expire often race one another but that has been heavily
> worked on over a long time so possibly not an autofs problem.
> 
> Also the log fragments show the automount entry uses the "browse"
> option (Orion, is that correct for the map here?) so autofs won't
> remove the mount point directory after expire.
> 
> Orion, how about posting the entire log so I can see what autofs
> is doing.

I wonder if this could be related to another bug I've been
mystified by.

I have the situation where mounts suddenly stopped expiring.

Investigation shows that the autofs dentry has an extra reference.
That's odd for a couple of reasons: first, this started happening
after I did a yum update against Fedora 16, and I had been using
my machine for testing various kernel changes to vfs-automount
for months without seeing this problem. Second, when I go into
runlevel 3 the problem goes away. Further, this appears to be
independent of the file system: it happened first with an NFS
mount, but I see the same behaviour with a local fs bind mount.

AFAICT this isn't an open file or pwd, since the vfsmount
reference count is not also elevated, but I can't find any
kernel service (perhaps some sort of notify) that elevates
only a dentry reference count.

Any thoughts, anyone?

Comment 10 Ian Kent 2012-04-10 03:04:27 UTC
(In reply to comment #9)
> 
> I wonder if this could be related to another bug I've been
> mystified by.
> 
> I have the situation where mounts suddenly stopped expiring.

btw, I have also checked that the directory holding the reference
is really empty, so no dentry has been created within the
mount point. That might happen if there was a race where walks
were not blocked during mount, although that's not likely.

Comment 11 Orion Poplawski 2012-04-10 15:14:44 UTC
Created attachment 576493 [details]
Apr 3 messages

Here are the full logs from April 3.  We do use the browse option.

Comment 12 Fedora End Of Life 2013-01-16 13:54:44 UTC
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 13 Fedora End Of Life 2013-02-13 14:50:31 UTC
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.