Description of problem: Since applying the latest autofs errata autofs mounts have been failing intermittantly. The key passed to the lookup module consists of the server name and directory instead of just the server name. Version-Release number of selected component (if applicable): autofs-4.1.3-231 kernel-smp-2.6.9-67.EL How reproducible: Intermittant Steps to Reproduce: Unknown. Actual results: When attempting to walk/enter an automount managed mountpoint the export isn't mounted and the entry below appears in the logs. Note the key is redhat13/data1 whereas it should be just redhat13. Feb 13 11:54:38 L76639 automount[6184]: handle_packet: type = 0 Feb 13 11:54:38 L76639 automount[6184]: handle_packet_missing: token 4276, name redhat13/data1 Feb 13 11:54:38 L76639 automount[6184]: attempting to mount entry /net/redhat13/data1 Feb 13 11:54:38 L76639 automount[7624]: lookup(program): looking up redhat13/data1 Feb 13 11:54:38 L76639 automount[6184]: mt->key set to redhat13/data1 Feb 13 11:54:38 L76639 automount[7624]: >> /usr/sbin/showmount: can't get address for redhat13/data1 Feb 13 11:54:38 L76639 automount[7624]: lookup(program): lookup for redhat13/data1 failed Feb 13 11:54:38 L76639 automount[7624]: failed to mount /net/redhat13/data1 Feb 13 11:54:38 L76639 automount[7624]: umount_multi: path=/net/redhat13/data1 incl=1 Feb 13 11:54:38 L76639 automount[6184]: handle_child: got pid 7624, sig 0 (0), stat 1 Feb 13 11:54:38 L76639 automount[6184]: sig_child: found pending iop pid 7624: signalled 0 (sig 0), exit status 1 Feb 13 11:54:38 L76639 automount[6184]: send_fail: token=4276 Expected results: The export is mounted. Feb 13 11:53:55 L76639 automount[6184]: handle_packet: type = 0 Feb 13 11:53:55 L76639 automount[6184]: handle_packet_missing: token 4275, name redhat13 Feb 13 11:53:55 L76639 automount[6184]: attempting to mount entry /net/redhat13 Feb 13 11:53:55 L76639 automount[7564]: lookup(program): looking up redhat13 Feb 13 11:53:55 L76639 automount[6184]: mt->key set to redhat13 Feb 13 11:53:55 L76639 automount[7564]: lookup(program): redhat13 -> -fstype=nfs,hard,intr,nodev,nosuid /data1 redhat13:/data1 /data2 redhat 13:/data2 Feb 13 11:53:55 L76639 automount[7564]: parse(sun): expanded entry: -fstype=nfs,hard,intr,nodev,nosuid /data1 redhat13:/data1 /data2 redhat 13:/data2 Feb 13 11:53:55 L76639 automount[7564]: parse(sun): dequote("fstype=nfs,hard,intr,nodev,nosuid") -> fstype=nfs,hard,intr,nodev,nosuid Feb 13 11:53:55 L76639 automount[7564]: parse(sun): gathered options: fstype=nfs,hard,intr,nodev,nosuid Feb 13 11:53:55 L76639 automount[7564]: parse(sun): dequote("/data1") -> /data1 Feb 13 11:53:55 L76639 automount[7564]: parse(sun): dequote("redhat13:/data1 ") -> redhat13:/data1 Feb 13 11:53:55 L76639 automount[7564]: parse(sun): dequote("/data2") -> /data2 Feb 13 11:53:55 L76639 automount[7564]: parse(sun): dequote("redhat13:/data2") -> redhat13:/data2 Feb 13 11:53:55 L76639 automount[7564]: parse(sun): multimount: redhat13:/data2 on /data2 with options fstype=nfs,hard,intr,nodev,nosuid Feb 13 11:53:55 L76639 automount[7564]: parse(sun): mounting root /net/redhat13, mountpoint data2, what redhat13:/data2, fstype nfs, options hard,intr,nodev, nosuid Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): root=/net/redhat13 name=data2 what=redhat13:/data2, fstype=nfs, options=hard,intr,nodev,nosuid Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): nfs options="hard,intr,nodev,nosuid", nosymlink=0 Feb 13 11:53:55 L76639 automount[7564]: prune_host_list: added host redhat13, cost 292 to list Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): calling mkdir_path /net/redhat13/data2 Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): calling mount -t nfs -s -o hard,intr,nodev,nosuid redhat13:/data2 /net/redhat13/data2 Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): mounted redhat13:/data2 on /net/redhat13/data2 Feb 13 11:53:55 L76639 automount[7564]: parse(sun): multimount: redhat13:/data1 on /data1 with options fstype=nfs,hard,intr,nodev,nosuid Feb 13 11:53:55 L76639 automount[7564]: parse(sun): mounting root /net/redhat13, mountpoint data1, what redhat13:/data1, fstype nfs, options hard,intr,nodev, nosuid Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): root=/net/redhat13 name=data1 what=redhat13:/data1, fstype=nfs, options=hard,intr,nodev,nosuid Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): nfs options="hard,intr,nodev,nosuid", nosymlink=0 Feb 13 11:53:55 L76639 automount[7564]: prune_host_list: added host redhat13, cost 359 to list Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): calling mkdir_path /net/redhat13/data1 Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): calling mount -t nfs -s -o hard,intr,nodev,nosuid redhat13:/data1 /net/redhat13/data1 Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): mounted redhat13:/data1 on /net/redhat13/data1 Feb 13 11:53:55 L76639 automount[6184]: handle_child: got pid 7564, sig 0 (0), stat 0 Feb 13 11:53:55 L76639 automount[6184]: sig_child: found pending iop pid 7564: signalled 0 (sig 0), exit status 0 Additional info: Attached are logs captured after the problem occurred. The mount points have an timeout/expiry time of 60 seconds. Notice in the logs (at 11:53:55) that redhat13 hadn't expired and therefore hasn't been unmounted before the lookup of redhat13/data1. $ grep looking 10-automount.problem.log Feb 13 11:42:42 L76639 automount[6488]: lookup(program): looking up redhat13 Feb 13 11:43:56 L76639 automount[6627]: lookup(program): looking up redhat13 Feb 13 11:43:56 L76639 automount[6636]: lookup(program): looking up biscuit Feb 13 11:45:53 L76639 automount[6722]: lookup(program): looking up redhat13 Feb 13 11:48:41 L76639 automount[7071]: lookup(program): looking up redhat13 Feb 13 11:51:07 L76639 automount[7175]: lookup(program): looking up redhat13 Feb 13 11:53:55 L76639 automount[7564]: lookup(program): looking up redhat13 Feb 13 11:54:38 L76639 automount[7624]: lookup(program): looking up redhat13/data1 Feb 13 11:54:38 L76639 automount[7629]: lookup(program): looking up redhat13/data1 Feb 13 11:54:45 L76639 automount[7637]: lookup(program): looking up redhat13/data2 Feb 13 11:54:45 L76639 automount[7642]: lookup(program): looking up redhat13 Feb 13 11:54:51 L76639 automount[7657]: lookup(program): looking up biscuit $ grep expired 10-automount.problem.log Feb 13 11:43:46 L76639 automount[6620]: expired /net/redhat13 Feb 13 11:45:01 L76639 automount[6688]: expired /net/biscuit Feb 13 11:45:01 L76639 automount[6691]: expired /net/redhat13 Feb 13 11:47:02 L76639 automount[6774]: expired /net/redhat13 Feb 13 11:49:47 L76639 automount[7121]: expired /net/redhat13 Feb 13 11:52:17 L76639 automount[7506]: expired /net/redhat13
Created attachment 301934 [details] autofs log w/ debug verbosity
Created attachment 301936 [details] /etc/sysconfig/autofs
(In reply to comment #1) > Created an attachment (id=301934) [edit] > autofs log w/ debug verbosity > As you point out, this bit is really wierd, since host redhat13 was mounted less than a minute before and hasn't been umounted. This should not be able to happen! The VFS should not call any of the file system callbacks when something is mounted on the directory. automount[6184]: handle_packet_missing: token 4276, name redhat13/data1 automount[6184]: attempting to mount entry /net/redhat13/data1 automount[7624]: lookup(program): looking up redhat13/data1 automount[6184]: mt->key set to redhat13/data1 automount[7624]: >> /usr/sbin/showmount: can't get address for redhat13/data1 automount[7624]: lookup(program): lookup for redhat13/data1 failed automount[7624]: failed to mount /net/redhat13/data1 automount[7624]: umount_multi: path=/net/redhat13/data1 incl=1 automount[6184]: handle_child: got pid 7624, sig 0 (0), stat 1 automount[6184]: sig_child: found pending iop pid 7624: signalled 0 (sig 0), exit status 1 automount[6184]: send_fail: token=4276
David, I know this is a lot to ask, but if we are unable to reproduce this issue, we'll need some more information to go on. To that end, we'd like to get a crashdump from your system when this happens. If this is your workstation, I'm sure you may not like that! But, if you're willing to give it a try, I'll attach a new auto.net script for you to use which will trigger a crash when this condition is found. Once you get a crash, you can then go back to the old init script.
Created attachment 302014 [details] trigger a crashdump when the key redhat13/data1 is encountered You can apply the patch like so: cd /etc patch -p1 < <path-to-patch-file> You won't need to restart autofs. You will need to make sure that your system is configured to capture crashdumps, though.
Created attachment 302017 [details] updated patch to trigger a crash dump I should not have added the echo line, as it will be processed by automount.
(In reply to comment #5) > Created an attachment (id=302014) [edit] > trigger a crashdump when the key redhat13/data1 is encountered > > You can apply the patch like so: > > cd /etc > patch -p1 < <path-to-patch-file> Ooops! That should be patch -p0. Sorry for the confusion.
After inspecting the log again, this looks to be the same problem as in bug 447420. Unfortunately, that's a private bugzilla, I think. The basic idea is this: a lookup occurs for a multimount entry (that is, an automount entry with multiple offset mounts, all of which are mounted as a single unit). Then, while the mounting is ongoing, another process performs a lookup of a subdirectory that is not yet mounted. Instead of blocking on the pending mount, the process is allowed to walk down into the mount tree and we get a request as you see for a path element under the top. So, we're curently testing a fix for this. When we have a patch prepared we will post it here. Thanks for your patience on this matter!
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This bug should be fixed by the patch series committed in bug 460083. Making this bug as MODIFIED as those patches have been committed for RHEL-4.8.
Any updates here? Has this issue been resolved in the RHEL 4.8 Beta? later kernel?
(In reply to comment #16) > Any updates here? Has this issue been resolved in the RHEL 4.8 Beta? later > kernel? You should have the customer use the RHEL-4.8 kernel.
I guess at this point, it's too late. Though, i'll see what i can do. Thanks for the info.
(In reply to comment #18) > I guess at this point, it's too late. Though, i'll see what i can do. Thanks > for the info. It's worth asking them to test. This looks like the only one we didn't get something out for testing. The RHEL-4.8 kernel update resolved almost all of our reported autofs kernel issues and this doesn't look like it's anything new.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1024.html