Description of problem:

From the upstream autofs mailing list:

Title: Re: [autofs] Seeing some 5.0.1 stop expiring mounts

On Wed, Jul 18, 2007 at 02:46:07PM +0800, Ian Kent wrote:
> Might be worth considering going to 5.0.2, especially since you have a
> busy site, as a nasty deadlock in the alarm handler has been fixed.

Unfortunately, we can't. I've since found out (through trial and error) that the patch:

> > autofs-5.0.1-map-update-source-only.patch

is completely broken for us, and it appears to be part of the 5.0.2 codebase now.

Our main setup for autofs5 clients now is pure ldap: auto_master is in ldap, nsswitch.conf has

automount: ldap

everything is in ldap. We remove all /etc/auto.* files too.

With the above patch applied to 5.0.1 (or using 5.0.2), as soon as the daemon gets a HUP signal, it flushes out all the auto.projects (our main map) entries from /proc/mounts and they're gone forever.

When first started, and until the daemon gets a HUP, it works fine. Our /proc/mounts has 6200+ entries (we have a crapload of paths) and they'll mount great. Entries look as expected, e.g.:

auto.projects /prj/qct/gv autofs rw,fd=6,pgrp=2571,timeout=600,minproto=5,maxproto=5,direct 0 0

then if you mount it, it adds in:

ronald:/vol/eng_ice_0014/qct_gv /prj/qct/gv nfs rw,v3,rsize=32768,wsize=32768,acregmin=1,acregmax=5,acdirmin=1,acdirmax=5,hard,lock,proto=tcp,addr=ronald 0 0

After the HUP, the thing flushes, then logs a ton of rm_dir errors, like so:

Start of daemon:
automount[2571]: mounted direct mount on /prj/qct/gv with timeout 600, freq 150 seconds

Flush after HUP:
automount[2572]: umounted direct mount /prj/qct/gv

After all umounts, these errors show for every path:
automount[2549]: rmdir_path: lstat of /prj/qct/gv failed.

I did a test with 5.0.1 with all patches sans the autofs-5.0.1-map-update-source-only.patch and it's fine. I can HUP left and right, do kill -USR1 to flush, etc. Works right. But rebuild again with that patch and the first HUP breaks all our auto.projects paths.

Weird thing is the /net and /usr2 (indirect home dirs) stay working. Those entries look like:

$ egrep 'auto.home|/net' /proc/mounts | grep -v auto.projects
-hosts /net autofs rw,fd=9,pgrp=13948,timeout=600,minproto=5,maxproto=5,indirect 0 0
auto.home /usr2 autofs rw,fd=14,pgrp=13948,timeout=600,minproto=5,maxproto=5,indirect 0 0

Version-Release number of selected component (if applicable):
autofs-5.0.1-0.rc2.54

How reproducible:
Always

Steps to Reproduce:
TBA.
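One way to confirm the symptom described above is to count the autofs trigger entries for a map in /proc/mounts before and after the daemon receives HUP. The following is a minimal sketch, not a tested reproducer. The function name count_triggers is made up for illustration, and it assumes the map name is the first field and the filesystem type (autofs) the third, as in the entries quoted above.

```shell
#!/bin/sh
# count_triggers MAP [MOUNTS_FILE]
# Hypothetical helper: prints how many autofs trigger mounts belong to MAP.
# MOUNTS_FILE defaults to /proc/mounts; pass a saved copy to compare snapshots.
count_triggers() {
    map=$1
    mounts=${2:-/proc/mounts}
    # Field 1 is the map name, field 3 the filesystem type.
    # "n + 0" makes awk print 0 rather than an empty string when nothing matches.
    awk -v map="$map" '$1 == map && $3 == "autofs" { n++ } END { print n + 0 }' "$mounts"
}
```

Running `count_triggers auto.projects` before and after a HUP should show whether the direct mount triggers survive the map re-read. Going by the report above, the count would be expected to fall to 0 on an affected build.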
Created attachment 230571 [details] Patch to mark map instances stale so they aren't "cleaned" during updates
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
*** Bug 427117 has been marked as a duplicate of this bug. ***
Created attachment 290817 [details] Patch to handle case of included maps
Deke, I've uploaded the i386 and x86_64 builds for autofs revision 0.rc2.80, which includes the patch above, to http://www.kernel.org/pub/linux/kernel/people/raven/autofs. Please test this build. As always we must keep in mind that this is the current CVS development and hasn't completed the QA process so there may be unexpected problems. But then that's what testing is about. Your help is appreciated. Ian
Hello Ian,

First, it's ironic that the original creator of this bug (in the upstream) works here as well. Hm.

On to the new testing release. I put this on a couple of VMs and tried it out. The HUP no longer trashes the mount triggers but now the daemon dies:

[root@rico ~]# /etc/init.d/autofs status
automount (pid 1887) is running...
[root@rico ~]# /etc/init.d/autofs reload
Reloading maps
[root@rico ~]# /etc/init.d/autofs status
automount is stopped
[root@rico ~]# pgrep automount
[root@rico ~]#

The syslog ends with:

Jan 4 15:24:24 rico automount[1892]: st_readmap: state 1 path /-
Jan 4 15:24:24 rico automount[1892]: re-reading map for /-
Jan 4 15:24:24 rico automount[1892]: lookup_nss_read_map: reading map file /etc/auto.direct
Jan 4 15:24:24 rico automount[1892]: parse_init: parse(sun): init gathered global options: (null)
Jan 4 15:24:24 rico automount[1892]: lookup_read_map: read included map +/etc/auto.projects

Thanks for looking at this,
-Deke
(In reply to comment #10)
> Hello Ian,
>
> First, it's ironic that the original creator of this bug (in the upstream) works
> here as well. Hm.
>
> On to the new testing release. I put this on a couple of VMs and tried it out.
> The HUP no longer trashes the mount triggers but now the daemon dies:
>
> [root@rico ~]# /etc/init.d/autofs status
> automount (pid 1887) is running...
> [root@rico ~]# /etc/init.d/autofs reload
> Reloading maps
> [root@rico ~]# /etc/init.d/autofs status
> automount is stopped
> [root@rico ~]# pgrep automount
> [root@rico ~]#
>
> The syslog ends with:
>
> Jan 4 15:24:24 rico automount[1892]: st_readmap: state 1 path /-
> Jan 4 15:24:24 rico automount[1892]: re-reading map for /-
> Jan 4 15:24:24 rico automount[1892]: lookup_nss_read_map: reading map file
> /etc/auto.direct
> Jan 4 15:24:24 rico automount[1892]: parse_init: parse(sun): init gathered
> global options: (null)
> Jan 4 15:24:24 rico automount[1892]: lookup_read_map: read included map
> +/etc/auto.projects
>

Do you have any SEGV messages in /var/log/messages?
How about a core file in the root directory?
If not selinux may be preventing it from being written.
Please try again with selinux in permissive mode.

If you have a core file then, ensure you have the
autofs debuginfo package installed, and post the
output from:

gdb -c <core file> /usr/sbin/automount
(gdb) info threads
(gdb) thr a a bt

Sorry this has become such a pain.

Ian
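The interactive gdb session requested above can also be run non-interactively, which makes the output easier to paste into a bug comment. This is only a sketch under stated assumptions. The function name collect_backtrace is hypothetical, the binary path defaults to the /usr/sbin/automount given above, and the core file location must be supplied by the caller. It uses gdb's -batch and -ex options, with "thread apply all bt" being the long form of "thr a a bt".

```shell
#!/bin/sh
# collect_backtrace CORE [BINARY]
# Hypothetical helper: prints the thread list and a backtrace of every thread.
# Returns 1 without running gdb if the core file does not exist.
collect_backtrace() {
    core=$1
    binary=${2:-/usr/sbin/automount}
    if [ ! -f "$core" ]; then
        echo "no core file at $core" >&2
        return 1
    fi
    # -batch makes gdb exit after running the -ex commands in order.
    gdb -batch -ex 'info threads' -ex 'thread apply all bt' -c "$core" "$binary"
}
```

As noted in the comment above, the backtrace is only useful if the autofs debuginfo package is installed. Without it, gdb will likely print addresses without function names or line numbers.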
(In reply to comment #11)
> > Jan 4 15:24:24 rico automount[1892]: st_readmap: state 1 path /-
> > Jan 4 15:24:24 rico automount[1892]: re-reading map for /-
> > Jan 4 15:24:24 rico automount[1892]: lookup_nss_read_map: reading map file
> > /etc/auto.direct
> > Jan 4 15:24:24 rico automount[1892]: parse_init: parse(sun): init gathered
> > global options: (null)
> > Jan 4 15:24:24 rico automount[1892]: lookup_read_map: read included map
> > +/etc/auto.projects
> >
>
> Do you have any SEGV messages in /var/log/messages?
> How about a core file in the root directory?
> If not selinux may be preventing it from being written.
> Please try again with selinux in permissive mode.
>
> If you have a core file then, ensure you have the
> autofs debuginfo package installed, and post the
> output from:
> gdb -c <core file> /usr/sbin/automount
> (gdb) info threads
> (gdb) thr a a bt
>
> Sorry this has become such a pain.

Scratch that.
There's a mistake in the patch.
I'm totally mystified how I was able to test this and see it work.
But then the date on the log entry was completely wrong as well, I must have been asleep or something.

I'll build a new revision and post it in the normal place.
Please, once again, give it a try.

Ian
Ian,

The last build (rc2.81) seems to be working well: HUP rereads the maps without stopping the daemon or trashing the idle automount triggers. It still won't do a full restart but we can live without that.

Please submit this for QA, etc. and get it into RHN so we can use it in production.

Thanks again for working on this with me over the holiday, etc.

-Deke
(In reply to comment #13)
> Ian,
>
> The last build (rc2.81) seems to be working well: HUP rereads the maps without
> stopping the daemon or trashing the idle automount triggers. It still won't do a
> full restart but we can live without that.

That will be a different bug but, just briefly, what are you seeing?

>
> Please submit this for QA, etc. and get it into RHN so we can use it in production.

Yes, regardless of what other problems exist this needs to be added.

>
> Thanks again for working on this with me over the holiday, etc.

My pleasure.

Ian
autofs restart is about as before. Command output looks like this:

[root@rico ~]# /etc/init.d/autofs restart
Stopping automount: [FAILED]
Starting automount: automount: program is already running.
[FAILED]

and if debug logging is on the syslog gets a lot of stuff like:

Jan 7 18:36:23 rico automount[1882]: umounted direct mount /prj/vocoder/data9
Jan 7 18:36:23 rico automount[1882]: umount_multi: path /prj/vocoder/appdsp2 incl 0
Jan 7 18:36:23 rico automount[1882]: umounted direct mount /prj/vocoder/appdsp2
Jan 7 18:36:23 rico automount[1882]: umount_multi: path /prj/vlsi/vlsi_verify/conan incl 0
Jan 7 18:36:23 rico automount[1882]: umounted direct mount /prj/vlsi/vlsi_verify/conan
Jan 7 18:36:23 rico automount[1882]: umount_multi: path /prj/vlsi/q1601 incl 0
Jan 7 18:36:24 rico automount[1882]: umounted direct mount /prj/vlsi/q1601

I could perhaps get this to work by tuning the stop function in the init script (put in a delay loop, etc) but IIRC you were working on a more correct solution to this - I'd rather wait for that.

Thanks again.
-Deke
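The delay-loop workaround mentioned above could look roughly like the following. This is only a sketch of the idea, not the init script's actual stop function and not the more correct fix being discussed. The function name wait_for_exit and the one-second polling interval are assumptions for illustration, and it presumes the caller is allowed to signal the daemon (for example, running as root).

```shell
#!/bin/sh
# wait_for_exit PID TRIES
# Hypothetical helper: polls once per second until PID is gone.
# Returns 0 once the process has exited, or 1 if it is still
# alive after about TRIES seconds.
wait_for_exit() {
    pid=$1
    tries=$2
    # kill -0 delivers no signal; it only tests whether the process exists.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$tries" -le 0 ] && return 1
        sleep 1
        tries=$((tries - 1))
    done
    return 0
}
```

A stop function could call something like `wait_for_exit "$pid" 30` after signalling the daemon and report failure only if it returns non-zero. With thousands of direct mounts to unmount, the right timeout is a guess and would need tuning per site.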
(In reply to comment #15)
> autofs restart is about as before. Command output looks like this:
>
> [root@rico ~]# /etc/init.d/autofs restart
> Stopping automount: [FAILED]
> Starting automount: automount: program is already running.
> [FAILED]

Ahh, yes, I remember now.

>
> and if debug logging is on the syslog gets a lot of stuff like:
>
> Jan 7 18:36:23 rico automount[1882]: umounted direct mount /prj/vocoder/data9
> Jan 7 18:36:23 rico automount[1882]: umount_multi: path /prj/vocoder/appdsp2 incl 0
> Jan 7 18:36:23 rico automount[1882]: umounted direct mount /prj/vocoder/appdsp2
> Jan 7 18:36:23 rico automount[1882]: umount_multi: path
> /prj/vlsi/vlsi_verify/conan incl 0
> Jan 7 18:36:23 rico automount[1882]: umounted direct mount
> /prj/vlsi/vlsi_verify/conan
> Jan 7 18:36:23 rico automount[1882]: umount_multi: path /prj/vlsi/q1601 incl 0
> Jan 7 18:36:24 rico automount[1882]: umounted direct mount /prj/vlsi/q1601
>
> I could perhaps get this to work by tuning the stop function in the init script
> (put in a delay loop, etc) but IIRC you were working on a more correct solution
> to this - I'd rather wait for that.

Yep, and debug logging will make it even slower to shut down with a large number of entries in direct maps or a large number of active mounts.

As I said, I do have a plan to fix this but I haven't started on it just yet. Fact is that our autofs regression tests indicate that shutdowns with a large number of mounts take much too long, so I'll be looking at that first.

Ian
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0354.html