336961 – Reloading autofs map incorrectly removes all map entries

Summary: Reloading autofs map incorrectly removes all map entries

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	autofs
Sub Component:
Version:	5.1
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Ian Kent
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:	GSSApproved ResovleBy=01/30/08
Duplicates (1):	427117
Depends On:
Blocks:	425889 429163 432351

Reported:	2007-10-18 02:09 UTC by Ian Kent
Modified:	2018-10-19 22:30 UTC
CC List:	3 users
Fixed In Version:	RHBA-2008-0354
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-05-21 14:38:18 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Patch to mark map instances stale so they aren't "cleaned" during updates (626 bytes, patch) 2007-10-18 02:09 UTC, Ian Kent
Patch to handle case of included maps (655 bytes, patch) 2008-01-04 06:02 UTC, Ian Kent

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2008:0354	0	normal	SHIPPED_LIVE	autofs bug fix and enhancement update	2008-05-20 12:52:25 UTC

Description Ian Kent 2007-10-18 02:09:14 UTC

Description of problem:
From the upstream autofs mailing list:

Title: Re: [autofs] Seeing some 5.0.1 stop expiring mounts
On Wed, Jul 18, 2007 at 02:46:07PM +0800, Ian Kent wrote:
> Might be worth considering going to 5.0.2, especially since you have a
> busy site, as a nasty deadlock in the alarm handler has been fixed. 

Unfortunately, we can't.  I've since found out (through trial and error)
that the patch:
> > autofs-5.0.1-map-update-source-only.patch

is completely broken for us, and it appears to be part of the 5.0.2
codebase now.

Our main setup for autofs5 clients now is pure ldap.. auto_master is in
ldap, nsswitch.conf has
automount: ldap
everything is in ldap.  We remove all /etc/auto.* files too.

With the above patch applied to 5.0.1 (or using 5.0.2) as soon as the
daemon gets a HUP signal, it flushes out all the auto.projects (our main
map) entries from /proc/mounts and they're gone forever.

When first started, and until the daemon gets a HUP, it works fine.  Our
/proc/mounts has 6200+ entries (we have a crapload of paths) and
they'll mount great.  Entries look as expected, e.g.:
auto.projects /prj/qct/gv autofs rw,fd=6,pgrp=2571,timeout=600,minproto=5,maxproto=5,direct 0 0
then if you mount it, it adds in:
ronald:/vol/eng_ice_0014/qct_gv /prj/qct/gv nfs rw,v3,rsize=32768,wsize=32768,acregmin=1,acregmax=5,acdirmin=1,acdirmax=5,hard,lock,proto=tcp,addr=ronald 0 0

After the HUP, the thing flushes, then logs a ton of rm_dir errors..
like so:

Start of daemon:
automount[2571]: mounted direct mount on /prj/qct/gv with timeout 600, freq 150 seconds

Flush after HUP:
automount[2572]: umounted direct mount /prj/qct/gv

After all umounts.. these errors show for every path:
automount[2549]: rmdir_path: lstat of /prj/qct/gv failed.

I did a test with 5.0.1 with all patches sans the
autofs-5.0.1-map-update-source-only.patch and it's fine... I can HUP
left and right, do kill -USR1 to flush, etc.  Works right.  But rebuild again
with that patch and first HUP breaks all our auto.projects paths.  Weird
thing is the /net and /usr2 (indirect home dirs) stay working.  Those
entries look like:
$ egrep 'auto.home|/net' /proc/mounts | grep -v auto.projects
-hosts /net autofs rw,fd=9,pgrp=13948,timeout=600,minproto=5,maxproto=5,indirect 0 0
auto.home /usr2 autofs rw,fd=14,pgrp=13948,timeout=600,minproto=5,maxproto=5,indirect 0 0
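
The loss of trigger mounts described above can be measured by counting the autofs entries for one map in /proc/mounts before and after the HUP. The following is a minimal sketch, not part of the original report; the function name count_autofs_triggers is made up for illustration, and it assumes the map name appears in the first field and the filesystem type in the third, as in the entries quoted above.

```shell
#!/bin/sh
# Hypothetical helper: count autofs trigger mounts that belong to one map.
# $1 = map name as it appears in the first field (e.g. auto.projects)
# $2 = mounts file to read (defaults to /proc/mounts)
count_autofs_triggers() {
    awk -v map="$1" '$1 == map && $3 == "autofs" { n++ } END { print n + 0 }' \
        "${2:-/proc/mounts}"
}
```

Running count_autofs_triggers auto.projects once before sending HUP to automount and once afterwards should give the same number on a fixed build; on an affected build the second count would be expected to drop to 0.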

Version-Release number of selected component (if applicable):
autofs-5.0.1-0.rc2.54

How reproducible:
Always

Steps to Reproduce:
TBA.

Comment 1 Ian Kent 2007-10-18 02:09:14 UTC

Created attachment 230571 [details]
Patch to mark map instances stale so they aren't "cleaned" during updates

Comment 2 RHEL Program Management 2007-10-18 02:14:48 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 RHEL Program Management 2007-10-18 02:15:48 UTC

This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 6 Ian Kent 2008-01-04 05:45:39 UTC

*** Bug 427117 has been marked as a duplicate of this bug. ***

Comment 7 Ian Kent 2008-01-04 06:02:19 UTC

Created attachment 290817 [details]
Patch to handle case of included maps

Comment 8 Ian Kent 2008-01-04 06:10:51 UTC

Deke,

I've uploaded the i386 and x86_64 builds for autofs
revision 0.rc2.80, which includes the patch above, to
http://www.kernel.org/pub/linux/kernel/people/raven/autofs.
Please test this build.

As always we must keep in mind that this is the current CVS
development and hasn't completed the QA process so there may
be unexpected problems. But then that's what testing is about.

Your help is appreciated.
Ian

Comment 10 Deke Clinger 2008-01-04 23:26:58 UTC

Hello Ian,

First, it's ironic that the original creator of this bug (in the upstream) works
here as well. Hm.

On to the new testing release. I put this on a couple of VMs and tried it out.
The HUP no longer trashes the mount triggers but now the daemon dies:

[root@rico ~]# /etc/init.d/autofs status
automount (pid 1887) is running...
[root@rico ~]# /etc/init.d/autofs reload
Reloading maps
[root@rico ~]# /etc/init.d/autofs status
automount is stopped
[root@rico ~]# pgrep automount
[root@rico ~]# 

The syslog ends with:

Jan  4 15:24:24 rico automount[1892]: st_readmap: state 1 path /-
Jan  4 15:24:24 rico automount[1892]: re-reading map for /-
Jan  4 15:24:24 rico automount[1892]: lookup_nss_read_map: reading map file /etc/auto.direct
Jan  4 15:24:24 rico automount[1892]: parse_init: parse(sun): init gathered global options: (null)
Jan  4 15:24:24 rico automount[1892]: lookup_read_map: read included map +/etc/auto.projects
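
The manual status / reload / status check above can be scripted. This is a rough sketch under stated assumptions, not something from the original thread: pid_alive and survives_reload are hypothetical names, the one-second pause is arbitrary, and kill -0 only reports processes the caller is allowed to signal, so it should be run as root.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a process with the given PID exists
# (and the caller has permission to signal it).
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# Hypothetical helper: run a reload command, pause briefly, then report
# whether the original daemon PID is still present.
# $1 = daemon PID; remaining arguments = the reload command to run.
# Returns 0 if the PID survived, 1 if it is gone, 2 if the reload failed.
survives_reload() {
    pid=$1
    shift
    "$@" || return 2
    sleep 1          # arbitrary grace period for the daemon to react
    pid_alive "$pid"
}
```

For example, survives_reload "$(pgrep -o automount)" /etc/init.d/autofs reload would be expected to fail on the build tested in this comment, since the daemon exits after the reload.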

Thanks for looking at this,

-Deke

Comment 11 Ian Kent 2008-01-05 03:30:08 UTC

(In reply to comment #10)
> Hello Ian,
> 
> First, it's ironic that the original creator of this bug (in the upstream) works
> here as well. Hm.
> 
> On to the new testing release. I put this on a couple of VMs and tried it out.
> The HUP no longer trashes the mount triggers but now the daemon dies:
> 
> [root@rico ~]# /etc/init.d/autofs status
> automount (pid 1887) is running...
> [root@rico ~]# /etc/init.d/autofs reload
> Reloading maps
> [root@rico ~]# /etc/init.d/autofs status
> automount is stopped
> [root@rico ~]# pgrep automount
> [root@rico ~]# 
> 
> The syslog ends with:
> 
> Jan  4 15:24:24 rico automount[1892]: st_readmap: state 1 path /-
> Jan  4 15:24:24 rico automount[1892]: re-reading map for /-
> Jan  4 15:24:24 rico automount[1892]: lookup_nss_read_map: reading map file /etc/auto.direct
> Jan  4 15:24:24 rico automount[1892]: parse_init: parse(sun): init gathered global options: (null)
> Jan  4 15:24:24 rico automount[1892]: lookup_read_map: read included map +/etc/auto.projects
> 

Do you have any SEGV messages in /var/log/messages?
How about a core file in the root directory?
If not, SELinux may be preventing it from being written.
Please try again with SELinux in permissive mode.

If you have a core file, then ensure you have the
autofs debuginfo package installed, and post the
output from:
gdb -c <core file> /usr/sbin/automount
(gdb) info threads
(gdb) thr a a bt
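
The same gdb session can be run non-interactively; "thr a a bt" is the abbreviated form of "thread apply all bt". The sketch below is an illustration only: the function names are hypothetical, it assumes a gdb that supports the -batch and -ex options, and the core size limit that matters is the one in the daemon's environment, which may differ from the interactive shell's.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a core size limit (as printed by
# "ulimit -c") permits a core file to be written at all.
core_limit_allows_dump() {
    [ "$1" != "0" ]
}

# Hypothetical helper: print the thread list and a backtrace of every
# thread from a core file without entering the gdb prompt.
# $1 = core file; $2 = binary (defaults to /usr/sbin/automount)
dump_backtraces() {
    gdb -batch -ex 'info threads' -ex 'thread apply all bt' \
        -c "$1" "${2:-/usr/sbin/automount}"
}
```

A quick check such as core_limit_allows_dump "$(ulimit -c)" shows whether the current shell would allow a dump; dump_backtraces <core file> then produces output suitable for pasting into the bug.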

Sorry this has become such a pain.
Ian

Comment 12 Ian Kent 2008-01-05 04:28:55 UTC

(In reply to comment #11)
> > Jan  4 15:24:24 rico automount[1892]: st_readmap: state 1 path /-
> > Jan  4 15:24:24 rico automount[1892]: re-reading map for /-
> > Jan  4 15:24:24 rico automount[1892]: lookup_nss_read_map: reading map file /etc/auto.direct
> > Jan  4 15:24:24 rico automount[1892]: parse_init: parse(sun): init gathered global options: (null)
> > Jan  4 15:24:24 rico automount[1892]: lookup_read_map: read included map +/etc/auto.projects
> > 
> 
> Do you have any SEGV messages in /var/log/messages?
> How about a core file in the root directory?
> If not, SELinux may be preventing it from being written.
> Please try again with SELinux in permissive mode.
> 
> If you have a core file, then ensure you have the
> autofs debuginfo package installed, and post the
> output from:
> gdb -c <core file> /usr/sbin/automount
> (gdb) info threads
> (gdb) thr a a bt
> 
> Sorry this has become such a pain.

Scratch that.
There's a mistake in the patch.
I'm totally mystified how I was able to test this and see it
work. But then the date on the log entry was completely
wrong as well, I must have been asleep or something.

I'll build a new revision and post it in the normal place.
Please, once again, give it a try.
Ian

Comment 13 Deke Clinger 2008-01-07 19:51:56 UTC

Ian,

The last build (rc2.81) seems to be working well: HUP rereads the maps without
stopping the daemon or trashing the idle automount triggers. It still won't do a
full restart but we can live without that. 

Please submit this for QA, etc. and get it into RHN so we can use it in production. 

Thanks again for working on this with me over the holiday, etc.

-Deke

Comment 14 Ian Kent 2008-01-08 01:52:41 UTC

(In reply to comment #13)
> Ian,
> 
> The last build (rc2.81) seems to be working well: HUP rereads the maps without
> stopping the daemon or trashing the idle automount triggers. It still won't do a
> full restart but we can live without that. 

That will be a different bug but, just briefly, what are
you seeing?

> 
> Please submit this for QA, etc. and get it into RHN so we can use it in
> production. 

Yes, regardless of what other problems exist this needs to be
added.

> 
> Thanks again for working on this with me over the holiday, etc.

My pleasure.

Ian

Comment 15 Deke Clinger 2008-01-08 02:44:09 UTC

autofs restart is about as before. Command output looks like this:

[root@rico ~]# /etc/init.d/autofs restart
Stopping automount:                                        [FAILED]
Starting automount: automount: program is already running.
                                                           [FAILED]

and if debug logging is on the syslog gets a lot of stuff like:

Jan  7 18:36:23 rico automount[1882]: umounted direct mount /prj/vocoder/data9
Jan  7 18:36:23 rico automount[1882]: umount_multi: path /prj/vocoder/appdsp2 incl 0
Jan  7 18:36:23 rico automount[1882]: umounted direct mount /prj/vocoder/appdsp2
Jan  7 18:36:23 rico automount[1882]: umount_multi: path /prj/vlsi/vlsi_verify/conan incl 0
Jan  7 18:36:23 rico automount[1882]: umounted direct mount /prj/vlsi/vlsi_verify/conan
Jan  7 18:36:23 rico automount[1882]: umount_multi: path /prj/vlsi/q1601 incl 0
Jan  7 18:36:24 rico automount[1882]: umounted direct mount /prj/vlsi/q1601

I could perhaps get this to work by tuning the stop function in the init script
(put in a delay loop, etc) but IIRC you were working on a more correct solution
to this - I'd rather wait for that.
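
The "delay loop" workaround mentioned above might look roughly like the following. This is only a sketch of the idea, not the stop function from the shipped init script and not the fix Ian refers to; wait_until and automount_gone are hypothetical names, and the one-second polling interval is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: poll a command once per second until it succeeds
# or the timeout expires.
# $1 = timeout in seconds; remaining arguments = command to poll.
# Returns 0 as soon as the command succeeds, 1 on timeout.
wait_until() {
    remaining=$1
    shift
    while ! "$@"; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Hypothetical predicate: true once no process named automount remains.
# Note: this also reports true if pgrep itself is not installed.
automount_gone() {
    ! pgrep -x automount >/dev/null 2>&1
}
```

A stop function could call wait_until 60 automount_gone after signalling the daemon and report failure only if that times out; the 60-second figure is a placeholder, since shutdown time here depends on the number of direct mounts.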

Thanks again.

-Deke

Comment 16 Ian Kent 2008-01-08 04:15:34 UTC

(In reply to comment #15)
> autofs restart is about as before. Command output looks like this:
> 
> [root@rico ~]# /etc/init.d/autofs restart
> Stopping automount:                                        [FAILED]
> Starting automount: automount: program is already running.
>                                                            [FAILED]

Ahh, yes, I remember now.

> 
> and if debug logging is on the syslog gets a lot of stuff like:
> 
> Jan  7 18:36:23 rico automount[1882]: umounted direct mount /prj/vocoder/data9
> Jan  7 18:36:23 rico automount[1882]: umount_multi: path /prj/vocoder/appdsp2 incl 0
> Jan  7 18:36:23 rico automount[1882]: umounted direct mount /prj/vocoder/appdsp2
> Jan  7 18:36:23 rico automount[1882]: umount_multi: path /prj/vlsi/vlsi_verify/conan incl 0
> Jan  7 18:36:23 rico automount[1882]: umounted direct mount /prj/vlsi/vlsi_verify/conan
> Jan  7 18:36:23 rico automount[1882]: umount_multi: path /prj/vlsi/q1601 incl 0
> Jan  7 18:36:24 rico automount[1882]: umounted direct mount /prj/vlsi/q1601
> 
> I could perhaps get this to work by tuning the stop function in the init script
> (put in a delay loop, etc) but IIRC you were working on a more correct solution
> to this - I'd rather wait for that.

Yep, and debug logging will make it even slower to shutdown with a
large number of entries in direct maps or a large number of active
mounts.

As I said, I do have a plan to fix this but I haven't started on it
just yet. Fact is that our autofs regression tests indicate that
shutdowns with a large number of mounts take much too long so I'll
be looking at that first.

Ian

Comment 25 errata-xmlrpc 2008-05-21 14:38:18 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0354.html
