450844 – automount segfault on SIGHUP after negative lookups

Bug 450844 - automount segfault on SIGHUP after negative lookups

Summary: automount segfault on SIGHUP after negative lookups

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	autofs
Sub Component:
Version:	4.6
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Jeff Moyer
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	451682
TreeView+	depends on / blocked

Reported:	2008-06-11 10:27 UTC by Bryn M. Reeves
Modified:	2018-10-20 01:36 UTC (History)
CC List:	3 users (show)
Fixed In Version:	RHBA-2008-0702
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-07-24 19:51:51 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
clean the cache when re-reading the map (1.37 KB, patch) 2008-06-11 15:38 UTC, Jeff Moyer	no flags	Details \| Diff
Handle negative map entries in the cache code. (1.24 KB, patch) 2008-06-16 17:06 UTC, Jeff Moyer	no flags	Details \| Diff
Fix the previous patch to not loop infinitely upon finding a negative mapent. (1.26 KB, patch) 2008-06-19 16:04 UTC, Jeff Moyer	no flags	Details \| Diff
Show Obsolete (2) View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2008:0702	0	normal	SHIPPED_LIVE	autofs bug fix and enhancement update	2008-07-23 15:43:37 UTC

Description Bryn M. Reeves 2008-06-11 10:27:13 UTC

Description of problem:
If a SIGHUP is sent to automount immediately after processing of failed
(negative) lookups (i.e. attempts to access directories that are not keys in the
relevant automount map) the daemon dies with a segmentation fault:

      # gdb /usr/sbin/automount /core.29142
      GNU gdb Red Hat Linux (6.3.0.0-1.153.el4_6.1rh)
      [...]
      (gdb) bt
      #0  ?? () at ../sysdeps/x86_64/strcpy.S:60 from /lib64/tls/libc.so.6
      #1  0x000000552aab338c in cache_ghost (root=Variable "root" is not
available.) at cache.c:422
      #2  0x0000002a9687dcfb in lookup_ghost (root=0x552abbb100 "/auto",
ghost=0, now=0, context=Variable "context" is not available.) at lookup_file.c:756
      #3  0x000000552aab01c2 in handle_mounts (path=Variable "path" is not
available.} at automount.c:1049
      #4  0x000000552aab1139 in main (argc=Variable "argc" is not available.) at
automount.c:2154
     
      (gdb) print gc
      $1 = {root = 0x552abbb100 "/auto",
        mapname = 0x7fbfffd460 "file:/etc/auto.indirect",
        direct_base = '\0' <repeats 255 times>,
        key = "nonesuch432-29160", '\0' <repeats 238 times>,
        mapent = '\0' <repeats 4095 times>}
     
      (gdb) print *me
      $2 = {next = 0x552abcaac0, key = 0x552abcaf80 "nonesuch432-29160", mapent
= 0x0, age = 1213134488, negative = 1, last_lookup = 1213134488}

Version-Release number of selected component (if applicable):
Reported on autofs-4.1.3-231-x86_64. Also confirmed in autofs-4.1.3-232.

How reproducible:
Readily, although not 100% in my tests.

Steps to Reproduce:
1. Configure at least one autofs managed directory
2. # /etc/init.d/autofs start
   Starting automount:                                        [  OK  ]

3. Access a non-existent directory under the autofs mount point and immediately 
   HUP the daemon:
   # ls /auto/qwertyuiop; kill -HUP $(pidof automount)
   ls: /auto/qwertyuiop: No such file or directory

Actual results:
The daemon dies:
   # tail /var/log/messages
   Jun 11 06:03:31 dhcp243-116 automount[20561]: attempting to mount entry
/auto/asgaoigdsg
   Jun 11 06:03:31 dhcp243-116 automount[20571]: lookup(file): key "asgaoigdsg"
not found in map.
   Jun 11 06:03:31 dhcp243-116 automount[20571]: failed to mount /auto/asgaoigdsg
   Jun 11 06:03:48 dhcp243-116 automount[20561]: automount: re-reading
configuration. 
   Jun 11 06:04:00 dhcp243-116 automount[20561]: attempting to mount entry
/auto/asgaoigdsg
   Jun 11 06:04:00 dhcp243-116 automount[20581]: lookup(file): key "asgaoigdsg"
not found in map.
   Jun 11 06:04:00 dhcp243-116 automount[20581]: failed to mount /auto/asgaoigdsg
   Jun 11 06:04:00 dhcp243-116 automount[20561]: automount: re-reading
configuration. 
   Jun 11 06:04:00 dhcp243-116 automount[20561]: process 20561 got unexpected
signal 11!
   Jun 11 06:04:00 dhcp243-116 kernel: automount[20561]: segfault at
0000000000000000 rip 0000002a959164b0 rsp 0000007fbfffd078 error 4


Expected results:
Maps reloaded as expected. No segfault.

Additional info:

Comment 3 Jeff Moyer 2008-06-11 15:08:07 UTC

A negative entry of course has to have a NULL mapent.  If there is no entry in
the map, what would we put in the mapent?

So, the real question is why the SIGHUP doesn't actually clean out the cache
before re-populating it.  Simply adding a cache_release to st_readmap() may be
all that is required for this particular problem.  Now, there still is some
cleaning up to do w.r.t. the assumption that me->mapent is always non-null, so
I'll fix that up, too.

Is the reproducer posted to the IT?  Can you attach it to the bugzilla?

Comment 4 Jeff Moyer 2008-06-11 15:38:50 UTC

Created attachment 308945 [details]
clean the cache when re-reading the map

I tested this fix using the autochoker reproducer, and it now passes.

Comment 11 Jeff Moyer 2008-06-13 13:14:05 UTC

Waiting for testing feedback from the customer.

Comment 15 RHEL Program Management 2008-06-13 18:32:49 UTC

This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 22 Jeff Moyer 2008-06-16 17:06:15 UTC

Created attachment 309516 [details]
Handle negative map entries in the cache code.

There is one potential downside to the previous patch.	If the map sources
(e.g. nis or ldap) are not available when the HUP signal is sent to the daemon,
then the current map contents are expunged and the automounter will not be able
to service any lookup requests from its cache.	As such, it may be more robust
to fix the cache routines to handle the negative entries and let the
cache_clean (done after the read_map but before the cache_ghost inside the
lookup_ghost routines) take care of purging stale data, as it always has.

Comment 33 Jeff Moyer 2008-06-17 16:31:10 UTC

autofs-4.1.3-233 contains a fix for this problem

Comment 37 Jeff Moyer 2008-06-19 14:42:02 UTC

(In reply to comment #35)
> Jeff, please see below from Cisco... I don't have confirmatin yet, but
> would these be received using --debug on the mount or is this a bug?

These messages will be printed when debug logging in disabled.  So, this is not
expected behaviour.

> Customer reports an issue with the hotfix :
> 
> ---
> 
> I'm running this on [ host ] stank, and autochoker seems to be inducing
> syslog
> messages out the wazoo:
> 
>  Jun 18 18:10:03 stank automount[3051]: cache_ghost: unexpected negative
> entry found for nonesuch45-5931
>  Jun 18 18:10:33 stank last message repeated 3441079 times
>  Jun 18 18:11:34 stank last message repeated 6626178 times
>  Jun 18 18:12:35 stank last message repeated 6894235 times
>  Jun 18 18:13:15 stank last message repeated 4407509 times
>  Jun 18 18:13:15 stank automount[3051]: cache_ghost: unexpected negative
> entry found for nonesuch45-5931
>  Jun 18 18:13:45 stank last message repeated 3429753 times
>  Jun 18 18:14:46 stank last message repeated 6749022 times
>  Jun 18 18:15:47 stank last message repeated 6700989 times
>  Jun 18 18:16:48 stank last message repeated 6755448 times
>  
> Millions of syslogs in a minute doesn't seem reasonable...

Agreed!

> It looks like it's all for a single entry, which suggests another race
> condition that's not cleaning out things properly.

I'm not sure why you think this is a race condition.  Anyway, I'll take a look.

Thanks!

Comment 38 Jeff Moyer 2008-06-19 15:54:01 UTC

OK, that patch definitely introduced a bug.  I never advanced the mapent pointer
when "skipping" over the negative entry, and thus we continue to iterate over
the same map entry forever.  That's a big problem.

As for the unexpected entry, it turns out that the lookup followed immediately
(within the same second) by a SIGHUP would cause the age of the negative entry
to be new enough to not get purged from the cache (hence your race theory). 
This is harmless, so I'll change that message to debug level.

I'll post an updated patch shortly.  Thanks for being so diligent.

Comment 42 Barry Donahue 2008-06-25 19:16:00 UTC

Verified RHTS job #24346

Comment 45 errata-xmlrpc 2008-07-24 19:51:51 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0702.html

Note You need to log in before you can comment on or make changes to this bug.