Description of problem: If a SIGHUP is sent to automount immediately after processing of failed (negative) lookups (i.e. attempts to access directories that are not keys in the relevant automount map) the daemon dies with a segmentation fault: # gdb /usr/sbin/automount /core.29142 GNU gdb Red Hat Linux (6.3.0.0-1.153.el4_6.1rh) [...] (gdb) bt #0 ?? () at ../sysdeps/x86_64/strcpy.S:60 from /lib64/tls/libc.so.6 #1 0x000000552aab338c in cache_ghost (root=Variable "root" is not available.) at cache.c:422 #2 0x0000002a9687dcfb in lookup_ghost (root=0x552abbb100 "/auto", ghost=0, now=0, context=Variable "context" is not available.) at lookup_file.c:756 #3 0x000000552aab01c2 in handle_mounts (path=Variable "path" is not available.} at automount.c:1049 #4 0x000000552aab1139 in main (argc=Variable "argc" is not available.) at automount.c:2154 (gdb) print gc $1 = {root = 0x552abbb100 "/auto", mapname = 0x7fbfffd460 "file:/etc/auto.indirect", direct_base = '\0' <repeats 255 times>, key = "nonesuch432-29160", '\0' <repeats 238 times>, mapent = '\0' <repeats 4095 times>} (gdb) print *me $2 = {next = 0x552abcaac0, key = 0x552abcaf80 "nonesuch432-29160", mapent = 0x0, age = 1213134488, negative = 1, last_lookup = 1213134488} Version-Release number of selected component (if applicable): Reported on autofs-4.1.3-231-x86_64. Also confirmed in autofs-4.1.3-232. How reproducible: Readily, although not 100% in my tests. Steps to Reproduce: 1. Configure at least one autofs managed directory 2. # /etc/init.d/autofs start Starting automount: [ OK ] 3. Access a non-existent directory under the autofs mount point and immediately HUP the daemon: # ls /auto/qwertyuiop; kill -HUP $(pidof automount) ls: /auto/qwertyuiop: No such file or directory Actual results: The daemon dies: # tail /var/log/messages Jun 11 06:03:31 dhcp243-116 automount[20561]: attempting to mount entry /auto/asgaoigdsg Jun 11 06:03:31 dhcp243-116 automount[20571]: lookup(file): key "asgaoigdsg" not found in map. Jun 11 06:03:31 dhcp243-116 automount[20571]: failed to mount /auto/asgaoigdsg Jun 11 06:03:48 dhcp243-116 automount[20561]: automount: re-reading configuration. Jun 11 06:04:00 dhcp243-116 automount[20561]: attempting to mount entry /auto/asgaoigdsg Jun 11 06:04:00 dhcp243-116 automount[20581]: lookup(file): key "asgaoigdsg" not found in map. Jun 11 06:04:00 dhcp243-116 automount[20581]: failed to mount /auto/asgaoigdsg Jun 11 06:04:00 dhcp243-116 automount[20561]: automount: re-reading configuration. Jun 11 06:04:00 dhcp243-116 automount[20561]: process 20561 got unexpected signal 11! Jun 11 06:04:00 dhcp243-116 kernel: automount[20561]: segfault at 0000000000000000 rip 0000002a959164b0 rsp 0000007fbfffd078 error 4 Expected results: Maps reloaded as expected. No segfault. Additional info:
A negative entry of course has to have a NULL mapent. If there is no entry in the map, what would we put in the mapent? So, the real question is why the SIGHUP doesn't actually clean out the cache before re-populating it. Simply adding a cache_release to st_readmap() may be all that is required for this particular problem. Now, there still is some cleaning up to do w.r.t. the assumption that me->mapent is always non-null, so I'll fix that up, too. Is the reproducer posted to the IT? Can you attach it to the bugzilla?
Created attachment 308945 [details] clean the cache when re-reading the map I tested this fix using the autochoker reproducer, and it now passes.
Waiting for testing feedback from the customer.
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
Created attachment 309516 [details] Handle negative map entries in the cache code. There is one potential downside to the previous patch. If the map sources (e.g. nis or ldap) are not available when the HUP signal is sent to the daemon, then the current map contents are expunged and the automounter will not be able to service any lookup requests from its cache. As such, it may be more robust to fix the cache routines to handle the negative entries and let the cache_clean (done after the read_map but before the cache_ghost inside the lookup_ghost routines) take care of purging stale data, as it always has.
autofs-4.1.3-233 contains a fix for this problem
(In reply to comment #35) > Jeff, please see below from Cisco... I don't have confirmatin yet, but > would these be received using --debug on the mount or is this a bug? These messages will be printed when debug logging in disabled. So, this is not expected behaviour. > Customer reports an issue with the hotfix : > > --- > > I'm running this on [ host ] stank, and autochoker seems to be inducing > syslog > messages out the wazoo: > > Jun 18 18:10:03 stank automount[3051]: cache_ghost: unexpected negative > entry found for nonesuch45-5931 > Jun 18 18:10:33 stank last message repeated 3441079 times > Jun 18 18:11:34 stank last message repeated 6626178 times > Jun 18 18:12:35 stank last message repeated 6894235 times > Jun 18 18:13:15 stank last message repeated 4407509 times > Jun 18 18:13:15 stank automount[3051]: cache_ghost: unexpected negative > entry found for nonesuch45-5931 > Jun 18 18:13:45 stank last message repeated 3429753 times > Jun 18 18:14:46 stank last message repeated 6749022 times > Jun 18 18:15:47 stank last message repeated 6700989 times > Jun 18 18:16:48 stank last message repeated 6755448 times > > Millions of syslogs in a minute doesn't seem reasonable... Agreed! > It looks like it's all for a single entry, which suggests another race > condition that's not cleaning out things properly. I'm not sure why you think this is a race condition. Anyway, I'll take a look. Thanks!
OK, that patch definitely introduced a bug. I never advanced the mapent pointer when "skipping" over the negative entry, and thus we continue to iterate over the same map entry forever. That's a big problem. As for the unexpected entry, it turns out that the lookup followed immediately (within the same second) by a SIGHUP would cause the age of the negative entry to be new enough to not get purged from the cache (hence your race theory). This is harmless, so I'll change that message to debug level. I'll post an updated patch shortly. Thanks for being so diligent.
Verified RHTS job #24346
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0702.html