We have a case where autofs5 is segfaulting with the following message:

automount5[21465]: segfault at 0000000000000020 rip 000000552aabbe81 rsp 0000000044a45e70 error 4

The backtrace in the core file is:

(gdb) bt
#0 close_mount (mod=0x0) at module.c:296
#1 0x0000002a961b188f in parse_done (context=0x552ac75500) at parse_sun.c:1482
#2 0x000000552aabbbf8 in close_parse (mod=0x552acd9c20) at module.c:215
#3 0x0000002a95f6d50a in lookup_done (context=0x552aca6e20) at lookup_file.c:1169
#4 0x000000552aabb948 in close_lookup (mod=0x552ac99440) at module.c:133
#5 0x000000552aabd906 in lookup_close_lookup_instances (map=0x552ac90d20) at lookup.c:922
#6 0x000000552aabd948 in lookup_close_lookup (ap=0x552ac90bd0) at lookup.c:941
#7 0x000000552aac85de in master_mount_mounts (master=0x0, age=1250449409, readall=1) at master.c:1177
#8 0x000000552aac8d05 in master_read_master (master=0x552abec010, age=1250449409, readall=1) at master.c:803
#9 0x000000552aab2f15 in do_read_master (arg=Variable "arg" is not available.
) at automount.c:1277
#10 0x0000002a95672137 in ?? ()
#11 0x0000000000000000 in ?? ()

The problem appears to be caused by mount_nfs being set to NULL, so that close_mount() is called with mod=0x0 (frame #0).

int parse_done(void *context)
{
        if (--init_ctr == 0) {
                rv = close_mount(mount_nfs);
                mount_nfs = NULL;
        }
        ..
}
Created attachment 368998 [details] Patch to add lock to protect mount module handle in parse_sun.c
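For readers without access to the attachment, here is a minimal sketch of the general idea of protecting a shared, reference-counted module handle with a mutex. It is not the attached patch. The names init_ctr and mount_nfs come from the parse_done() excerpt above; instance_mutex, struct mount_mod_sketch and the three functions are hypothetical names made up for this illustration, and calloc()/free() stand in for the real module open and close.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real mount module handle type. */
struct mount_mod_sketch {
    int opened;
};

/* instance_mutex is an assumed name; it guards both variables below. */
static pthread_mutex_t instance_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned int init_ctr = 0;
static struct mount_mod_sketch *mount_nfs = NULL;

/* Take a reference, opening the module on first use. Returns 0 on success. */
int parse_init_sketch(void)
{
    int ret = 0;

    pthread_mutex_lock(&instance_mutex);
    if (!mount_nfs) {
        /* calloc() stands in for opening the real module. */
        mount_nfs = calloc(1, sizeof(*mount_nfs));
        if (mount_nfs)
            mount_nfs->opened = 1;
        else
            ret = -1;
    }
    if (ret == 0)
        init_ctr++;
    pthread_mutex_unlock(&instance_mutex);
    return ret;
}

/* Drop a reference. Returns 1 if this call closed the module, else 0. */
int parse_done_sketch(void)
{
    int closed = 0;

    pthread_mutex_lock(&instance_mutex);
    if (init_ctr > 0 && --init_ctr == 0) {
        /* The counter and the handle change together under the lock. */
        assert(mount_nfs != NULL);
        free(mount_nfs);
        mount_nfs = NULL;
        closed = 1;
    }
    pthread_mutex_unlock(&instance_mutex);
    return closed;
}

/* Report whether the cached handle is currently open. */
int mount_handle_is_open(void)
{
    int is_open;

    pthread_mutex_lock(&instance_mutex);
    is_open = (mount_nfs != NULL);
    pthread_mutex_unlock(&instance_mutex);
    return is_open;
}
```

Because the decrement, the close and the reset to NULL all happen while the mutex is held, no other thread can see a zero counter paired with a live handle, or a non-zero counter paired with a NULL handle.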
A package which includes the above patch, which attempts to resolve the mount module segfault problem, has been built and is available at: http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.109.bz534093.2 Please test this package. Ian
Created attachment 370021 [details] Patch - dont check null cache on expire
Created attachment 370022 [details] Patch - fix null cache race
Created attachment 370023 [details] Patch - fix cache_init() on source re-read
I've looked fairly closely at the map entry cache handling and have identified a few potential problems. The first patch above, "dont check null cache on expire", addresses the location of the reported SEGV. General testing is under way but I'm not likely to be able to duplicate the actual problem, so we will need to test it in the environment in which it occurs.
A package which includes the above patch, which attempts to resolve the mount module null map entry cache segfault problem, has been built and is available at: http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.109.bz534093.3 Please test this package. Ian
Created attachment 374984 [details] Patch - fix memory leak on reload
Based on the description of the symptoms regarding frequent reload of maps I've managed to spot another silly mistake. This patch appears to fix the problem. This may not be the only difficulty but I was certainly able to cripple my system fairly quickly without this patch. I'm also building a package including this patch.
A package which includes all the above patches has been built and is available at: http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.109.bz534093.4 From the description in comment #15 I suspect there may still be a locking conflict of some sort between the re-load, the mount requests and the expires so some further testing would be greatly appreciated. Ian
Created attachment 378417 [details] Patch - fix incorrect pthreads condition handling for expire requests
Sync with upstream expire thread creation code.
Created attachment 378418 [details] Patch - fix timed wait in handle_packet_expire_direct()
Not really a problem here but an error discovered during the work on this.
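As background for the condition handling and timed wait patches, the following is a generic sketch of a timed condition wait with POSIX threads. It does not reproduce the attached patches or the code in handle_packet_expire_direct(); struct pending_sketch and wait_done_sketch() are hypothetical names. It shows the two points that are easy to get wrong: pthread_cond_timedwait() takes an absolute deadline, and the wait must loop on a predicate because wakeups can be spurious.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical request state; "done" is the predicate being waited on. */
struct pending_sketch {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int done;
};

/*
 * Wait until p->done is set or timeout_ms elapses.
 * Returns 0 if the predicate became true, ETIMEDOUT on timeout,
 * or another error code from pthread_cond_timedwait().
 */
int wait_done_sketch(struct pending_sketch *p, long timeout_ms)
{
    struct timespec deadline;
    int status = 0;

    assert(p != NULL);

    /* Build an absolute deadline on the default condition clock. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&p->mutex);
    /* Re-check the predicate after every wakeup; stop on any error. */
    while (!p->done && status == 0)
        status = pthread_cond_timedwait(&p->cond, &p->mutex, &deadline);
    /* A predicate that is set wins over a timeout that raced with it. */
    if (p->done)
        status = 0;
    pthread_mutex_unlock(&p->mutex);

    return status;
}
```

The signalling side would set done to 1 and call pthread_cond_signal() while holding the same mutex.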
Created attachment 378420 [details] Patch - fix lock ordering in mount request handling
No issues have been reported for this problem but the lock ordering is incorrect and needs to be fixed. Although I'm not entirely happy reducing the scope of these locks I can't see any obvious problem with doing so. We have no choice anyway.
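To illustrate what a lock ordering rule means in general, here is a small sketch in which every code path takes two mutexes in the same fixed order. It is not taken from the patch, and it does not identify which autofs locks were involved; map_lock, pending_lock and the two queue functions are hypothetical names chosen only for this example.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical locks. The rule in this sketch: map_lock is always
 * taken before pending_lock, and they are released in reverse order.
 */
static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;
static int mounts_queued;
static int expires_queued;

/* Acquire both locks in the one permitted order. */
static void lock_both(void)
{
    int rc = pthread_mutex_lock(&map_lock);
    assert(rc == 0);
    rc = pthread_mutex_lock(&pending_lock);
    assert(rc == 0);
    (void) rc;
}

/* Release in the reverse of the acquisition order. */
static void unlock_both(void)
{
    pthread_mutex_unlock(&pending_lock);
    pthread_mutex_unlock(&map_lock);
}

/* Mount request path: returns the number of mounts queued so far. */
int queue_mount_sketch(void)
{
    int n;

    lock_both();
    n = ++mounts_queued;
    unlock_both();
    return n;
}

/* Expire path: uses the same order, so the two paths cannot deadlock. */
int queue_expire_sketch(void)
{
    int n;

    lock_both();
    n = ++expires_queued;
    unlock_both();
    return n;
}
```

If one path took pending_lock first while the other took map_lock first, two threads could each hold one lock and wait forever for the other. Routing both paths through a single helper makes the order hard to get wrong.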
Created attachment 378422 [details] Patch - cleanup pending mount condition handling
Prepare for the change to make the thread creation consistent between mount request and expire thread creation.
Created attachment 378423 [details] Patch - expire thread use pending mutex
And the final patch to complete the change to make the thread creation consistent.
It may seem like there are a lot of patches but that is intentional. For myself and others who may look at the changes, it is much easier to visually verify the correctness of smaller patches, each with a single defined purpose and an accompanying description, than to wade through a single combined diff. Keeping patches small in this way is what I always try to do.
A package which includes all the above patches has been built and is available at: http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.109.bz534093.5 We cannot be sure that this will correct the errant behaviour but since there is no concrete evidence as to the cause of the problem in the information we have we must try it anyway.
The other thing that we should check is the kernel. What kernel version is in use here? The RHEL-4.8 kernel should be fine but let me check.
Created attachment 428221 [details] Patch - fix cache_init() on source re-read (updated)
Fold autofs-5.0.5-fix-memory-leak-on-reload.patch into autofs-5.0.1-fix-cache_init-on-source-re-read.patch where it belongs.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: As it is used quite often, the Network File System (NFS) mount module is pre-opened and cached by the "parse_sun" module, so that it can be accessed by other modules very quickly. However, especially with a high number of simultaneously running threads, it was possible for a race condition to arise, causing the automount5 daemon to terminate unexpectedly with a segmentation fault. This error has been fixed, and automount5 should now work as expected.
build patch confirmed against new src package.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0241.html