Bug 534093 - autofs5: segfault in close_mount()
Summary: autofs5: segfault in close_mount()
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: autofs5
Version: 4.8
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Ian Kent
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks: 485811 609088
 
Reported: 2009-11-10 15:07 UTC by Sachin Prabhu
Modified: 2018-10-27 12:23 UTC
CC List: 6 users

Fixed In Version: autofs5-5.0.1-0.rc2.112
Doc Type: Bug Fix
Doc Text:
As it is used quite often, the Network File System (NFS) mount module is pre-opened and cached by the "parse_sun" module, so that it can be accessed by other modules very quickly. However, especially with a high number of simultaneously running threads, it was possible for a race condition to arise, causing the automount5 daemon to terminate unexpectedly with a segmentation fault. This error has been fixed, and automount5 should now work as expected.
Clone Of:
Environment:
Last Closed: 2011-02-16 14:21:32 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch to add lock to protect mount module handle in parse_sun.c (2.30 KB, patch)
2009-11-11 08:18 UTC, Ian Kent

Patch - dont check null cache on expire (983 bytes, patch)
2009-11-18 06:26 UTC, Ian Kent

Patch - fix null cache race (4.49 KB, patch)
2009-11-18 06:28 UTC, Ian Kent

Patch - fix cache_init() on source re-read (2.31 KB, patch)
2009-11-18 06:29 UTC, Ian Kent

Patch - fix memory leak on reload (1.80 KB, patch)
2009-12-01 08:11 UTC, Ian Kent

Patch - fix incorrect pthreads condition handling for expire requests (4.77 KB, patch)
2009-12-15 03:19 UTC, Ian Kent

Patch - fix timed wait in handle_packet_expire_direct() (765 bytes, patch)
2009-12-15 03:22 UTC, Ian Kent

Patch - fix lock ordering in mount request handling (3.68 KB, patch)
2009-12-15 03:27 UTC, Ian Kent

Patch - cleanup pending mount condition handling (6.03 KB, patch)
2009-12-15 03:29 UTC, Ian Kent

Patch - expire thread use pending mutex (6.12 KB, patch)
2009-12-15 03:31 UTC, Ian Kent

Patch - fix cache_init() on source re-read (updated) (2.78 KB, patch)
2010-07-01 09:45 UTC, Ian Kent


Links
System: Red Hat Product Errata
ID: RHBA-2011:0241
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: autofs5 bug fix update
Last Updated: 2011-02-15 16:34:58 UTC

Description Sachin Prabhu 2009-11-10 15:07:39 UTC
We have a case where autofs5 is segfaulting with the following message:

automount5[21465]: segfault at 0000000000000020 rip 000000552aabbe81 rsp 0000000044a45e70 error 4 

The backtrace in the core file is
(gdb) bt
#0  close_mount (mod=0x0) at module.c:296
#1  0x0000002a961b188f in parse_done (context=0x552ac75500) at parse_sun.c:1482
#2  0x000000552aabbbf8 in close_parse (mod=0x552acd9c20) at module.c:215
#3  0x0000002a95f6d50a in lookup_done (context=0x552aca6e20) at lookup_file.c:1169
#4  0x000000552aabb948 in close_lookup (mod=0x552ac99440) at module.c:133
#5  0x000000552aabd906 in lookup_close_lookup_instances (map=0x552ac90d20) at lookup.c:922
#6  0x000000552aabd948 in lookup_close_lookup (ap=0x552ac90bd0) at lookup.c:941
#7  0x000000552aac85de in master_mount_mounts (master=0x0, age=1250449409, readall=1) at master.c:1177
#8  0x000000552aac8d05 in master_read_master (master=0x552abec010, age=1250449409, readall=1) at master.c:803
#9  0x000000552aab2f15 in do_read_master (arg=Variable "arg" is not available.
) at automount.c:1277
#10 0x0000002a95672137 in ?? ()
#11 0x0000000000000000 in ?? ()

The problem appears to be that close_mount() is called with mount_nfs already set to NULL (mod=0x0 in frame #0), in this code in parse_sun.c:

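int parse_done(void *context)
{
        /* The last user closes the shared NFS mount module. Neither
           init_ctr nor mount_nfs is protected by a lock here, so
           concurrent callers can race on them. */
        if (--init_ctr == 0) {
                rv = close_mount(mount_nfs);
                mount_nfs = NULL;
        }
..
}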

Comment 4 Ian Kent 2009-11-11 08:18:26 UTC
Created attachment 368998 [details]
Patch to add lock to protect mount module handle in parse_sun.c
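The patch itself is not reproduced in this report, so the following is only a minimal sketch of the general idea, assuming a single pthread mutex that guards both the reference counter and the cached module handle. The names instance_mutex, parse_init_sketch(), parse_done_sketch(), parse_worker(), open_mount_stub() and close_mount_stub() are hypothetical stand-ins, not the real autofs functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real mount module type. */
struct mount_mod {
	int dummy;	/* placeholder payload */
};

/* Hypothetical stand-ins for open_mount()/close_mount(). */
static struct mount_mod *open_mount_stub(void)
{
	return calloc(1, sizeof(struct mount_mod));
}

static int close_mount_stub(struct mount_mod *mod)
{
	if (mod == NULL)
		return -1;	/* the unpatched code crashed in this case */
	free(mod);
	return 0;
}

/* Shared state cached by the parser; every access takes the lock. */
static pthread_mutex_t instance_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct mount_mod *mount_nfs = NULL;
static int init_ctr = 0;

/* Open the module for the first user and count every user. */
int parse_init_sketch(void)
{
	int rv = 0;

	pthread_mutex_lock(&instance_mutex);
	if (mount_nfs == NULL) {
		mount_nfs = open_mount_stub();
		if (mount_nfs == NULL)
			rv = 1;
	}
	if (rv == 0)
		init_ctr++;
	pthread_mutex_unlock(&instance_mutex);
	return rv;
}

/* Decrement and, for the last user, close under the same lock, so the
   invariant "init_ctr > 0 implies mount_nfs != NULL" always holds. */
int parse_done_sketch(void)
{
	int rv = 0;

	pthread_mutex_lock(&instance_mutex);
	if (init_ctr > 0 && --init_ctr == 0) {
		rv = close_mount_stub(mount_nfs);
		mount_nfs = NULL;
	}
	pthread_mutex_unlock(&instance_mutex);
	return rv;
}

/* Demo worker: repeatedly acquires and releases the shared module. */
static void *parse_worker(void *arg)
{
	int i, loops = *(int *)arg;

	for (i = 0; i < loops; i++) {
		if (parse_init_sketch() == 0) {
			int rv = parse_done_sketch();
			assert(rv == 0);
			(void)rv;
		}
	}
	return NULL;
}
```

Running several parse_worker() threads concurrently should leave init_ctr at 0 and mount_nfs at NULL, with close_mount_stub() never seeing a NULL handle. Taking the decrement and the close under one lock is the key design point; an atomic counter alone would not stop another thread from reading mount_nfs while it is being torn down.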

Comment 5 Ian Kent 2009-11-11 08:22:25 UTC
A package which includes the above patch, which attempts to
resolve the mount module segfault problem, has been built
and is available at:
http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.109.bz534093.2

Please test this package.
Ian

Comment 10 Ian Kent 2009-11-18 06:26:52 UTC
Created attachment 370021 [details]
Patch - dont check null cache on expire

Comment 11 Ian Kent 2009-11-18 06:28:16 UTC
Created attachment 370022 [details]
Patch - fix null cache race

Comment 12 Ian Kent 2009-11-18 06:29:36 UTC
Created attachment 370023 [details]
Patch - fix cache_init() on source re-read

Comment 13 Ian Kent 2009-11-18 06:39:33 UTC
I've looked fairly closely at the map entry cache handling
and have identified a few potential problems. The first patch
above, "dont check null cache on expire", addresses the location
of the reported SEGV.

General testing is under way but I'm not likely to be able to
duplicate the actual problem so we will need to test it within
the environment which it occurs.

Comment 14 Ian Kent 2009-11-18 08:10:38 UTC
A package which includes the above patches, which attempt to
resolve the mount module null map entry cache segfault problem,
has been built and is available at:
http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.109.bz534093.3

Please test this package.
Ian

Comment 18 Ian Kent 2009-12-01 08:11:04 UTC
Created attachment 374984 [details]
Patch - fix memory leak on reload

Based on the description of the symptoms regarding frequent
reload of maps I've managed to spot another silly mistake.

This patch appears to fix the problem.
This may not be the only difficulty but I was certainly able
to cripple my system fairly quickly without this patch.

I'm also building a package including this patch.

Comment 19 Ian Kent 2009-12-01 08:34:00 UTC
A package which includes all the above patches has been built and
is available at:
http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.109.bz534093.4

From the description in comment #15, I suspect there may still be
a locking conflict of some sort between the re-load, the mount
requests and the expires, so some further testing would be greatly
appreciated.

Ian

Comment 29 Ian Kent 2009-12-15 03:19:40 UTC
Created attachment 378417 [details]
Patch - fix incorrect pthreads condition handling for expire requests

Sync with upstream expire thread creation code.
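The diff is not shown here, but a common shape for this kind of thread creation is a handshake in which the creator waits on a condition variable until the new thread has copied its arguments, checking a predicate in a loop. The sketch below assumes that pattern; struct pending_args, expire_worker(), start_expire_thread(), last_token_seen and the signaled field are hypothetical names for illustration only.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical request-argument block living on the creator's stack. */
struct pending_args {
	pthread_mutex_t mutex;
	pthread_cond_t cond;
	int signaled;			/* predicate: worker has copied args */
	unsigned int wait_queue_token;
};

static unsigned int last_token_seen;	/* demo output, read after join */

static void *expire_worker(void *arg)
{
	struct pending_args *ea = arg;
	unsigned int token;

	pthread_mutex_lock(&ea->mutex);
	token = ea->wait_queue_token;	/* copy before releasing creator */
	ea->signaled = 1;
	pthread_cond_signal(&ea->cond);
	pthread_mutex_unlock(&ea->mutex);

	/* From here on ea may no longer exist; use only the copy. */
	last_token_seen = token;
	return NULL;
}

/* Start a worker and block until it has taken its arguments. The
   while loop copes with spurious wakeups and with the worker setting
   the flag before the creator ever reaches pthread_cond_wait(). */
int start_expire_thread(unsigned int token, pthread_t *thid)
{
	struct pending_args ea;
	int status;

	pthread_mutex_init(&ea.mutex, NULL);
	pthread_cond_init(&ea.cond, NULL);
	ea.signaled = 0;
	ea.wait_queue_token = token;

	pthread_mutex_lock(&ea.mutex);
	status = pthread_create(thid, NULL, expire_worker, &ea);
	if (status == 0) {
		while (!ea.signaled)
			pthread_cond_wait(&ea.cond, &ea.mutex);
	}
	pthread_mutex_unlock(&ea.mutex);

	pthread_cond_destroy(&ea.cond);
	pthread_mutex_destroy(&ea.mutex);
	return status;
}
```

Waiting without the signaled predicate, or signalling without holding the mutex, can either lose the wakeup or let the creator return while the worker is still reading the stack-allocated arguments.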

Comment 30 Ian Kent 2009-12-15 03:22:08 UTC
Created attachment 378418 [details]
Patch - fix timed wait in handle_packet_expire_direct()

Not really a problem here but an error discovered during the
work on this.
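The report does not say what the timed wait error was. A frequent mistake with pthread_cond_timedwait() is passing a relative interval, since POSIX specifies an absolute deadline (CLOCK_REALTIME by default). The sketch below shows that standard usage only; deadline_after_ms() and wait_done_timed() are hypothetical helpers, not code from handle_packet_expire_direct().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Absolute CLOCK_REALTIME deadline msec milliseconds from now, with
   tv_nsec kept within [0, 1e9). */
static struct timespec deadline_after_ms(long msec)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += msec / 1000;
	ts.tv_nsec += (msec % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec += 1;
		ts.tv_nsec -= 1000000000L;
	}
	return ts;
}

/* Wait until *done is non-zero or the deadline passes. The deadline is
   computed once, so spurious wakeups do not extend the total wait.
   Returns 0 if the predicate became true, else the wait's error code. */
int wait_done_timed(pthread_mutex_t *m, pthread_cond_t *c,
		    const int *done, long msec)
{
	struct timespec deadline = deadline_after_ms(msec);
	int status = 0;

	pthread_mutex_lock(m);
	while (!*done) {
		status = pthread_cond_timedwait(c, m, &deadline);
		if (status != 0)
			break;		/* ETIMEDOUT or another error */
	}
	if (*done)
		status = 0;
	pthread_mutex_unlock(m);
	return status;
}
```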

Comment 31 Ian Kent 2009-12-15 03:27:22 UTC
Created attachment 378420 [details]
Patch - fix lock ordering in mount request handling

No issues have been reported for this problem but the lock
ordering is incorrect and needs to be fixed. Although I'm not
entirely happy reducing the scope of these locks I can't see
any obvious problem with doing so. We have no choice anyway.
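The specific locks involved are not named in this comment. As a generic illustration of why ordering matters: if one path takes lock A then B while another takes B then A, the two can deadlock. Routing every path through one helper that always acquires the locks in the same order removes that possibility. The lock names map_lock and cache_lock and the functions below are hypothetical, and this is not the actual autofs change.

```c
#include <assert.h>
#include <pthread.h>

/* Rule: always take map_lock before cache_lock; release in reverse. */
static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static long entries;	/* shared data protected by both locks */

static void lock_both(void)
{
	pthread_mutex_lock(&map_lock);
	pthread_mutex_lock(&cache_lock);
}

static void unlock_both(void)
{
	pthread_mutex_unlock(&cache_lock);
	pthread_mutex_unlock(&map_lock);
}

/* A "mount request" path and an "expire" path both use the helper, so
   neither can hold cache_lock while waiting for map_lock. */
static void *mount_path(void *arg)
{
	int i, n = *(int *)arg;

	for (i = 0; i < n; i++) {
		lock_both();
		entries++;
		unlock_both();
	}
	return NULL;
}

static void *expire_path(void *arg)
{
	int i, n = *(int *)arg;

	for (i = 0; i < n; i++) {
		lock_both();
		entries--;
		unlock_both();
	}
	return NULL;
}
```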

Comment 32 Ian Kent 2009-12-15 03:29:32 UTC
Created attachment 378422 [details]
Patch - cleanup pending mount condition handling

Prepare for the change to make the thread creation consistent
between mount request and expire thread creation.

Comment 33 Ian Kent 2009-12-15 03:31:36 UTC
Created attachment 378423 [details]
Patch - expire thread use pending mutex

And the final patch to complete the change to make the thread
creation consistent.

Comment 34 Ian Kent 2009-12-15 03:36:07 UTC
It may seem like there are a lot of patches, but that is
intentional. For myself and others who may look at the
changes, it is much easier to visually verify the correctness
of smaller patches, each with a single defined purpose and an
accompanying description, than to wade through a single
combined diff. Splitting changes up this way is what I always
try to do.

Comment 35 Ian Kent 2009-12-15 03:46:23 UTC
A package which includes all the above patches has been built and
is available at:
http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.109.bz534093.5

We cannot be sure that this will correct the errant behaviour,
but since the information we have gives no concrete evidence as
to the cause of the problem, we must try it anyway.

Comment 36 Ian Kent 2009-12-15 03:48:42 UTC
The other thing that we should check is the kernel.
What kernel version is in use here?

The RHEL-4.8 kernel should be fine but let me check.

Comment 75 Ian Kent 2010-07-01 09:45:39 UTC
Created attachment 428221 [details]
Patch - fix cache_init() on source re-read (updated)

Fold autofs-5.0.5-fix-memory-leak-on-reload.patch into
autofs-5.0.1-fix-cache_init-on-source-re-read.patch where
it belongs.

Comment 76 Jaromir Hradilek 2010-07-12 08:39:45 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
As it is used quite often, the Network File System (NFS) mount module is pre-opened and cached by the "parse_sun" module, so that it can be accessed by other modules very quickly. However, especially with a high number of simultaneously running threads, it was possible for a race condition to arise, causing the automount5 daemon to terminate unexpectedly with a segmentation fault. This error has been fixed, and automount5 should now work as expected.

Comment 79 yanfu,wang 2010-12-06 06:29:29 UTC
Build patch confirmed against new src package.

Comment 80 errata-xmlrpc 2011-02-16 14:21:32 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0241.html

