Bug 534093
Summary: | autofs5: segfault in close_mount() | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Sachin Prabhu <sprabhu> |
Component: | autofs5 | Assignee: | Ian Kent <ikent> |
Status: | CLOSED ERRATA | QA Contact: | qe-baseos-daemons |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 4.8 | CC: | jmoyer, jnansi, jwest, rbinkhor, tao, yanwang |
Target Milestone: | rc | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | autofs5-5.0.1-0.rc2.112 | Doc Type: | Bug Fix |
Doc Text: | As it is used quite often, the Network File System (NFS) mount module is pre-opened and cached by the "parse_sun" module, so that it can be accessed by other modules very quickly. However, especially with a high number of simultaneously running threads, it was possible for a race condition to arise, causing the automount5 daemon to terminate unexpectedly with a segmentation fault. This error has been fixed, and automount5 should now work as expected. |
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2011-02-16 14:21:32 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 485811, 609088 | ||
Attachments: |
Description
Sachin Prabhu
2009-11-10 15:07:39 UTC
Created attachment 368998 [details]
Patch to add lock to protect mount module handle in parse_sun.c
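For readers unfamiliar with the pattern, here is a minimal sketch of guarding a shared, cached module handle and its use count with a single mutex. It is not the code from the attached patch: the names `mount_mod`, `handle_mutex`, `handle_refs`, `mod_get()`, `mod_put()` and `mod_refs()` are hypothetical, and `calloc()`/`free()` stand in for opening and closing a real module.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a loaded mount module; not the autofs type. */
struct mount_mod {
    int unused;
};

/* The cached handle and its use count are only touched with the mutex held. */
static pthread_mutex_t handle_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct mount_mod *cached_mod = NULL;
static unsigned int handle_refs = 0;

/* Take a reference, "opening" the module on first use. NULL on failure. */
struct mount_mod *mod_get(void)
{
    struct mount_mod *mod;

    pthread_mutex_lock(&handle_mutex);
    if (cached_mod == NULL)
        cached_mod = calloc(1, sizeof(*cached_mod));
    if (cached_mod != NULL)
        handle_refs++;
    mod = cached_mod;
    pthread_mutex_unlock(&handle_mutex);
    return mod;
}

/* Drop a reference; the last user "closes" the module and clears the pointer. */
void mod_put(void)
{
    pthread_mutex_lock(&handle_mutex);
    assert(handle_refs > 0);
    if (--handle_refs == 0) {
        free(cached_mod);
        cached_mod = NULL;
    }
    pthread_mutex_unlock(&handle_mutex);
}

/* Read the current use count under the lock (for inspection only). */
unsigned int mod_refs(void)
{
    unsigned int refs;

    pthread_mutex_lock(&handle_mutex);
    refs = handle_refs;
    pthread_mutex_unlock(&handle_mutex);
    return refs;
}
```

Because the count and the pointer change together under one lock, a thread cannot observe a handle that another thread is in the middle of closing, which is the kind of race the Doc Text describes.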
A package which includes the above patch, which attempts to resolve the mount module segfault problem, has been built and is available at: http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.109.bz534093.2
Please test this package.
Ian
Created attachment 370021 [details]
Patch - dont check null cache on expire
Created attachment 370022 [details]
Patch - fix null cache race
Created attachment 370023 [details]
Patch - fix cache_init() on source re-read
I've looked fairly closely at the map entry cache handling and have identified a few potential problems. The first patch above, "dont check null cache on expire", addresses the location of the reported SEGV. General testing is under way but I'm not likely to be able to duplicate the actual problem, so we will need to test it in the environment in which it occurs.
A package which includes the above patches, which attempt to resolve the mount module null map entry cache segfault problem, has been built and is available at: http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.109.bz534093.3
Please test this package.
Ian
Created attachment 374984 [details]
Patch - fix memory leak on reload
Based on the description of the symptoms regarding frequent
reload of maps I've managed to spot another silly mistake.
This patch appears to fix the problem.
This may not be the only difficulty but I was certainly able
to cripple my system fairly quickly without this patch.
I'm also building a package including this patch.
A package which includes all the above patches has been built and is available at: http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.109.bz534093.4
From the description in comment #15 I suspect there may still be a locking conflict of some sort between the re-load, the mount requests and the expires so some further testing would be greatly appreciated.
Ian
Created attachment 378417 [details]
Patch - fix incorrect pthreads condition handling for expire requests
Sync with upstream expire thread creation code.
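As general background on condition handling at thread creation, the sketch below shows one common handshake: the creator holds a mutex, starts the thread, and waits on a condition variable until the new thread has copied its arguments. This is an illustration only, not the upstream code. The names `startup`, `worker()` and `start_worker()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical startup record shared between creator and new thread. */
struct startup {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool signaled;      /* predicate, guarded by mutex */
    int request;        /* argument owned by the creator */
    int seen;           /* copy taken by the new thread */
};

static void *worker(void *arg)
{
    struct startup *su = arg;

    assert(su != NULL);
    pthread_mutex_lock(&su->mutex);
    su->seen = su->request;         /* copy arguments while they are valid */
    su->signaled = true;            /* set the predicate before signalling */
    pthread_cond_signal(&su->cond);
    pthread_mutex_unlock(&su->mutex);
    return NULL;
}

/* Start a worker and return the value it copied, or -1 if creation fails. */
int start_worker(int request)
{
    struct startup su;
    pthread_t thid;

    pthread_mutex_init(&su.mutex, NULL);
    pthread_cond_init(&su.cond, NULL);
    su.signaled = false;
    su.request = request;
    su.seen = 0;

    pthread_mutex_lock(&su.mutex);
    if (pthread_create(&thid, NULL, worker, &su) != 0) {
        pthread_mutex_unlock(&su.mutex);
        return -1;
    }
    /* Wait on the predicate, not on the bare signal. */
    while (!su.signaled)
        pthread_cond_wait(&su.cond, &su.mutex);
    pthread_mutex_unlock(&su.mutex);

    /* Join so that su outlives every access made by the worker. */
    pthread_join(thid, NULL);
    pthread_cond_destroy(&su.cond);
    pthread_mutex_destroy(&su.mutex);
    return su.seen;
}
```

Checking a flag in a loop, rather than relying on the signal alone, keeps the handshake correct if the new thread signals before the creator starts waiting or if a spurious wakeup occurs.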
Created attachment 378418 [details]
Patch - fix timed wait in handle_packet_expire_direct()
Not really a problem here but an error discovered during the
work on this.
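The comment does not say what was wrong with the timed wait, so the following is only a generic sketch of how `pthread_cond_timedwait()` is normally used: the timeout is an absolute time, and the wait sits in a loop on a predicate. The names `pending`, `pending_init()`, `pending_complete()` and `pending_wait()` are hypothetical and are not taken from `handle_packet_expire_direct()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical completion flag shared between a waiter and a worker. */
struct pending {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool done;          /* predicate, only touched with mutex held */
};

void pending_init(struct pending *p)
{
    assert(p != NULL);
    pthread_mutex_init(&p->mutex, NULL);
    pthread_cond_init(&p->cond, NULL);
    p->done = false;
}

/* Worker side: set the predicate, then wake the waiter. */
void pending_complete(struct pending *p)
{
    pthread_mutex_lock(&p->mutex);
    p->done = true;
    pthread_cond_signal(&p->cond);
    pthread_mutex_unlock(&p->mutex);
}

/* Waiter side: returns 0 once done is set, else an errno value such as ETIMEDOUT. */
int pending_wait(struct pending *p, long timeout_sec)
{
    struct timespec deadline;
    int ret = 0;

    /* pthread_cond_timedwait() takes an absolute time, not an interval. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&p->mutex);
    /* Loop on the predicate so that spurious wakeups are harmless. */
    while (!p->done && ret == 0)
        ret = pthread_cond_timedwait(&p->cond, &p->mutex, &deadline);
    if (p->done)
        ret = 0;
    pthread_mutex_unlock(&p->mutex);
    return ret;
}
```

Computing the deadline once, before the loop, means repeated wakeups do not extend the total time waited.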
Created attachment 378420 [details]
Patch - fix lock ordering in mount request handling
No issues have been reported for this problem but the lock
ordering is incorrect and needs to be fixed. Although I'm not
entirely happy reducing the scope of these locks I can't see
any obvious problem with doing so. We have no choice anyway.
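To illustrate why lock ordering matters, here is a minimal sketch in which two kinds of worker need the same pair of locks and always take them in the same order, so neither can hold one lock while waiting forever for the other. The lock and function names (`outer_lock`, `inner_lock`, `mount_worker()`, `expire_worker()`, `run_workers()`) are hypothetical and do not correspond to the locks changed by the patch.

```c
#include <assert.h>
#include <pthread.h>

#define ROUNDS 1000

/* Hypothetical pair of locks; the rule is: outer first, then inner. */
static pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inner_lock = PTHREAD_MUTEX_INITIALIZER;
static int mounts = 0;
static int expires = 0;

static void lock_both(void)
{
    pthread_mutex_lock(&outer_lock);
    pthread_mutex_lock(&inner_lock);
}

static void unlock_both(void)
{
    /* Release in the reverse order of acquisition. */
    pthread_mutex_unlock(&inner_lock);
    pthread_mutex_unlock(&outer_lock);
}

static void *mount_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        lock_both();
        mounts++;
        unlock_both();
    }
    return NULL;
}

static void *expire_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        lock_both();    /* same order as mount_worker, never inner first */
        expires++;
        unlock_both();
    }
    return NULL;
}

/* Run both workers to completion; returns total operations, or -1 on error. */
int run_workers(void)
{
    pthread_t a, b;

    mounts = 0;
    expires = 0;
    if (pthread_create(&a, NULL, mount_worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, expire_worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    assert(mounts == ROUNDS && expires == ROUNDS);
    return mounts + expires;
}
```

If one worker took `inner_lock` before `outer_lock`, each thread could end up holding the lock the other needs, which is the classic deadlock that a consistent order avoids.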
Created attachment 378422 [details]
Patch - cleanup pending mount condition handling
Prepare for the change to make the thread creation consistent
between mount request and expire thread creation.
Created attachment 378423 [details]
Patch - expire thread use pending mutex
And the final patch to complete the change to make the thread
creation consistent.
It may seem like there are a lot of patches but that is intentional. For myself and others that may look at the changes it is much easier to visually verify the correctness of smaller patches with a single defined purpose and an accompanying defining description than to try and wade through a single combined diff, which is what I always try and do.
A package which includes all the above patches has been built and is available at: http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.109.bz534093.5
We cannot be sure that this will correct the errant behaviour but since there is no concrete evidence as to the cause of the problem in the information we have we must try it anyway.
The other thing that we should check is the kernel. What kernel version is in use here?
The RHEL-4.8 kernel should be fine but let me check.
Created attachment 428221 [details]
Patch - fix cache_init() on source re-read (updated)
Fold autofs-5.0.5-fix-memory-leak-on-reload.patch into
autofs-5.0.1-fix-cache_init-on-source-re-read.patch where
it belongs.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
As it is used quite often, the Network File System (NFS) mount module is pre-opened and cached by the "parse_sun" module, so that it can be accessed by other modules very quickly. However, especially with a high number of simultaneously running threads, it was possible for a race condition to arise, causing the automount5 daemon to terminate unexpectedly with a segmentation fault. This error has been fixed, and automount5 should now work as expected.
Build patch confirmed against new src package.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2011-0241.html