Bug 333781
Summary: | autofs can deadlock with multiple simulaneous access to a submount map | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Ian Kent <ikent> | ||||||
Component: | autofs | Assignee: | Ian Kent <ikent> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Brock Organ <borgan> | ||||||
Severity: | medium | Docs Contact: | |||||||
Priority: | medium | ||||||||
Version: | 5.0 | CC: | ikent, jmoyer | ||||||
Target Milestone: | --- | ||||||||
Target Release: | --- | ||||||||
Hardware: | All | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | RHBA-2008-0354 | Doc Type: | Bug Fix | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2008-05-21 14:38:11 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 425903 | ||||||||
Attachments: |
|
Description
Ian Kent
2007-10-16 09:42:51 UTC
Created attachment 228481 [details]
Prevent deadlock during multiple simulaneous to program map
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. Why does this bug refer to a deadlock in program maps, when the patch addresses submount maps? --- a/modules/mount_autofs.c +++ b/modules/mount_autofs.c This bug report is really devoid of useful information. Ian, can you provide at least *some* insight into the real problem being fixed? Some stack traces or debug logs that show the actual problem would be helpful. I can't reproduce this myself now, on F-8 or RHEL-5 with revision 0.rc2.55. The patch as well as the bug title look suspect. The patch itself is incorrect as it doesn't add the mutex umlock in case the following "if test" fails. I'm tempted to revert this but let me do some more digging first. Ian OK, I think I understand the point of this patch now but it is incomplete. And it relates to submounts not program mounts. I've had some difficulty understanding pthread condition handling and this is an example. Maybe I need to revisit how this is done at some later stage and possibly use an alternate method. The difficulty comes in handling the pthread condition used to pass parameters to handle_mounts() at startup. Since the pthread_cond_wait unlocks the condition mutex while waiting to be signaled, another thread can lock it before the thread being started locks it to send the the completion signal. This can cause the creator thread to be signaled at the wrong time. The symptom we see looks like both signals aren't seen leaving one thread waiting forever. But this shouldn't be possible and it should only effect the execution order so I have to assume the reason the signal isn't seen is due to corruption of the parameters passed caused by this out of order execution. I also believe that this corruption often results in threads mysteriously "going away" rather than a segv (probably why it's has caused me so much confusion). Taking the "mounts" mutex prior to starting the handle_mounts() thread prevents this. The original purpose of the mounts mutex was to protect the list of submounts in the owning autofs_point struct but I can't see any conflict using it to protect the startup condition as well. Anyway, the upshot of all this is that since we can potentially have several submounts happening at once this is where we see the problem. We don't see it at startup because initial mounts are done serially but I think we're open to this race during map re-reads and and we haven't seen it because they aren't done frequently. Consequently, I think the mounts mutex should be taken in both places we start the handle_mounts() thread prior to using the startup condition. Have a look around and see if you concur with my analysis. Ian (In reply to comment #9) > much confusion). Taking the "mounts" mutex prior to starting the > handle_mounts() thread prevents this. The original purpose of the That should be, prior to locking the startup condition mutex. Created attachment 302658 [details]
Create a separate startup conditional for each handle_mounts invocation
If there can be multiple handle_mounts threads starting in parallel, then each
should have its own startup conditional. I think that makes more sense than
trying to serialize via other, unrelated locks. I also fixed some of the lock
release ordering.
I smoke tested this patch, but really it's for discussion.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0354.html |