Bug 333781 - autofs can deadlock with multiple simultaneous accesses to a submount map
Summary: autofs can deadlock with multiple simultaneous accesses to a submount map
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: autofs   
Version: 5.0
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Ian Kent
QA Contact: Brock Organ
Depends On:
Blocks: 425903
Reported: 2007-10-16 09:42 UTC by Ian Kent
Modified: 2008-05-21 14:38 UTC (History)
2 users

Fixed In Version: RHBA-2008-0354
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2008-05-21 14:38:11 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
Prevent deadlock during multiple simultaneous accesses to a program map (626 bytes, patch)
2007-10-16 09:42 UTC, Ian Kent
no flags Details | Diff
Create a separate startup conditional for each handle_mounts invocation (8.78 KB, patch)
2008-04-16 19:41 UTC, Jeff Moyer
no flags Details | Diff

External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2008:0354 normal SHIPPED_LIVE autofs bug fix and enhancement update 2008-05-20 12:52:25 UTC

Description Ian Kent 2007-10-16 09:42:51 UTC
Description of problem:
autofs can deadlock if multiple simultaneous access requests
are made for a program map.

Version-Release number of selected component (if applicable):

How reproducible:
Fairly frequently

Steps to Reproduce:
1. Run at least 3 instances of the autofs RHTS connectathon
test suite simultaneously in different windows.

Actual results:
The test will hang in all three windows when the deadlock occurs.

Expected results:
All three test runs should run through to completion.

Comment 1 Ian Kent 2007-10-16 09:42:51 UTC
Created attachment 228481 [details]
Prevent deadlock during multiple simultaneous accesses to a program map

Comment 2 RHEL Product and Program Management 2007-10-16 09:44:42 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Jeff Moyer 2008-04-15 18:11:08 UTC
Why does this bug refer to a deadlock in program maps, when the patch addresses
submount maps?

--- a/modules/mount_autofs.c
+++ b/modules/mount_autofs.c

Comment 7 Jeff Moyer 2008-04-15 18:12:36 UTC
This bug report is really devoid of useful information.  Ian, can you provide at
least *some* insight into the real problem being fixed?  Some stack traces or
debug logs that show the actual problem would be helpful.

Comment 8 Ian Kent 2008-04-16 06:43:44 UTC
I can't reproduce this myself now, on F-8 or RHEL-5 with
revision 0.rc2.55.

The patch as well as the bug title look suspect.
The patch itself is incorrect as it doesn't add the mutex
unlock in case the following "if test" fails.

I'm tempted to revert this but let me do some more digging.


Comment 9 Ian Kent 2008-04-16 08:28:36 UTC
OK, I think I understand the point of this patch now but it is
incomplete. And it relates to submounts not program mounts.

I've had some difficulty understanding pthread condition handling
and this is an example. Maybe I need to revisit how this is done
at some later stage and possibly use an alternate method.

The difficulty comes in handling the pthread condition used to pass
parameters to handle_mounts() at startup. Since the pthread_cond_wait
unlocks the condition mutex while waiting to be signaled, another
thread can lock it before the thread being started locks it to send
the completion signal. This can cause the creator thread to be
signaled at the wrong time. The symptom we see looks like both signals
aren't seen leaving one thread waiting forever. But this shouldn't
be possible and it should only affect the execution order so I have
to assume the reason the signal isn't seen is due to corruption of
the parameters passed caused by this out of order execution. I also
believe that this corruption often results in threads mysteriously 
"going away" rather than a segv (probably why it's has caused me so 
much confusion). Taking the "mounts" mutex prior to starting the
handle_mounts() thread prevents this. The original purpose of the 
mounts mutex was to protect the list of submounts in the owning 
autofs_point struct but I can't see any conflict using it to protect
the startup condition as well.

Anyway, the upshot of all this is that since we can potentially have
several submounts happening at once this is where we see the problem.
We don't see it at startup because initial mounts are done serially
but I think we're open to this race during map re-reads, and we
haven't seen it because they aren't done frequently. Consequently,
I think the mounts mutex should be taken in both places we start the 
handle_mounts() thread prior to using the startup condition.

Have a look around and see if you concur with my analysis.


Comment 10 Ian Kent 2008-04-16 10:35:50 UTC
(In reply to comment #9)
> much confusion). Taking the "mounts" mutex prior to starting the
> handle_mounts() thread prevents this. The original purpose of the 

That should be, prior to locking the startup condition mutex.

Comment 11 Jeff Moyer 2008-04-16 19:41:30 UTC
Created attachment 302658 [details]
Create a separate startup conditional for each handle_mounts invocation

If there can be multiple handle_mounts threads starting in parallel, then each
should have its own startup conditional.  I think that makes more sense than
trying to serialize via other, unrelated locks.  I also fixed some of the lock
release ordering.

I smoke tested this patch, but really it's for discussion.

Comment 13 errata-xmlrpc 2008-05-21 14:38:11 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

