Bug 333781

Summary: autofs can deadlock with multiple simultaneous accesses to a submount map
Product: Red Hat Enterprise Linux 5
Reporter: Ian Kent <ikent>
Component: autofs
Assignee: Ian Kent <ikent>
Status: CLOSED ERRATA
QA Contact: Brock Organ <borgan>
Severity: medium
Priority: medium
Version: 5.0
CC: ikent, jmoyer
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: RHBA-2008-0354
Doc Type: Bug Fix
Last Closed: 2008-05-21 14:38:11 UTC
Bug Depends On:    
Bug Blocks: 425903    
Attachments:
  Prevent deadlock during multiple simultaneous accesses to a program map (flags: none)
  Create a separate startup conditional for each handle_mounts invocation (flags: none)

Description Ian Kent 2007-10-16 09:42:51 UTC
Description of problem:
autofs can deadlock if multiple simultaneous access requests
are made for a program map.

Version-Release number of selected component (if applicable):
autofs-5.0.1-0.rc2.55

How reproducible:
Fairly frequently

Steps to Reproduce:
1. Run at least 3 instances of the autofs RHTS connectathon
test suite simultaneously in different windows.

Actual results:
The test will hang in all three windows when the deadlock
occurs.

Expected results:
All three test runs should run through to completion.

Comment 1 Ian Kent 2007-10-16 09:42:51 UTC
Created attachment 228481 [details]
Prevent deadlock during multiple simultaneous accesses to a program map

Comment 2 RHEL Program Management 2007-10-16 09:44:42 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Jeff Moyer 2008-04-15 18:11:08 UTC
Why does this bug refer to a deadlock in program maps, when the patch addresses
submount maps?

--- a/modules/mount_autofs.c
+++ b/modules/mount_autofs.c

Comment 7 Jeff Moyer 2008-04-15 18:12:36 UTC
This bug report is really devoid of useful information.  Ian, can you provide at
least *some* insight into the real problem being fixed?  Some stack traces or
debug logs that show the actual problem would be helpful.

Comment 8 Ian Kent 2008-04-16 06:43:44 UTC
I can't reproduce this myself now, on F-8 or RHEL-5 with
revision 0.rc2.55.

Both the patch and the bug title look suspect.
The patch itself is incorrect: it doesn't unlock the mutex
when the following "if" test fails.
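For illustration only (not the autofs source, and the names are made up),
the error path has to release the lock before returning, otherwise the next
thread to take it blocks forever:

    /* Minimal sketch of the pattern referred to above. */
    #include <pthread.h>

    static pthread_mutex_t mounts_mutex = PTHREAD_MUTEX_INITIALIZER;

    static int do_submount(int ok_to_proceed)
    {
            pthread_mutex_lock(&mounts_mutex);

            if (!ok_to_proceed) {
                    /* The error path must drop the lock before returning. */
                    pthread_mutex_unlock(&mounts_mutex);
                    return -1;
            }

            /* ... normal work done while holding the lock ... */

            pthread_mutex_unlock(&mounts_mutex);
            return 0;
    }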

I'm tempted to revert this but let me do some more digging
first.

Ian


Comment 9 Ian Kent 2008-04-16 08:28:36 UTC
OK, I think I understand the point of this patch now, but it is
incomplete. And it relates to submounts, not program mounts.

I've had some difficulty understanding pthread condition handling
and this is an example. Maybe I need to revisit how this is done
at some later stage and possibly use an alternate method.

The difficulty comes in handling the pthread condition used to pass
parameters to handle_mounts() at startup. Since pthread_cond_wait()
unlocks the condition mutex while waiting to be signaled, another
thread can lock it before the thread being started locks it to send
the completion signal. This can cause the creator thread to be
signaled at the wrong time. The symptom we see looks as though the
signals aren't both seen, leaving one thread waiting forever. But this
shouldn't be possible and should only affect the execution order, so I
have to assume the reason the signal isn't seen is corruption of the
passed parameters caused by this out-of-order execution. I also
believe that this corruption often results in threads mysteriously
"going away" rather than a segv (probably why it has caused me so
much confusion). Taking the "mounts" mutex prior to starting the
handle_mounts() thread prevents this. The original purpose of the
mounts mutex was to protect the list of submounts in the owning
autofs_point struct, but I can't see any conflict in using it to
protect the startup condition as well.

Anyway, the upshot of all this is that since we can potentially have
several submounts happening at once, this is where we see the problem.
We don't see it at startup because initial mounts are done serially,
but I think we're open to the same race during map re-reads and we
haven't seen it there because they aren't done frequently. Consequently,
I think the mounts mutex should be taken in both places we start the
handle_mounts() thread, prior to using the startup condition.
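To make the handshake concrete, here is a rough sketch of the scheme
described above. The names and structure are illustrative, not the actual
autofs source: parameters are handed to the new thread through a single
shared startup slot, and an outer mutex (standing in for the mounts mutex)
serializes the whole handshake so a second creator can't overwrite the slot
while the first started thread is still reading it.

    #include <pthread.h>
    #include <string.h>

    struct startup_params {
            pthread_mutex_t mutex;
            pthread_cond_t  cond;
            int             done;
            char            path[256];   /* parameters for the new thread */
    };

    /* Single shared startup slot, as in the scheme described above. */
    static struct startup_params startup = {
            PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, ""
    };

    /* Outer lock used to serialize startups (stand-in for the mounts mutex). */
    static pthread_mutex_t mounts_mutex = PTHREAD_MUTEX_INITIALIZER;

    static void *handle_mounts(void *arg)
    {
            char path[256];

            (void)arg;

            pthread_mutex_lock(&startup.mutex);
            /* Copy the parameters out of the shared slot ... */
            strncpy(path, startup.path, sizeof(path));
            /* ... then tell the creator the slot is free again. */
            startup.done = 1;
            pthread_cond_signal(&startup.cond);
            pthread_mutex_unlock(&startup.mutex);

            /* ... service the mount point named by 'path' ... */
            return NULL;
    }

    static int start_handle_mounts(const char *path)
    {
            pthread_t thid;
            int err = 0;

            /* Taking this first prevents two creators from interleaving
             * their use of the single shared startup condition. */
            pthread_mutex_lock(&mounts_mutex);

            pthread_mutex_lock(&startup.mutex);
            startup.done = 0;
            strncpy(startup.path, path, sizeof(startup.path) - 1);

            if (pthread_create(&thid, NULL, handle_mounts, NULL)) {
                    err = -1;
                    goto out;
            }

            /* cond_wait drops startup.mutex while blocked; without the
             * outer mutex another creator could lock it here and overwrite
             * startup.path before the new thread has copied it. */
            while (!startup.done)
                    pthread_cond_wait(&startup.cond, &startup.mutex);
    out:
            pthread_mutex_unlock(&startup.mutex);
            pthread_mutex_unlock(&mounts_mutex);
            return err;
    }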

Have a look around and see if you concur with my analysis.

Ian


Comment 10 Ian Kent 2008-04-16 10:35:50 UTC
(In reply to comment #9)
> much confusion). Taking the "mounts" mutex prior to starting the
> handle_mounts() thread prevents this. The original purpose of the 

That should be, prior to locking the startup condition mutex.

Comment 11 Jeff Moyer 2008-04-16 19:41:30 UTC
Created attachment 302658 [details]
Create a separate startup conditional for each handle_mounts invocation

If there can be multiple handle_mounts threads starting in parallel, then each
should have its own startup conditional.  I think that makes more sense than
trying to serialize via other, unrelated locks.  I also fixed some of the lock
release ordering.

I smoke tested this patch, but really it's for discussion.
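Roughly, the idea looks like the following sketch; the names are
illustrative, not the attached patch itself. Each start_handle_mounts()
call gets its own condition and parameter block on the creator's stack,
so parallel startups never share a slot and no outer lock is needed; the
block stays valid because the creator waits until the new thread has
copied the parameters out.

    #include <pthread.h>
    #include <string.h>

    struct startup_cond {
            pthread_mutex_t mutex;
            pthread_cond_t  cond;
            int             done;
            char            path[256];
    };

    static void *handle_mounts(void *arg)
    {
            struct startup_cond *sc = arg;
            char path[256];

            pthread_mutex_lock(&sc->mutex);
            /* Copy the parameters, then release the creator. */
            strncpy(path, sc->path, sizeof(path));
            sc->done = 1;
            pthread_cond_signal(&sc->cond);
            pthread_mutex_unlock(&sc->mutex);

            /* ... service the mount point named by 'path' ... */
            return NULL;
    }

    static int start_handle_mounts(const char *path)
    {
            /* Per-invocation startup condition: nothing is shared. */
            struct startup_cond sc = {
                    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, ""
            };
            pthread_t thid;
            int err = 0;

            pthread_mutex_lock(&sc.mutex);
            strncpy(sc.path, path, sizeof(sc.path) - 1);

            if (pthread_create(&thid, NULL, handle_mounts, &sc))
                    err = -1;
            else
                    /* Wait until the new thread has copied sc.path. */
                    while (!sc.done)
                            pthread_cond_wait(&sc.cond, &sc.mutex);

            pthread_mutex_unlock(&sc.mutex);
            return err;
    }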

Comment 13 errata-xmlrpc 2008-05-21 14:38:11 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0354.html