Bug 248126 - autofs problem with symbolic links
Summary: autofs problem with symbolic links
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Ian Kent
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On: 174821
Blocks:
 
Reported: 2007-07-13 12:26 UTC by Ludek Smid
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHSA-2007-0939
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-01 13:31:06 UTC
Target Upstream Version:
Embargoed:


Attachments
the output of test case breakme.sh, the test fails. (1.17 KB, text/plain)
2007-08-20 01:14 UTC, Zhang Kexin
Patch to fix wakeup order of processes when rehashing dentry (811 bytes, patch)
2007-08-20 04:45 UTC, Ian Kent
Patch to sync autofs4 with upstream (1.28 KB, patch)
2007-08-27 05:49 UTC, Ian Kent
Patch to fix issue reported during QA (572 bytes, patch)
2007-08-27 06:06 UTC, Ian Kent


Links
System: Red Hat Product Errata
ID: RHSA-2007:0939
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: kernel security update
Last Updated: 2008-01-07 18:58:13 UTC

Description Ludek Smid 2007-07-13 12:26:11 UTC
This bug has been copied from bug #174821 and has been proposed to be backported
to 4.5 z-stream (EUS).

Comment 2 Don Howard 2007-07-31 21:28:06 UTC
A patch addressing this issue has been included in build 2.6.9-55.0.3.EL.

Comment 4 Zhang Kexin 2007-08-17 12:47:08 UTC
I have tested this bug on 2.6.9-55.EL and 2.6.9-55.0.4. The bug was reproduced
on 2.6.9-55.0.2 and did not occur on 2.6.9-55.0.4,
so the bug appears to have been fixed.

Comment 5 Zhang Kexin 2007-08-20 01:10:28 UTC
I retested this bug and found that it failed on amd64-4as.lab.boston.redhat.com
after running the test case for more than 4 hours. The output file is attached.
On the other 3 hosts, ppcp-4as-bos.lab.boston.redhat.com,
i386-4as-bos.lab.boston.redhat.com and ia64-4as.lab.boston.redhat.com, the test
has not failed so far, after running for about 16 hours.

Comment 6 Zhang Kexin 2007-08-20 01:14:10 UTC
Created attachment 161833
the output of test case breakme.sh, the test fails.

Comment 7 Ian Kent 2007-08-20 03:34:10 UTC
(In reply to comment #6)
> Created an attachment (id=161833)
> the output of test case breakme.sh, the test fails.
> 

I did uncover a problem with this patch during further
testing when preparing the patch for posting upstream.
I resolved it late last week and I'm about to update
the patch for the RHEL4 and RHEL5 kernels.

I'll post it here as soon as I've cast the patch against
the RHEL4 kernel, later today.

The symptom for this problem is different to the one
produced by the original bug. You should see the test
script exit upon receiving an ENOENT (rather than hanging
indefinitely) and autofs should continue to run. The
problem is that, with more than one process on the wait
queue, occasionally the order of the woken processes
can lead to an error return from the lookup calls.

Ian
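
For readers following the thread, the two symptoms contrasted in comment 7 (a
lookup that hangs indefinitely versus one that returns ENOENT) can be told
apart from user space. The following is a minimal Python sketch under stated
assumptions. It is not breakme.sh, whose contents are not shown in this
report; the helper name classify_access and its probe parameter are
hypothetical and exist only for illustration.

```python
import os
import threading


def classify_access(path, timeout=5.0, probe=os.stat):
    """Return 'ok', 'enoent', 'error' or 'hang' for one access to path.

    Hypothetical helper: the probe runs in a daemon thread so that a call
    stuck in the kernel cannot block the caller past `timeout` seconds.
    """
    outcome = {}

    def worker():
        try:
            probe(path)
            outcome["result"] = "ok"
        except FileNotFoundError:
            # os.stat raises this when the kernel returns ENOENT.
            outcome["result"] = "enoent"
        except OSError:
            outcome["result"] = "error"

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    # No recorded result after the timeout means the probe is still blocked.
    return outcome.get("result", "hang")
```

A stress loop could call classify_access repeatedly on a path below an autofs
mount point and stop at the first result other than 'ok'. Under the
description in comment 7, 'hang' would point to the original bug and 'enoent'
to the wakeup-order problem.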


Comment 8 Ian Kent 2007-08-20 04:34:02 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > Created an attachment (id=161833)
> > the output of test case breakme.sh, the test fails.
> > 
> 

Sorry, the statements below actually refer to another
bug (see bz#253231) and although it's a RHEL5 bug the
code is the same so this bug also exists in RHEL4.

However, the test I was using when I discovered this
was the same as the test used to reproduce this bug.
The problem with waiter wakeup order does also exist
here and it causes the mount callback to the daemon
to not happen and consequently ENOENT is returned.

> I did uncover a problem with this patch during further
> testing when preparing the patch for posting upstream.
> I resolved it late last week and I'm about to update
> the patch for the RHEL4 and RHEL5 kernels.
> 
> I'll post it here as soon as I've cast the patch against
> the RHEL4 kernel, later today.
> 
> The symptom for this problem is different to the one
> produced by the original bug. You should see the test
> script exit upon receiving an ENOENT (rather than hanging
> indefinitely) and autofs should continue to run. The
> problem is that, with more than one process on the wait
> queue, occasionally the order of the woken processes
> can lead to an error return from the lookup calls.

Ian


Comment 9 Ian Kent 2007-08-20 04:45:55 UTC
Created attachment 161841
Patch to fix wakeup order of processes when rehashing dentry

Please apply and test this fix.
I think it was a coincidence that I also discovered
this when testing on x86_64; it is a potential problem
for all architectures.

Ian

Comment 11 Ian Kent 2007-08-20 09:40:03 UTC
(In reply to comment #10)
> Hi Ian
> 
> Is the test failure that QA uncovered a result of bz246530?  If so, would it be
> reasonable to proceed with the patch for this issue as is and address bz246530
> in the next async update? 

Oddly enough, I thought it was when I discovered it, but
in fact it's an error in the patch for the issue here, the
mount expire race.

Ian


Comment 14 Don Howard 2007-08-22 17:13:41 UTC
From: Zhang Kexin <kzhang>
Subject: Re: [Fwd: kernel 4.5.z test build]

Hi Don, Martin,

I have run the test for bug 248126 on all six architectures except
IA64 (the kernel for ia64 cannot be installed); the bug was not reproduced.
On ppciseries, the test has been running for more than 5 hours; the other
hosts have run the test for more than 8 hours.

thanks,
Kexin


Comment 17 Ian Kent 2007-08-27 05:49:31 UTC
Created attachment 173241
Patch to sync autofs4 with upstream

There is a risk of some confusion regarding the various
patches. In order to be able to use the same patches
everywhere, we need to sync the source in the various
kernels with upstream.

This patch brings the RHEL 4 kernel in line with upstream.

Comment 18 Ian Kent 2007-08-27 06:06:17 UTC
Created attachment 173261
Patch to fix issue reported during QA

This patch fixes a failure reported during QA testing.

It is in fact a hunk from another autofs4 patch that
resolves a deadlock during directory creation under load
(see bug #246530 for info). The deadlock patch delays hashing
of dentries at directory creation until the actual create
operation, so dentries remain unhashed for a relatively
long time and the code in this patch was needed there. With
the expire/mount race fix here, dentries are unhashed for a
relatively brief time, so the code in this patch was not
identified as needed during development. However, if there
are many processes concurrently accessing directories, it's
possible there will be two or more waiters in the queue.
Only one of the waiters will have the dentry required to
complete the lookup, and the others need to perform a
d_lookup to get the correct dentry.

This patch allows these processes to perform the needed
d_lookup.

Ian
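
To illustrate the situation described in comment 18, here is a small
user-space model in Python. It is only a sketch under stated assumptions and
is not the autofs4 kernel code or the attached patch. Dentry, ToyDcache,
finish_lookup and simulate are hypothetical names, and the dcache hash is
modelled as a plain dictionary. Each waiter holds its own unhashed dentry,
only one of which is hashed when the wait completes; the others succeed only
if they repeat the lookup by name.

```python
import errno


class Dentry:
    """Toy stand-in for a kernel dentry: a name plus a 'hashed' flag."""

    def __init__(self, name):
        self.name = name
        self.hashed = False


class ToyDcache:
    """Toy stand-in for the dcache hash table (assumption: one entry per name)."""

    def __init__(self):
        self._table = {}

    def rehash(self, dentry):
        dentry.hashed = True
        self._table[dentry.name] = dentry

    def d_lookup(self, name):
        return self._table.get(name)


def finish_lookup(dcache, own, relookup):
    """Return 0 on success or -ENOENT, as seen by one woken waiter."""
    if own.hashed:
        return 0  # this waiter already holds the dentry that was hashed
    if relookup and dcache.d_lookup(own.name) is not None:
        return 0  # found the correct dentry by looking it up again
    return -errno.ENOENT


def simulate(n_waiters, relookup):
    """Wake n_waiters on the same name; only the first one's dentry is hashed."""
    dcache = ToyDcache()
    waiters = [Dentry("dir") for _ in range(n_waiters)]
    dcache.rehash(waiters[0])
    return [finish_lookup(dcache, w, relookup) for w in waiters]
```

In this model, simulate(3, relookup=False) leaves the second and third waiters
with -ENOENT, while simulate(3, relookup=True) lets all three complete, which
mirrors the behaviour the comment attributes to the added d_lookup.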

Comment 20 Ian Kent 2007-09-14 03:31:09 UTC
(In reply to comment #19)
> Hi Ian -
> 
> Does rhel4.5 need lookup-expire-race patch?  I noticed that it's attached to bz
> 174821, but not here.

Yes, definitely.
Would you like me to verify it against the kernel
we're using here and post it here?

Ian


Comment 21 Don Howard 2007-09-17 17:44:45 UTC
I don't think there is a need to repost the patches.  Thanks.

Comment 24 Don Howard 2007-09-21 22:37:32 UTC
A patch for this issue has been included in build 2.6.9-55.0.7.

Comment 28 errata-xmlrpc 2007-11-01 13:31:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2007-0939.html


