Red Hat Bugzilla – Bug 248126
autofs problem with symbolic links
Last modified: 2007-11-30 17:07:29 EST
This bug has been copied from bug #174821 and has been proposed to be backported
to 4.5 z-stream (EUS).
A patch addressing this issue has been included in build 2.6.9-55.0.3.EL.
I have tested the bug on 2.6.9-55.EL and 2.6.9-55.0.4. The bug reproduces
on 2.6.9-55.0.2 and does not occur on 2.6.9-55.0.4,
so it should be fixed.
I retested this bug and found that it failed on amd64-4as.lab.boston.redhat.com
after running the test case for more than 4 hours. The output file is attached.
On the other 3 hosts, ppcp-4as-bos.lab.boston.redhat.com,
i386-4as-bos.lab.boston.redhat.com and ia64-4as.lab.boston.redhat.com, the test
has not failed so far, after running for about 16 hours.
Created attachment 161833 [details]
the output of test case breakme.sh, the test fails.
(In reply to comment #6)
> Created an attachment (id=161833) 
> the output of test case breakme.sh, the test fails.
I did uncover a problem with this patch during further
testing when preparing the patch for posting upstream.
I resolved it late last week and I'm about to update
the patch for the RHEL4 and RHEL5 kernels.
I'll post it here as soon as I've cast the patch against
the RHEL4 kernel, later today.
The symptom for this problem is different to the one
produced by the original bug. You should see the test
script exit upon receiving an ENOENT (rather than hanging
indefinitely) and autofs should continue to run. The
problem is that, with more than one process on the wait
queue, occasionally the order of the woken processes
can lead to an error return from the lookup calls.
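The symptom described above (an ENOENT return rather than an
indefinite hang) can be watched for with a small concurrent stress
loop. The following is a minimal sketch in Python, not the breakme.sh
test case attached to this bug: the function name hammer_path, the
worker count and the time limit are assumptions made for illustration.
Several threads stat the same path until one of them gets an error or
the time limit runs out.

```python
import errno
import os
import threading
import time


def hammer_path(path, workers=4, seconds=1.0):
    """Stat `path` from several threads; return the first errno seen, or None.

    Illustrative only: the name and defaults are assumptions, not taken
    from the QA test case.
    """
    stop = threading.Event()
    first_error = []            # holds at most one errno value
    lock = threading.Lock()
    deadline = time.monotonic() + seconds

    def worker():
        while not stop.is_set() and time.monotonic() < deadline:
            try:
                os.stat(path)   # under an autofs mount this forces a lookup
            except OSError as exc:
                with lock:
                    if not first_error:
                        first_error.append(exc.errno)
                stop.set()      # one failure ends the whole run

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return first_error[0] if first_error else None
```

A return value equal to errno.ENOENT would correspond to the failure
described above. With the original bug the call would instead never
return, because a worker would hang inside os.stat.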
(In reply to comment #7)
> (In reply to comment #6)
> > Created an attachment (id=161833)  
> > the output of test case breakme.sh, the test fails.
Sorry, the statements below actually refer to another
bug (see bz#253231). Although it's a RHEL5 bug, the
code is the same, so it also exists in RHEL4.
However, the test I was using when I discovered this
was the same as the test used to reproduce this bug.
The problem with waiter wakeup order does also exist
here and it causes the mount callback to the daemon
to not happen and consequently ENOENT is returned.
> I did uncover a problem with this patch during further
> testing when preparing the patch for posting upstream.
> I resolved it late last week and I'm about to update
> the patch for the RHEL4 and RHEL5 kernels.
> I'll post it here as soon as I've cast the patch against
> the RHEL4 kernel, later today.
> The symptom for this problem is different to the one
> produced by the original bug. You should see the test
> script exit upon receiving an ENOENT (rather than hanging
> indefinitely) and autofs should continue to run. The
> problem is that, with more than one process on the wait
> queue, occasionally the order of the woken processes
> can lead to an error return from the lookup calls.
Created attachment 161841 [details]
Patch to fix wakeup order of processes when rehashing dentry
Please apply and test this fix.
I think it was a coincidence that I also discovered
this when testing on x86_64; it is a potential problem
for all archs.
(In reply to comment #10)
> Hi Ian
> Is the test failure that QA uncovered a result of bz246530? If so, would it be
> reasonable to proceed with the patch for this issue as is and address bz246530
> in the next async update?
Oddly enough I thought it was when I discovered it, but
in fact it's an error with the patch for the issue here, the
mount expire race.
From: Zhang Kexin <firstname.lastname@example.org>
Subject: Re: [Fwd: kernel 4.5.z test build]
Hi Don, Martin,
I have run the test for bug248126 on all six architectures except
IA64 (the kernel for ia64 cannot be installed); the bug was not reproduced.
On ppciseries the test has been running for more than 5 hours; the other
hosts have run the test for more than 8 hours.
Created attachment 173241 [details]
Patch to sync autofs4 with upstream
There is a risk of some confusion regarding the various
patches. In order to be able to use the same patches
everywhere we need to sync the source in the various
kernels with upstream.
This patch brings the RHEL 4 kernel in line with upstream.
Created attachment 173261 [details]
Patch to fix issue reported during QA
This patch fixes a failure reported during QA testing.
It is in fact a hunk from another autofs4 patch that
resolves a deadlock during directory creation under load
(see bug #246530 for info). The deadlock patch delays hashing
of dentries at directory creation until the actual create
operation, so dentries remain unhashed for a relatively
long time and the code in this patch was needed there. With
the expire/mount race fix here, dentries are unhashed for a
relatively brief time, so the code in this patch was not
identified as needed during development. However, if there
are many processes concurrently accessing directories it's
possible there will be two or more waiters in the queue.
Only one of the waiters will have the dentry required to
complete the lookup and the others need to perform a
d_lookup to get the correct dentry.
This patch allows these processes to perform the needed lookup.
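The situation described above, several waiters of which only one
holds the dentry needed to complete the lookup, can be illustrated
with a toy model. This is a sketch in Python rather than kernel code,
and it is not the attached patch: Dentry, finish_lookup and the
use_d_lookup flag are made up for illustration, and d_lookup here is
only a dictionary lookup standing in for the kernel function of the
same name.

```python
import errno


class Dentry:
    """Stand-in for a kernel dentry: a name plus a 'hashed' flag."""

    def __init__(self, name, hashed):
        self.name = name
        self.hashed = hashed


def d_lookup(dcache, name):
    """Toy d_lookup: return the hashed dentry for name, or None."""
    return dcache.get(name)


def finish_lookup(held, dcache, use_d_lookup):
    """Return 0 on success or -ENOENT, as a woken waiter would.

    `held` is the dentry the waiter had when it went to sleep.
    """
    if held.hashed:
        return 0                      # this waiter holds the usable dentry
    if use_d_lookup:
        found = d_lookup(dcache, held.name)
        if found is not None:
            return 0                  # switched to the correct dentry
    return -errno.ENOENT              # nothing usable: the lookup fails


# One hashed dentry in the cache, and three waiters for the same name.
good = Dentry("dir", hashed=True)
dcache = {"dir": good}
waiters = [good, Dentry("dir", hashed=False), Dentry("dir", hashed=False)]

without_fix = [finish_lookup(w, dcache, use_d_lookup=False) for w in waiters]
with_fix = [finish_lookup(w, dcache, use_d_lookup=True) for w in waiters]
print(without_fix)
print(with_fix)
```

In this model only the waiter holding the hashed dentry succeeds when
the extra lookup is skipped, while all three waiters succeed when it
is performed.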
(In reply to comment #19)
> Hi Ian -
> Does rhel4.5 need lookup-expire-race patch? I noticed that it's attached to bz
> 174821, but not here.
Would you like me to verify it against the kernel
we're using here and post it here?
I don't think there is need to repost the patches. Thanks.
A patch for this issue has been included in build 2.6.9-55.0.7.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.