
Bug 514412

Summary: RHEL 5.5 RFE - autofs add configuration option to increase max open files and max stack size
Product: Red Hat Enterprise Linux 5
Reporter: Lachlan McIlroy <lmcilroy>
Component: autofs
Assignee: Ian Kent <ikent>
Status: CLOSED ERRATA
QA Contact: BaseOS QE <qe-baseos-auto>
Severity: high
Priority: urgent
Version: 5.4
CC: dkovalsk, fhirtz, ikent, jbrier, jmoyer, jnansi, jplans, plyons, syeghiay, tao, vgaikwad
Target Milestone: rc
Keywords: FutureFeature, ZStream
Target Release: 5.5
Hardware: All
OS: Linux
Fixed In Version: autofs-5.0.1-0.rc2.132.el5
Doc Type: Enhancement
Last Closed: 2010-03-30 08:36:14 UTC
Bug Depends On: 513289
Bug Blocks: 499522, 525431
Attachments (description, flags):
Patch to add back omitted locking on direct map re-read (flags: none)
Patch to fix recent "dont umount existing direct mount on reread" change (flags: none)
Patch to fix libxml2 thread safety issue (flags: none)

Description Lachlan McIlroy 2009-07-29 03:21:39 UTC
Description of problem:
Customer has a very large number of direct maps. They ran into an issue where "automounter WILL cease functioning after about 24-72 hours." To work around the issue they added 'ulimit -n 20480; ulimit -s 65535' to /etc/init.d/autofs. Ian Kent has a patch for autofs to add a configuration option to increase max open files and max stack size. This way the customer doesn't have to hack /etc/init.d/autofs.
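
The workaround above relies on resource limits being inherited by child processes. A minimal sketch of the same idea, kept outside the init script, might look like the following. The helper name run_with_limits is hypothetical and not part of autofs, and raising a limit above the current hard limit generally requires root.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of autofs): run a command with the given
# open-file and stack-size limits, leaving the calling shell untouched.
# usage: run_with_limits <max-open-files> <stack-kbytes> <command> [args...]
run_with_limits() {
    nofile="$1"
    stack_kb="$2"
    shift 2
    (
        # Limits set inside this subshell are inherited by the command below.
        ulimit -n "$nofile" || exit 1
        ulimit -s "$stack_kb" || exit 1
        exec "$@"
    )
}

# Values taken from the report; raising them usually needs root:
# run_with_limits 20480 65535 /usr/sbin/automount
```

Because the limits are changed in a subshell, the caller's own limits stay as they were. Whether the started program keeps the inherited values depends on the program itself.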

Comment 1 Ian Kent 2009-07-29 08:26:41 UTC
(In reply to comment #0)
> Description of problem:
> Customer has a very large number of direct maps. They ran into an issue where
> "automounter WILL cease functioning after about 24-72 hours." To work around
> the issue they added 'ulimit -n 20480; ulimit -s 65535' to /etc/init.d/autofs.
> Ian Kent has a patch for autofs to add a configuration option to increase max
> open files and max stack size. This way the customer doesn't have to hack
> /etc/init.d/autofs  

Right, but there are a couple of unanswered questions at this
stage.

First, assuming we do need to add these configuration options,
there is the question of their initial values. The values above
are quite large and most people won't come close to needing them
set that high. Is the customer happy to update their autofs
configuration with the values they need to use?

Second, I don't think the "ulimit -n 20480" does anything because
autofs explicitly sets the maximum open files when it starts. We
need to know what happens when that setting isn't used. Of course,
that doesn't mean we then wouldn't add it as a configuration
option but there would be no reason to increase it beyond the
current 10240 that we use.

Finally, there is the question of whether the problem is being
caused by another issue. Recently an upstream change caused some
out of bounds (by one) array references to a stack variable in
several functions in the LDAP lookup module to come to light.
RHEL autofs doesn't have this change but those illegal references
are present, and even though we haven't had any other reports
of a problem due to this, it may be the cause here.

So, I recommend I fix the array reference issue and provide a
scratch build for testing before we go any further.

Is that OK?

Ian
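
One way to check the second point, whether a limit set in the init script survives automount's own startup, is to read the limits of the running process. This is a sketch that assumes the kernel exposes /proc/<pid>/limits, which has not been confirmed here for the RHEL 5 kernel. The function name show_limits is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the open-file and stack-size limits that are
# actually in effect for a running process.
# Assumes the kernel provides /proc/<pid>/limits.
# usage: show_limits <pid>
show_limits() {
    pid="$1"
    grep -E '^Max (open files|stack size)' "/proc/$pid/limits"
}

# Example: show_limits "$(pidof automount)"
```

Comparing this output with and without the 'ulimit -n' line in the init script would show whether that setting has any effect on the daemon.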

Comment 2 Issue Tracker 2009-07-29 13:19:02 UTC
Event posted on 07-29-2009 09:19am EDT by jbrier

>So, I recommend I fix the array reference issue and provide a
>scratch build for testing before we go any further.

>Is that OK?

In the email correspondence between Frank Hirtz, Ian Kent, et al., I think
that was determined to be the best thing to try *first*.

Please do that Ian and I will pass the test packages along to the
customer. 

Just for clarification, the Issue-Tracker was originally opened as an RFE
but I meant this to go to Engineering with the expectation that we would
try to fix the array reference issue first, as a potential Bug, NOT an
RFE.  Jeremy West reassigned the escalation group as such. The title of
the IT/BZ is probably not currently accurate. 

John Brier


This event sent from IssueTracker by jbrier 
 issue 322595

Comment 3 Ian Kent 2009-07-29 14:06:40 UTC
(In reply to comment #2)
> Event posted on 07-29-2009 09:19am EDT by jbrier
> 
> >So, I recommend I fix the array reference issue and provide a
> >scratch build for testing before we go any further.
> 
> >Is that OK?
> 
> In the email correspondence between Frank Hirtz, Ian Kent, et al I think
> that was determined to be the best thing to try, *first*
> 
> Please do that Ian and I will pass the test packages along to the
> customer. 

Done, the package with the array reference fix can be found at:
http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.130.bz514412.1

> 
> Just for clarification, the Issue-Tracker was originally opened as an RFE
> but I meant this to go to Engineering with the expectation that we would
> try to fix the array reference issue first, as a potential Bug, NOT an
> RFE.  Jeremy West reassigned the escalation group as such. The title of
> the IT/BZ is probably not currently accurate. 

It matters not, both changes are straightforward.
We just needed a bug so we could track the work.

Ian

Comment 11 Ian Kent 2009-08-03 16:51:05 UTC
I'll have a look at the strace but I don't think it will
be useful.

When autofs hangs then we need some specific information.
In particular a gdb backtrace of the threads in automount.

The corresponding debuginfo package needs to be installed
for the backtrace to be useful. The debuginfo packages are
also available on my people page for the packages being used.

When autofs hangs do:
gdb -p <automount pid> /usr/sbin/automount
gdb> thread apply all bt

and save the output and post it to the bug.
A copy of /proc/mounts would also be useful.

Also, if using the RHEL-5.4 kernel, could we use two different
setups for this, one with the autofs configuration
USE_MISC_DEVICE="yes" and the other with USE_MISC_DEVICE="no"
or commented out. If we aren't using the 5.4 kernel, which
revision is in use?

Ian
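
The data collection requested in the comment above could be scripted roughly as follows, so that it can be run quickly when the hang occurs. This is only a sketch: the function name collect_hang_info and the output file names are hypothetical, and it assumes the installed gdb supports the -batch and -ex options. As noted above, the matching debuginfo package needs to be installed for the backtrace to be useful.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with autofs): save a backtrace of every
# thread in a process, plus a copy of /proc/mounts, into a directory.
# usage: collect_hang_info <pid> [output-dir]
collect_hang_info() {
    pid="$1"
    outdir="${2:-.}"

    # -batch makes gdb run the -ex command and then exit, detaching from
    # the process. Assumes the installed gdb supports -batch and -ex.
    gdb -p "$pid" -batch -ex "thread apply all bt" \
        > "$outdir/automount-backtrace.txt" 2>&1 < /dev/null

    # Record the mount table as it was at the time of the hang.
    cat /proc/mounts > "$outdir/proc-mounts.txt"
}

# Example: collect_hang_info "$(pidof automount)" /tmp/bz514412
```

Both output files can then be attached to the bug.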

Comment 27 Ian Kent 2009-08-25 05:17:05 UTC
Created attachment 358519 [details]
Patch to add back omitted locking on direct map re-read

Comment 30 Ian Kent 2009-08-27 06:04:48 UTC
Created attachment 358810 [details]
Patch to fix recent "dont umount existing direct mount on reread" change

This latest debug and trace information led me to look again
at the area of code that needed the locking fix above. It
certainly looks like there is an incorrect check in another
recent fix which is causing a deadlock. Have a look at the
description in the patch for a more detailed explanation.

Comment 31 Ian Kent 2009-08-27 06:13:06 UTC
Created attachment 358812 [details]
Patch to fix libxml2 thread safety issue

I see from the debug log you're using LDAP (yeah, I knew that
anyway). Recent changes seem to have altered the concurrency
behaviour of autofs a little which has caused an issue with
libxml2 to show up. We have just received feedback from a
customer that tested this patch advising us it fixes the
issue for them so I've included it in the update here as well.

Comment 32 Ian Kent 2009-08-27 06:27:24 UTC
I can't be sure this revision fixes the hang we are seeing but
the evidence appears to match. It looks like a deadlock has
been introduced by a recent bug fix, which makes sense because the
really significant changes that went into 5.4 have been tested
extensively.

Could you please test revision 0.rc2.130.bz514412.3.

It can be found at:
http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.130.bz514412.3.

Ian

Comment 37 Ian Kent 2009-09-10 10:03:36 UTC
I've marked this bug as dependent on bug 513289 because the
package tested here included that correction also.

The changes for this bug and 513289 are relatively small so
are low risk and have been verified by customers.

I believe I will be able to write an RHTS regression test
for the issue in this bug and will postpone marking the two
bugs here as MODIFIED until that task has been done. However,
the changes have been committed to CVS and autofs built as
revision 0.rc2.132.

We should have this build tested again by the customers
concerned for completeness (while the test is written).

Ian

Comment 38 Ian Kent 2009-09-11 07:08:47 UTC
(In reply to comment #37)
> 
> I believe I will be able to write an RHTS regression test
> for the issue in this bug and will postpone marking the two
> bugs here as MODIFIED until that task has been done. However,
> the changes have been committed to CVS and autofs built as
> revision 0.rc2.132.

The RHTS test bugzillas/bz493791 has been updated to check
for the regression identified and resolved in this bug.

Setting bug status to MODIFIED.

Ian

Comment 39 Ian Kent 2009-09-11 07:21:48 UTC
(In reply to comment #38)
> (In reply to comment #37)
> 
> The RHTS test bugzillas/bz493791 has been updated to check
> for the regression identified and resolved in this bug.

It turns out that this regression is triggered by requesting a map
re-load where an entry in a direct map has been removed and not a
modification, as was initially thought.

Ian
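
The trigger described in the comment above, removing an entry from a direct map and then requesting a map re-load, could be exercised with something like the sketch below. It assumes a file-based direct map, whereas the customer in this bug uses LDAP. The function names and the example map path are hypothetical. automount re-reads its maps when it receives SIGHUP.

```shell
#!/bin/sh
# Hypothetical reproduction helpers, assuming a file-based direct map.

# Remove the entry whose key (the mount point path) is $2 from map file $1.
remove_direct_entry() {
    map="$1"
    key="$2"
    tmp="$map.tmp.$$"
    # Keep every line whose first field is not the key.
    awk -v k="$key" '$1 != k' "$map" > "$tmp" && mv "$tmp" "$map"
}

# Ask a running automount to re-read its maps (it does so on SIGHUP).
reload_maps() {
    pid="$1"
    kill -HUP "$pid"
}

# Example (hypothetical map path and key):
# remove_direct_entry /etc/auto.direct /nfs/a
# reload_maps "$(pidof automount)"
```

On an affected revision the daemon would be expected to hang after the re-load; on revision 0.rc2.132 it should continue to serve mounts.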

Comment 49 errata-xmlrpc 2010-03-30 08:36:14 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0265.html