Bug 514412 - RHEL 5.5 RFE - autofs add configuration option to increase max open files and max stack size
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: autofs
Version: 5.4
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: 5.5
Assignee: Ian Kent
QA Contact: BaseOS QE
URL:
Whiteboard:
Depends On: 513289
Blocks: 499522 525431
 
Reported: 2009-07-29 03:21 UTC by Lachlan McIlroy
Modified: 2018-10-27 15:56 UTC (History)

Fixed In Version: autofs-5.0.1-0.rc2.132.el5
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 08:36:14 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch to add back omitted locking on direct map re-read (1.16 KB, patch)
2009-08-25 05:17 UTC, Ian Kent
Patch to fix recent "dont umount existing direct mount on reread" change (1.47 KB, patch)
2009-08-27 06:04 UTC, Ian Kent
Patch to fix libxml2 thread safety issue (1.73 KB, patch)
2009-08-27 06:13 UTC, Ian Kent


Links
System                  ID              Private  Priority  Status        Summary                Last Updated
Red Hat Product Errata  RHBA-2010:0265  0        normal    SHIPPED_LIVE  autofs bug fix update  2010-03-29 12:54:19 UTC

Description Lachlan McIlroy 2009-07-29 03:21:39 UTC
Description of problem:
Customer has a very large number of direct maps. They ran into an issue where "automounter WILL cease functioning after about 24-72 hours." To work around the issue they added 'ulimit -n 20480; ulimit -s 65535' to /etc/init.d/autofs. Ian Kent has a patch for autofs to add a configuration option to increase max open files and max stack size. This way the customer doesn't have to hack /etc/init.d/autofs.
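
For illustration, the workaround relies on limits set with ulimit being inherited by every process started from the same shell. The following is a minimal sketch, assuming a POSIX sh whose ulimit accepts -S and -n (bash and dash do). The helper name child_nofile_limit is hypothetical, and a small value is used so the sketch runs without root.

```shell
#!/bin/sh
# Sketch: a limit set with ulimit applies to the current shell and is
# inherited by the processes that shell starts afterwards.

# Print the soft open-files limit seen by a child process after the
# enclosing subshell has set it to $1. (Hypothetical helper name.)
child_nofile_limit() {
    (
        ulimit -S -n "$1" || exit 1   # change the limit in this subshell only
        sh -c 'ulimit -S -n'          # the child reports what it inherited
    )
}

# 256 is an arbitrary small value that stays below typical hard limits.
# The customer's 20480 can only be set if the hard limit allows it,
# which usually means running as root, as an init script does.
child_nofile_limit 256
```

The same inheritance applies to 'ulimit -s', which is why placing both calls in the init script before the daemon is started changes the limits the daemon begins with.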

Comment 1 Ian Kent 2009-07-29 08:26:41 UTC
(In reply to comment #0)
> Description of problem:
> Customer has a very large number of direct maps. They ran into an issue where
> "automounter WILL cease functioning after about 24-72 hours." To work around
> the issue they added 'ulimit -n 20480; ulimit -s 65535' to /etc/init.d/autofs.
> Ian Kent has a patch for autofs to add a configuration option to increase max
> open files and max stack size. This way the customer doesn't have to hack
> /etc/init.d/autofs  

Right, but there are a couple of unanswered questions at this
stage.

First, assuming we do need to add these configuration options,
there is the question of their initial values. The values above
are quite large and most people won't come close to needing them
set that high. Is the customer happy to update their autofs
configuration with the values they need to use?

Second, I don't think the "ulimit -n 20480" does anything because
autofs explicitly sets the maximum open files when it starts. We
need to know what happens when that setting isn't used. Of course,
that doesn't mean we then wouldn't add it as a configuration
option, but there would be no reason to increase it beyond the
current 10240 that we use.
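
One way to check which open-files limit a running daemon actually ended up with, regardless of what the init script requested, is to read its limits file under /proc. This is a minimal sketch, assuming the kernel provides /proc/<pid>/limits (not confirmed here for the RHEL 5 kernel). The helper name soft_nofile_from is hypothetical.

```shell
#!/bin/sh
# Print the soft "Max open files" value from a /proc/<pid>/limits style
# file. Fields on that line: Max open files <soft> <hard> <units>.
# (Hypothetical helper name.)
soft_nofile_from() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Example: show the limit inherited from the current shell, if the
# file is available on this system.
if [ -r /proc/self/limits ]; then
    soft_nofile_from /proc/self/limits
fi
```

For the daemon itself the argument would be the limits file of the automount process, for example /proc/<automount pid>/limits. If that reports 10240 whatever the init script set, the "ulimit -n" part of the workaround is not what makes the difference.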

Finally, there is the question of whether the problem is being
caused by another issue. Recently an upstream change caused some
out of bounds (by one) array references to a stack variable in
several functions in the LDAP lookup module to come to light.
RHEL autofs doesn't have this change but those illegal references
are present, and even though we haven't had any other reports
of a problem due to this, it may be the cause here.

So, I recommend I fix the array reference issue and provide a
scratch build for testing before we go any further.

Is that OK?

Ian

Comment 2 Issue Tracker 2009-07-29 13:19:02 UTC
Event posted on 07-29-2009 09:19am EDT by jbrier

>So, I recommend I fix the array reference issue and provide a
>scratch build for testing before we go any further.

>Is that OK?

In the email correspondence between Frank Hirtz, Ian Kent, et al., I think
that was determined to be the best thing to try *first*.

Please do that Ian and I will pass the test packages along to the
customer. 

Just for clarification, the Issue-Tracker was originally opened as an RFE
but I meant this to go to Engineering with the expectation that we would
try to fix the array reference issue first, as a potential Bug, NOT an
RFE.  Jeremy West reassigned the escalation group as such. The title of
the IT/BZ is probably not currently accurate. 

John Brier


This event sent from IssueTracker by jbrier 
 issue 322595

Comment 3 Ian Kent 2009-07-29 14:06:40 UTC
(In reply to comment #2)
> Event posted on 07-29-2009 09:19am EDT by jbrier
> 
> >So, I recommend I fix the array reference issue and provide a
> >scratch build for testing before we go any further.
> 
> >Is that OK?
> 
> In the email correspondence between Frank Hirtz, Ian Kent, et al I think
> that was determined to be the best thing to try, *first*
> 
> Please do that Ian and I will pass the test packages along to the
> customer. 

Done, the package with the array reference fix can be found at:
http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.130.bz514412.1

> 
> Just for clarification, the Issue-Tracker was originally opened as an RFE
> but I meant this to go to Engineering with the expectation that we would
> try to fix the array reference issue first, as a potential Bug, NOT an
> RFE.  Jeremy West reassigned the escalation group as such. The title of
> the IT/BZ is probably not currently accurate. 

It matters not; both changes are straightforward.
We just needed a bug so we could track the work.

Ian

Comment 11 Ian Kent 2009-08-03 16:51:05 UTC
I'll have a look at the strace but I don't think it will
be useful.

When autofs hangs we need some specific information, in
particular a gdb backtrace of the threads in automount.

The corresponding debuginfo package needs to be installed
for the backtrace to be useful. The debuginfo packages are
also available on my people page for the packages being used.

When autofs hangs do:
gdb -p <automount pid> /usr/sbin/automount
gdb> thr a a bt

and save the output and post it to the bug.
A copy of /proc/mounts would also be useful.
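
The same information can be gathered without an interactive session. This is a minimal sketch, assuming the installed gdb supports the -batch and -ex options. The function name collect_hang_info and the output file names are hypothetical. "thread apply all bt" is the long form of "thr a a bt".

```shell
#!/bin/sh
# Save a backtrace of all automount threads and a copy of the mount
# table. Usage: collect_hang_info <automount pid> <output directory>
# (Hypothetical helper and file names.)
collect_hang_info() {
    pid=$1
    outdir=$2
    mkdir -p "$outdir" || return 1

    # Attach, print every thread's backtrace, then detach and exit.
    gdb -p "$pid" -batch -ex 'thread apply all bt' /usr/sbin/automount \
        > "$outdir/automount-bt.txt" 2>&1

    # The mount table as the kernel sees it at the time of the hang.
    if [ -r /proc/mounts ]; then
        cp /proc/mounts "$outdir/proc-mounts.txt"
    fi
}

# Example (run as root while automount is hung):
#   collect_hang_info <automount pid> /tmp/autofs-hang
```

As with the interactive commands, the backtrace is only useful when the matching debuginfo package is installed.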

Also, if using the RHEL-5.4 kernel, could we use two different
setups for this, one with the autofs configuration
USE_MISC_DEVICE="yes" and the other with USE_MISC_DEVICE="no"
or commented out. If we aren't using the 5.4 kernel which
revision is in use?
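
To record which of the two setups a test machine is using, something like the following could be run alongside each test. It is a minimal sketch, assuming the autofs configuration lives in /etc/sysconfig/autofs. The helper name misc_device_setting is hypothetical, and a setting that is missing, commented out or not "yes" is reported as "no".

```shell
#!/bin/sh
# Report whether USE_MISC_DEVICE is enabled in an autofs config file.
# The default path is an assumption. (Hypothetical helper name.)
misc_device_setting() {
    conf=${1:-/etc/sysconfig/autofs}
    if grep -Eq '^[[:space:]]*USE_MISC_DEVICE="?yes"?[[:space:]]*$' "$conf" 2>/dev/null; then
        echo yes
    else
        echo no
    fi
}

# Record the setting together with the running kernel revision.
echo "USE_MISC_DEVICE: $(misc_device_setting)"
echo "kernel: $(uname -r)"
```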

Ian

Comment 27 Ian Kent 2009-08-25 05:17:05 UTC
Created attachment 358519 [details]
Patch to add back omitted locking on direct map re-read

Comment 30 Ian Kent 2009-08-27 06:04:48 UTC
Created attachment 358810 [details]
Patch to fix recent "dont umount existing direct mount on reread" change

This latest debug and trace information led me to look again
at the area of code that needed the locking fix above. It
certainly looks like there is an incorrect check in another
recent fix which is causing a deadlock. Have a look at the
description in the patch for a more detailed explanation.

Comment 31 Ian Kent 2009-08-27 06:13:06 UTC
Created attachment 358812 [details]
Patch to fix libxml2 thread safety issue

I see from the debug log you're using LDAP (yeah, I knew that
anyway). Recent changes seem to have altered the concurrency
behaviour of autofs a little, which has caused an issue with
libxml2 to show up. We have just received feedback from a
customer that tested this patch advising us it fixes the
issue for them so I've included it in the update here as well.

Comment 32 Ian Kent 2009-08-27 06:27:24 UTC
I can't be sure this revision fixes the hang we are seeing but
the evidence appears to match. It looks like a deadlock has
been introduced by a recent bug fix, which makes sense because the
really significant changes that went into 5.4 have been tested
extensively.

Could you please test revision 0.rc2.130.bz514412.3.

It can be found at:
http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.130.bz514412.3.

Ian

Comment 37 Ian Kent 2009-09-10 10:03:36 UTC
I've marked this bug as dependent on bug 513289 because the
package tested here included that correction also.

The changes for this bug and 513289 are relatively small so
are low risk and have been verified by customers.

I believe I will be able to write an RHTS regression test
for the issue in this bug and will postpone marking the two
bugs here as MODIFIED until that task has been done. However,
the changes have been committed to CVS and autofs built as
revision 0.rc2.132.

We should have this build tested again by the customers
concerned for completeness (while the test is written).

Ian

Comment 38 Ian Kent 2009-09-11 07:08:47 UTC
(In reply to comment #37)
> 
> I believe I will be able to write an RHTS regression test
> for the issue in this bug and will postpone marking the two
> bugs here as MODIFIED until that task has been done. However,
> the changes have been committed to CVS and autofs built as
> revision 0.rc2.132.

The RHTS test bugzillas/bz493791 has been updated to check
for the regression identified and resolved in this bug.

Setting bug status to MODIFIED.

Ian

Comment 39 Ian Kent 2009-09-11 07:21:48 UTC
(In reply to comment #38)
> (In reply to comment #37)
> 
> The RHTS test bugzillas/bz493791 has been updated to check
> for the regression identified and resolved in this bug.

It turns out that this regression is triggered by requesting a map
re-load after an entry in a direct map has been removed, not after
an entry has been modified, as was initially thought.

Ian

Comment 49 errata-xmlrpc 2010-03-30 08:36:14 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0265.html

