Bug 590762

Summary: Load average spikes for 15-30 sec making system unresponsive.
Product: Red Hat Enterprise Linux 4
Reporter: Ben Turner <bturner>
Component: autofs5
Assignee: Ian Kent <ikent>
Status: CLOSED WONTFIX
QA Contact: Filesystem QE <fs-qe>
Severity: high
Priority: high
Version: 4.8
CC: jmoyer, jwest, kzhang, rwheeler, tao
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2011-01-25 21:05:25 UTC
Attachments:
Failed attempt to backport some fixes from RHEL-5 (flags: none)

Description Ben Turner 2010-05-10 16:05:06 UTC
Description of problem:  LSI is running a one-off kernel that we support for them, and it is experiencing load spikes.  The engineer who worked on the original issue (Ian Kent) evaluated the latest data and doesn't think it's related to his patch, but we can't rule it out.

Version-Release number of selected component (if applicable):

2.6.9-89.0.20.EL.bz501565.11largesmp #1 SMP Tue Mar 23 04:20:50 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:

This happens randomly about 1-2 times per day.

Steps to Reproduce:
1.  Normal operation
  
Actual results:

Load average spikes to very high levels.

Expected results:

Normal operation.

Additional info:

The customer is running a one-off kernel.
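
Since the spikes are short and random, it may help to log the load average at a short interval and pull out the spike windows afterwards. The following is only a minimal sketch of that idea: the function names, the threshold, and the sample format of (timestamp, 1-minute load) pairs are assumptions for illustration, not anything taken from the customer's setup.

```python
from typing import Iterable, List, Tuple


def parse_loadavg(line: str) -> float:
    """Return the 1-minute load average from a /proc/loadavg style line."""
    # The first whitespace-separated field is the 1-minute average.
    return float(line.split()[0])


def find_spikes(
    samples: Iterable[Tuple[float, float]], threshold: float
) -> List[Tuple[float, float]]:
    """Group consecutive samples above `threshold` into (start, end) windows.

    `samples` is an iterable of (timestamp, load) pairs in time order.
    The threshold is a site-specific assumption.
    """
    spikes: List[Tuple[float, float]] = []
    start = None
    last = None
    for ts, load in samples:
        if load > threshold:
            if start is None:
                start = ts  # first sample of a new spike window
            last = ts
        elif start is not None:
            spikes.append((start, last))  # load dropped back, close the window
            start = None
    if start is not None:
        spikes.append((start, last))  # spike still in progress at end of data
    return spikes
```

In practice the samples would come from reading /proc/loadavg every few seconds; the resulting windows can then be compared against the automount log timestamps.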

Comment 11 Fabio Olive Leite 2010-07-20 15:50:43 UTC
Turning this into a proper autofs5 bug.

Maybe we can backport the negative cache fixes from RHEL-5 and ensure autofs performs better when there are many lookups for non-existent keys.

Comment 12 Fabio Olive Leite 2010-07-20 15:53:59 UTC
Created attachment 433201
Failed attempt to backport some fixes from RHEL-5

This patch was an attempt to backport the RHEL-5 fixes found in these bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=469387
https://bugzilla.redhat.com/show_bug.cgi?id=593378

It seems the situation did not improve at all, though, so I'll have to defer to Ian for a more informed opinion. :)

Comment 15 Issue Tracker 2010-08-13 15:49:15 UTC
Event posted on 08-13-2010 11:49am EDT by bturner

Any update in backporting the RHEL 5 patches to RHEL 4?  The customer is
getting antsy for something.  Any idea when you will have a chance to get
those changes backported?

-Ben


This event sent from IssueTracker by bturner 
 issue 863873

Comment 16 Ian Kent 2010-08-16 01:33:04 UTC
(In reply to comment #15)

Right, I need to get back to this. I'll spend time on it
tomorrow.

But mostly this isn't a backport; it's a new change.

Comment 17 Ian Kent 2010-08-17 03:22:23 UTC
(In reply to comment #16)

While I'm working on the change to catch negatively cached lookups
earlier, it would be a good idea to try the current CVS package.

This should fix the problem of non-existent keys not being added to
the negative cache, and it should report a negative lookup only when
the key is first added.

This won't help whatever is causing the high load average but it
should quiet the log and fix the negative cache problem.

This rpm can be found at:
http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.112
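
As a rough illustration of the negative cache behaviour described in this comment (a failed key is added to the cache, and the failure is reported only when the key is first added), here is a minimal sketch. It is not the autofs code, which is written in C; the class name, the 60 second timeout, the log message, and the injectable clock are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, List


class NegativeCache:
    """Remember keys that failed to look up, for `timeout` seconds."""

    def __init__(
        self,
        timeout: float = 60.0,  # assumed value, for illustration only
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout = timeout
        self.clock = clock
        self._expires: Dict[str, float] = {}
        self.log: List[str] = []  # stands in for the daemon log

    def is_negative(self, key: str) -> bool:
        """True if `key` is cached as missing and the entry has not expired."""
        expires = self._expires.get(key)
        if expires is None:
            return False
        if self.clock() >= expires:
            del self._expires[key]  # expired, so allow a fresh lookup
            return False
        return True

    def record_failure(self, key: str) -> None:
        """Add `key` to the cache, logging only when it is first added."""
        if self.is_negative(key):
            return  # already cached, stay quiet
        self._expires[key] = self.clock() + self.timeout
        self.log.append(f"key {key!r} not found in map")
```

With this shape, repeated failures for the same key within the timeout produce a single log line, which is the log-quieting effect described above.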

Comment 18 Ian Kent 2010-08-20 11:56:39 UTC
I've built a test package that checks whether a lookup is
negatively cached very early in the lookup.

I'm not sure how it will go and I haven't actually tested
it, but it should be OK.

Please give it a try, it can be found at:
http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.112.bz590762.1
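
The idea of checking the negative cache very early can be sketched as follows. This is only an illustration under stated assumptions, not the change in the test package: `make_lookup` and `map_source` are hypothetical names, and entry expiry is left out to keep the example short.

```python
from typing import Callable, Optional, Set


def make_lookup(
    map_source: Callable[[str], Optional[str]]
) -> Callable[[str], Optional[str]]:
    """Wrap a (possibly expensive) map lookup with an early negative check.

    `map_source` is a hypothetical callable that returns the map entry for
    a key, or None if the key does not exist.
    """
    negative: Set[str] = set()  # keys already known to be missing

    def lookup(key: str) -> Optional[str]:
        # Early exit: a key known to be missing never reaches the map source.
        if key in negative:
            return None
        entry = map_source(key)
        if entry is None:
            negative.add(key)  # remember the failure for later lookups
        return entry

    return lookup
```

The point of doing the check first is that a flood of lookups for the same non-existent key costs one query against the map source instead of one per lookup.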

Comment 19 Ian Kent 2010-08-24 13:43:21 UTC
(In reply to comment #18)

Oops, stupid mistake, please try:
http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.112.bz590762.2

Comment 22 Ian Kent 2011-01-10 07:29:12 UTC
A problem has been reported against the latest test package we are
using for this bug. Having seen this problem before, I have added a
patch that may address the issue; however, we have not had sufficient
feedback to know whether the correction is sufficient.

So please give it a try and let us know how it goes.
The new package can be found at:
http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.112.bz590762.4

Comment 25 RHEL Program Management 2011-01-25 21:05:25 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.