
Bug 185316

Summary: VFS: brelse: Trying to free free buffer
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.0
Hardware: ia64
OS: Linux
Status: CLOSED DUPLICATE
Severity: high
Priority: high
Reporter: Doug Chapman <dchapman>
Assignee: Tom Coughlan <coughlan>
QA Contact: Brian Brock <bbrock>
CC: aviro, jbaron, staubach
Doc Type: Bug Fix
Last Closed: 2006-09-01 17:25:24 UTC
Bug Blocks: 198694

Attachments:
various unique stack traces seen (flags: none)

Description Doug Chapman 2006-03-13 17:33:00 UTC
Description of problem:

This problem was seen on a 64-CPU ia64 system running the HP proprietary "hazard"
test suite.  The storage was ~80 72GB LUNs on MSA1000 arrays spread across 8
QLogic FC controllers.  The default QLogic driver shipped with RHEL was used.

No data corruption or other issues causing the test to fail were seen, but the
"Badness" debug message appeared approximately once every 15 minutes.  I have
found 14 unique stack traces; all report the same error but took different paths
to get there.  I will attach a text file containing all of the unique traces.

Version-Release number of selected component (if applicable):

kernel-2.6.9-34.EL

How reproducible:

I ran this as a 48-hour test and saw the message regularly.  The storage I am
using is currently borrowed, so I may not be able to recreate the configuration
needed to reproduce.  If this is needed, please let me know and I will see if I
can borrow the storage again.


Steps to Reproduce:
1. obtain a 64-CPU system
2. obtain the HP proprietary "hazard" test suite
3. obtain a TON of storage
4. run hazard with the -c3 option (filesystem only)
  
Actual results:


Expected results:


Additional info:

Comment 1 Doug Chapman 2006-03-13 17:33:00 UTC
Created attachment 126054 [details]
various unique stack traces seen

Comment 2 Doug Chapman 2006-04-11 16:32:37 UTC
FYI,

I am now able to reproduce this on a much smaller system.  I have a 4-CPU ia64
system in my private rack in the Red Hat lab connected to a single MSA1000.  I
am able to hit these stack traces (although not nearly as often as on the 64-CPU
system with 8 MSA1000s).



Comment 3 Doug Chapman 2006-06-26 18:32:16 UTC
I filed this quite some time back when I was the only one seeing it; however, we
are now seeing this more often in other testing inside HP.  It is no longer seen
only on massive systems like the one I originally reported it on, so I am
increasing the severity.  It has been reported to be easily reproduced on a
two-socket, dual-core ia64 system.

Here is a stack trace as seen on the RHEL4 U4 partner beta:

VFS: brelse: Trying to free free buffer
Badness in __brelse at fs/buffer.c:1372

Call Trace:
 [<a000000100016da0>] show_stack+0x80/0xa0
                                sp=e00000003d997940 bsp=e00000003d991058
 [<a000000100016df0>] dump_stack+0x30/0x60
                                sp=e00000003d997b10 bsp=e00000003d991040
 [<a000000100129990>] __brelse+0xd0/0x100
                                sp=e00000003d997b10 bsp=e00000003d991020
 [<a0000002001de770>] __try_to_free_cp_buf+0x1b0/0x220 [jbd]
                                sp=e00000003d997b10 bsp=e00000003d990ff0
 [<a0000002001de930>] __journal_clean_checkpoint_list+0x150/0x180 [jbd]
                                sp=e00000003d997b10 bsp=e00000003d990f98
 [<a0000002001d9090>] journal_commit_transaction+0x6d0/0x3080 [jbd]
                                sp=e00000003d997b10 bsp=e00000003d990ea0
 [<a0000002001e18d0>] kjournald+0x170/0x580 [jbd]
                                sp=e00000003d997d80 bsp=e00000003d990e38
 [<a000000100018c70>] kernel_thread_helper+0x30/0x60
                                sp=e00000003d997e30 bsp=e00000003d990e10
 [<a000000100008c60>] start_kernel_thread+0x20/0x40
                                sp=e00000003d997e30 bsp=e00000003d990e1
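
For context, the two lines quoted at the top of this trace come from the
reference-count check in __brelse().  The sketch below is a paraphrase of the
generic 2.6-era fs/buffer.c logic, not a verbatim copy of the RHEL4
2.6.9-34.EL source, so treat the exact wording and the 1372 line number as
approximate:

    /* Paraphrase of the 2.6-era __brelse() in fs/buffer.c.
     * Releasing a buffer_head normally just drops one reference;
     * if b_count is already zero there is nothing left to release,
     * which means some caller has an unbalanced get/put. */
    void __brelse(struct buffer_head *buf)
    {
            if (atomic_read(&buf->b_count)) {
                    put_bh(buf);    /* normal case: drop one reference */
                    return;
            }
            /* b_count == 0: print the message seen above.  On ia64 the
             * WARN_ON() is what emits the "Badness in __brelse at
             * fs/buffer.c:..." line. */
            printk(KERN_ERR "VFS: brelse: Trying to free free buffer\n");
            WARN_ON(1);
    }

So the trace indicates that the jbd checkpoint path shown above ended up
dropping a reference on a buffer_head whose count was already zero: either a
genuine get/put imbalance in that path, or something else scribbling on
b_count.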



Comment 4 RHEL Program Management 2006-08-16 20:32:12 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this enhancement by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This enhancement is not yet committed for inclusion in an Update
release.

Comment 5 Alexander Viro 2006-08-22 13:22:48 UTC
Er... the obvious question: can you reproduce it with a different
controller?  I.e., is it a memory corruptor in the SCSI layer that happens
to hit the buffer cache under that specific load, or is it a bug
in fs/buffer.c and/or the VM and/or fs code?  And is it dependent on the
fs type, while we are at it?

Comment 6 Doug Chapman 2006-08-22 15:09:39 UTC
Alexander,

The one common card in all of the systems we have seen this on is a QLogic 4Gb
Fibre Channel card.  I have asked people back in HP to see if they can reproduce
this with other cards.

Did you intend to remove the issue tracker link when you updated this BZ?  You
removed IT 96777 on your last update.

Comment 10 Jason Baron 2006-09-01 17:25:24 UTC
Looks like a duplicate of bug 168301.

*** This bug has been marked as a duplicate of 168301 ***

Comment 13 Jay Turner 2007-02-08 13:23:04 UTC
Pulling in ack from 168301.