Description of problem:
This problem was seen on a 64-CPU ia64 system running the HP proprietary "hazard" test suite. The storage was ~80 72GB LUNs on MSA1000 arrays spread across 8 QLogic FC controllers. The default QLogic driver shipped with RHEL was used. No data corruption or other issues causing the test to fail were seen, but the "Badness" debug message appeared approximately once every 15 minutes. I have found 14 unique stack traces; all give the same error but took different paths to get there. I will attach a text file containing all of the unique traces.

Version-Release number of selected component (if applicable):
kernel-2.6.9-34.EL

How reproducible:
I ran this for a 48-hour test and saw the message regularly. The storage I am using is currently borrowed, so I may not be able to recreate the configuration needed to reproduce. If this is needed, please let me know and I will see if I can borrow storage again.

Steps to Reproduce:
1. Obtain a 64-CPU system.
2. Obtain the HP proprietary "hazard" test suite.
3. Obtain a TON of storage.
4. Run hazard with the -c3 option (filesystem only).

Actual results:

Expected results:

Additional info:
Created attachment 126054 [details] various unique stack traces seen
FYI, I am now able to reproduce this on a much smaller system. I have a 4-CPU ia64 system in my private rack in the Red Hat lab connected to a single MSA1000. I am able to hit these stack traces (although not nearly as often as on the 64-CPU system with 8 MSA1000s).
I filed this quite some time back when I was the only one seeing it; however, we are now seeing this more often in other testing inside HP. It is no longer seen only on massive systems like the one I originally reported it on, so I am increasing the severity. It has been reported to be easily reproduced on a 2-socket dual-core ia64 system. Here is a stack trace as seen on the RHEL4 U4 partner beta:

VFS: brelse: Trying to free free buffer
Badness in __brelse at fs/buffer.c:1372

Call Trace:
 [<a000000100016da0>] show_stack+0x80/0xa0
                                sp=e00000003d997940 bsp=e00000003d991058
 [<a000000100016df0>] dump_stack+0x30/0x60
                                sp=e00000003d997b10 bsp=e00000003d991040
 [<a000000100129990>] __brelse+0xd0/0x100
                                sp=e00000003d997b10 bsp=e00000003d991020
 [<a0000002001de770>] __try_to_free_cp_buf+0x1b0/0x220 [jbd]
                                sp=e00000003d997b10 bsp=e00000003d990ff0
 [<a0000002001de930>] __journal_clean_checkpoint_list+0x150/0x180 [jbd]
                                sp=e00000003d997b10 bsp=e00000003d990f98
 [<a0000002001d9090>] journal_commit_transaction+0x6d0/0x3080 [jbd]
                                sp=e00000003d997b10 bsp=e00000003d990ea0
 [<a0000002001e18d0>] kjournald+0x170/0x580 [jbd]
                                sp=e00000003d997d80 bsp=e00000003d990e38
 [<a000000100018c70>] kernel_thread_helper+0x30/0x60
                                sp=e00000003d997e30 bsp=e00000003d990e10
 [<a000000100008c60>] start_kernel_thread+0x20/0x40
                                sp=e00000003d997e30 bsp=e00000003d990e1
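For anyone triaging: the "Badness" line comes from a sanity check in `__brelse()` in fs/buffer.c. A buffer_head carries a reference count (`b_count`), and releasing a buffer whose count is already zero is a double-release, which is exactly what the jbd checkpoint path in the trace above appears to be doing. Below is a minimal user-space model of that check, a sketch only (the names `buffer_head_model` and `model_brelse` are made up for illustration; the real kernel uses `atomic_t` and `printk`):

```c
#include <stdio.h>

/* Simplified model of the buffer_head reference count and the
 * __brelse() sanity check, paraphrased from 2.6-era fs/buffer.c.
 * Not the real kernel structure or function. */
struct buffer_head_model {
    int b_count;            /* reference count (atomic_t in the kernel) */
};

/* Returns 1 on a valid release (count was > 0 and is decremented),
 * 0 on a double-release, which is the condition that triggers
 * "VFS: brelse: Trying to free free buffer" plus the Badness dump. */
int model_brelse(struct buffer_head_model *bh)
{
    if (bh->b_count > 0) {
        bh->b_count--;
        return 1;
    }
    fprintf(stderr, "VFS: brelse: Trying to free free buffer\n");
    return 0;
}
```

So the traces do not point at where the bug is, only at where it is detected: some earlier path dropped its reference (or freed the buffer) without the checkpoint code knowing, and the second release trips the check.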
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this enhancement by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This enhancement is not yet committed for inclusion in an Update release.
Er... the obvious question: can you reproduce it with a different controller? I.e., is this a memory corruptor in SCSI that happens to hit the buffer cache under that specific load, or is it a bug in fs/buffer.c and/or VM and/or fs code? And is it dependent on the filesystem type, while we are at it?
Alexander,

The one common card in all of the systems we have seen this on is a QLogic 4GB Fibre Channel card. I have asked people back in HP to see if they can reproduce this with other cards.

Did you intend to remove the issue tracker link when you updated this BZ? You removed IT 96777 on your last update.
Looks like a duplicate of bug 168301.

*** This bug has been marked as a duplicate of 168301 ***
Pulling in the ack from bug 168301.