Bug 60532

Summary: kernel BUG at slab.c:1767!
Product: Red Hat Enterprise Linux 2.1 Reporter: Matthew Goheen <mgoheen>
Component: kernel Assignee: Larry Woodman <lwoodman>
Status: CLOSED CURRENTRELEASE QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 2.1 CC: james.bottomley, shillman
Target Milestone: ---   
Target Release: ---   
Hardware: i586   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2005-04-04 15:15:29 UTC Type: ---

Description Matthew Goheen 2002-03-01 05:22:15 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; Q312461)

Description of problem:
For at least the last two RH kernel releases, I've been getting
consistent kernel BUGs while performing backups (using dump).
I thought this was related to using an Exabyte tape device, but
today it happened during a dump from my IDE drive to my software RAID
partition. Here are the kernel messages:

kernel BUG at slab.c:1767!
invalid operand: 0000
Kernel 2.4.9-31
CPU:    0
EIP:    0010:[kmem_cache_reap+216/576]    Not tainted
EIP:    0010:[<c012b5c8>]    Not tainted
EFLAGS: 00010082
EIP is at kmem_cache_reap [kernel] 0xd8
eax: 0000001b   ebx: c5f70000   ecx: c02af5c4   edx: 000027c3
esi: c14c0e30   edi: c14c0e40   ebp: 0000009a   esp: c14cbfa0
ds: 0018   es: 0018   ss: 0018
Process kswapd (pid: 5, stackpage=c14cb000)
Stack: c022ce5c 000006e7 00000001 00000002 00000002 c14c0ea0 0000024a 000000c0
       000000c0 008e000 c012d636 000000c0 c14ca000 00000006 c012d695 000000c0
       00000000 00010f00 c14c3fb8 c0105000 c0105726 00000000 c012d640 c02c7fd8
Call Trace: [IRQ0x0f_interrupt+109868/135280] .rodata.str1.1 [kernel] 0x1f57
Call Trace: [<c022ce5c>] .rodata.str1.1 [kernel]0x1f57
[do_try_to_free_pages+70/80] do_try_to_free_pages [kernel] 0x46
[<c012d636>] do_try_to_free_pages [kernel] 0x46
[kswapd+85/240] kswapd [kernel] 0x55
[<c012d695>] kswapd [kernel] 0x55
[_stext+0/48] stext [kernel] 0x0
[<c0105000>] stext [kernel] 0x0
[kernel_thread+38/48] kernel_thread [kernel] 0x26
[<c0105726>] kernel_thread [kernel] 0x26
[kswapd+0/240] kswapd [kernel] 0x0
[<c012d640>] kswapd [kernel] 0x0

Code: 0f 0b 58 5a 8b 03 45 39 f8 75 dd 8b 4e 28 89 ea 8b 5e 48 d3
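
On i386, the 2.4 kernel's BUG() macro prints the "kernel BUG at file:line!" message and then executes the two-byte UD2 instruction (0f 0b). The CPU rejects it with an invalid-opcode trap, which the kernel reports as "invalid operand", and because the Code: line dumps the bytes starting at EIP, it begins with 0f 0b. The sketch below checks for that marker in user space; parse_code_bytes and starts_with_ud2 are hypothetical helpers written for illustration, not part of the kernel or of ksymoops.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse the hex bytes that follow "Code:" in an
 * oops line into buf. Returns the number of bytes parsed (0 if the
 * line has no "Code:" field). */
static size_t parse_code_bytes(const char *line, unsigned char *buf, size_t max)
{
    const char *p = strstr(line, "Code:");
    size_t n = 0;
    unsigned int byte;
    int consumed;

    if (p == NULL)
        return 0;
    p += strlen("Code:");
    /* %2x skips leading whitespace and reads one byte; %n records how
     * many characters that consumed so we can advance past it. */
    while (n < max && sscanf(p, "%2x%n", &byte, &consumed) == 1) {
        buf[n++] = (unsigned char)byte;
        p += consumed;
    }
    return n;
}

/* 0f 0b is the x86 UD2 opcode; executing it raises the invalid-opcode
 * trap that the i386 kernel reports as "invalid operand". */
static int starts_with_ud2(const unsigned char *buf, size_t n)
{
    return n >= 2 && buf[0] == 0x0f && buf[1] == 0x0b;
}
```

Fed the Code: line above, parse_code_bytes returns 20 bytes and starts_with_ud2 reports a match, consistent with the trap coming from BUG() itself rather than from a corrupted instruction stream.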

At this point the system was still usable, but kswapd was
hung.

Here are the modules I have loaded:

Module                  Size  Used by    Not tainted
3c59x                  26120   1  (autoclean)
ipchains               38952   0
st                     26484   0
raid5                  17792   1
xor                     6360   0  [raid5]
sym53c8xx              57540   4
sd_mod                 12028   4
scsi_mod               97048   3  [st sym53c8xx sd_mod]

This is an AMD K6 CPU (350 MHz) with 128 MB of RAM. The problem
seemed to occur more often when I ran with only 32 MB, but it
hasn't gone away.

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. /sbin/dump 0fu - /usr | gzip > /home/backup/usr.dump.gz

[root@backup /root]# df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda1               256667    188851     54564  78% /
/dev/hda6              3612272   1010832   2417900  30% /usr
/dev/md0               8782856   3858260   4478440  47% /home/backup
[root@backup /root]# cat /proc/mdstat
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      8923136 blocks level 5, 32k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>


Actual Results: kswapd died (rarely; see trace above).
With previous kernels, the system would sometimes hang and/or crash
(this hasn't happened yet with the current kernel).

Additional info:

Comment 1 James Bottomley 2002-12-20 15:50:49 UTC
We just hit this on Red Hat AS 2.1 with kernel 2.4.9-e.3,

under similar circumstances: filesystem stress testing with a tar load on an
ext3 filesystem over ips (ServeRAID).

For RHAS it's at slab.c:1769, but it's the identical problem (kswapd trying to
reap a slab page with occupied entries).

Reproducibility is low for us (about once every ten stress tests). It
has also only shown up with ext3 so far.
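
The failure described here is an invariant violation: a slab the reaper has picked for freeing should have an in-use object count of zero, and freeing one that still holds objects would release memory that is live. The sketch below models only that invariant in user space. It assumes, as a simplification, that candidate slabs sit on a singly linked list; struct slab and count_occupied_free_slabs are hypothetical and far smaller than the kernel's real structures.

```c
#include <stddef.h>

/* Hypothetical, simplified slab: inuse counts allocated objects. */
struct slab {
    unsigned int inuse;
    struct slab *next;
};

/* Walk the list of slabs a reaper would free and count those that
 * still hold live objects. Each one counted here corresponds to the
 * "occupied entries" case that trips the kernel's BUG(). */
static size_t count_occupied_free_slabs(const struct slab *free_list)
{
    size_t bad = 0;

    for (const struct slab *s = free_list; s != NULL; s = s->next)
        if (s->inuse != 0)
            bad++;
    return bad;
}
```

Counting rather than halting keeps the model easy to exercise; a list containing one slab with three live objects yields a count of 1.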

Comment 2 Arjan van de Ven 2002-12-20 15:54:24 UTC
Is this fixed in the most recent 7.1 kernel erratum?

Comment 3 James Bottomley 2002-12-20 16:06:53 UTC
I'd have to say obviously yes for that one: the code triggering the bug is
#ifdef'd out in 2.4.18-18:

#if DEBUG
			if (slabp->inuse)
				BUG();
#endif

This is only compiled in if CONFIG_DEBUG_SLAB=y, which it isn't for x86.
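
The #if DEBUG guard is ordinary C preprocessor gating. The sketch below assumes, per the comment, that DEBUG is derived from CONFIG_DEBUG_SLAB; the BUG() stand-in, bug_hits and reap_one are hypothetical names for illustration, and the stand-in counts hits instead of halting.

```c
#include <stdio.h>

/* Assumed mapping: build with -DCONFIG_DEBUG_SLAB to enable the check. */
#ifdef CONFIG_DEBUG_SLAB
#define DEBUG 1
#else
#define DEBUG 0
#endif

/* Hypothetical stand-in for the kernel's BUG(): report and count. */
static int bug_hits;
#define BUG() do { bug_hits++; fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); } while (0)

struct slab {
    unsigned int inuse;
};

static void reap_one(const struct slab *slabp)
{
#if DEBUG
    if (slabp->inuse)
        BUG();
#endif
    (void)slabp; /* without DEBUG the check above compiles away entirely */
}
```

Built without -DCONFIG_DEBUG_SLAB, reap_one compiles to nothing and a slab with objects in use passes silently; built with it, the same call records one hit.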


Comment 4 Alan Cox 2003-06-08 01:49:35 UTC
Transferring to AS, since we want the bug to live with AS for the AS lifespan.