From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; Q312461)

Description of problem:
For at least the last two RH kernel releases I've been having consistent kernel bugs while performing backups (using dump). I thought this was related to using an Exabyte tape device, but today it happened doing a dump from my IDE drive to my software RAID partition.

Here are the kernel messages:

kernel BUG at slab.c:1767!
invalid operand: 0000
Kernel 2.4.9-31
CPU:    0
EIP:    0010:[kmem_cache_reap+216/576]    Not tainted
EIP:    0010:[<c012b5c8>]    Not tainted
EFLAGS: 00010082
EIP is at kmem_cache_reap [kernel] 0xd8
eax: 0000001b   ebx: c5f70000   ecx: c02af5c4   edx: 000027c3
esi: c14c0e30   edi: c14c0e40   ebp: 0000009a   esp: c14cbfa0
ds: 0018   es: 0018   ss: 0018
Process kswapd (pid: 5, stackpage=c14cb000)
Stack: c022ce5c 000006e7 00000001 00000002 00000002 c14c0ea0 0000024a 000000c0
       000000c0 008e000 c012d636 000000c0 c14ca000 00000006 c012d695 000000c0
       00000000 00010f00 c14c3fb8 c0105000 c0105726 00000000 c012d640 c02c7fd8
Call Trace: [IRQ0x0f_interrupt+109868/135280] .rodata.str1.1 [kernel] 0x1f57
Call Trace: [<c022ce5c>] .rodata.str1.1 [kernel] 0x1f57
[do_try_to_free_pages+70/80] do_try_to_free_pages [kernel] 0x46
[<c012d636>] do_try_to_free_pages [kernel] 0x46
[kswapd+85/240] kswapd [kernel] 0x55
[<c012d695>] kswapd [kernel] 0x55
[_stext+0/48] stext [kernel] 0x0
[<c0105000>] stext [kernel] 0x0
[kernel_thread+38/48] kernel_thread [kernel] 0x26
[<c0105726>] kernel_thread [kernel] 0x26
[kswapd+0/240] kswapd [kernel] 0x0
[<c012d640>] kswapd [kernel] 0x0
Code: 0f 0b 58 5a 8b 03 45 39 f8 75 dd 8b 4e 28 89 ea 8b 5e 48 d3

At this point, the system could still operate, but kswapd was hung.

Here are the modules I have loaded:

Module                  Size  Used by    Not tainted
3c59x                  26120   1  (autoclean)
ipchains               38952   0
st                     26484   0
raid5                  17792   1
xor                     6360   0  [raid5]
sym53c8xx              57540   4
sd_mod                 12028   4
scsi_mod               97048   3  [st sym53c8xx sd_mod]

This is an AMD K6 CPU (350MHz) with 128Mb of RAM.
This problem appeared to occur more often when I ran with just 32Mb, but it hasn't gone away.

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
1. /sbin/dump 0fu - /usr | gzip > /home/backup/usr.dump.gz

[root@backup /root]# df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda1               256667    188851     54564  78% /
/dev/hda6              3612272   1010832   2417900  30% /usr
/dev/md0               8782856   3858260   4478440  47% /home/backup

[root@backup /root]# cat /proc/mdstat
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      8923136 blocks level 5, 32k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>

Actual Results:
kswapd died (rarely, see trace above)
with previous kernels, system would sometimes hang and/or crash (hasn't happened yet)

Additional info:
We just hit this on Red Hat AS 2.1 with kernel 2.4.9-e.3 under similar circumstances (doing filesystem stress testing with a tar load on an ext3 filesystem over ips (ServeRAID)). For RHAS, it's at slab.c:1769, but it's an identical problem (kswapd trying to reap a slab page with occupied entries). Reproducibility is low for us (about once every ten stress tests). It has also only shown up with ext3 so far.
Is this fixed in the most recent 7.1 kernel erratum?
I've got to say, obviously yes for that one: the code triggering the bug is #ifdef'd out in 2.4.18-18:

    #if DEBUG
            if (slabp->inuse)
                    BUG();
    #endif

This is only compiled in if CONFIG_DEBUG_SLAB=y, which it isn't for x86.
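For anyone following along, here is a minimal user-space sketch in C of what that conditional compilation means in practice. It is not the kernel code: `toy_slab`, `toy_slab_destroy` and the `SLAB_DEBUG` macro are hypothetical names made up for illustration, and the kernel calls BUG() where this sketch just returns an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: SLAB_DEBUG stands in for the kernel's DEBUG switch, which
 * the comment above ties to CONFIG_DEBUG_SLAB. Defaults to off here. */
#ifndef SLAB_DEBUG
#define SLAB_DEBUG 0
#endif

/* Hypothetical, heavily simplified stand-in for a slab descriptor. */
struct toy_slab {
    unsigned int inuse;  /* objects still allocated from this slab */
    bool released;       /* set once the slab's memory is given back */
};

/* Releases a slab. Returns 0 on success, or -1 if the debug check is
 * compiled in and the slab still has objects in use (the kernel would
 * hit BUG() at that point instead of returning). */
int toy_slab_destroy(struct toy_slab *slabp)
{
    assert(slabp != NULL);
#if SLAB_DEBUG
    if (slabp->inuse)
        return -1;
#endif
    slabp->released = true;
    return 0;
}
```

Built with the default (SLAB_DEBUG=0), a slab with a non-zero `inuse` count is released without complaint; built with -DSLAB_DEBUG=1, the same call is rejected. In this toy model, compiling the check out only removes the report; it does not change whether the slab still had objects in use.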
Transferring to AS, since we want the bug to live with AS for the AS lifespan.