Bug 158164

Summary:	kernel 2.6.11-1_14_FC3 panics when paging to md-device
Product:	[Fedora] Fedora	Reporter:	Frank Wittig <mail>
Component:	kernel	Assignee:	Dave Jones <davej>
Status:	CLOSED CANTFIX	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3	CC:	pfrields, wtogami
Target Milestone:	---
Target Release:	---
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2005-10-03 01:07:13 UTC	Type:	---

Description Frank Wittig 2005-05-19 10:35:06 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.0.4-1.3.1 Firefox/1.0.4

Description of problem:
After updating to kernel 2.6.11-1_14_FC3 I got a kernel panic every time the kernel tried to page the first byte to the swap partition.
Swap resides on /dev/md1, which is a level 1 RAID device over two parallel ATA disks (both master, on the 1st and 2nd channel).
There's no way of getting around this failure other than downgrading to an older version of the kernel. Older versions never crash on my system during paging to swap.
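
To check whether a given machine matches this setup (swap on an md device), something like the following could be used. It is only a sketch: it assumes the usual /proc/swaps layout (one header row, then one row per swap area with the device path in the first column), and the function name md_swap_devices is made up for illustration.

```python
from pathlib import Path


def md_swap_devices(swaps_text):
    """Return swap areas from /proc/swaps-style text that live on md devices."""
    devices = []
    for line in swaps_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Assumption: the first column is the device path, e.g. /dev/md1
        if fields and fields[0].startswith("/dev/md"):
            devices.append(fields[0])
    return devices


if __name__ == "__main__":
    swaps = Path("/proc/swaps")
    if swaps.exists():
        print(md_swap_devices(swaps.read_text()))
```

An empty list would mean no active swap area is on an md device, so the machine would probably not be a useful test case for this bug.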

I'm working to reproduce the failure on another machine and will post kernel messages when I have more information.

Version-Release number of selected component (if applicable):
kernel-2.6.11-1_14_FC3

How reproducible:
Always

Steps to Reproduce:
1. Set up a swap partition on an md device
2. Upgrade to kernel-2.6.11-1_14_FC3
3. Use enough memory to force the kernel to page to disk
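
Step 3 can be approximated with a small script that allocates memory and touches every page. This is a rough sketch, not what was actually run here: the function name allocate_until, the 16 MiB chunk size and the 4096-byte page size are all assumptions, and the limit has to exceed the free RAM of the test machine before the kernel starts swapping.

```python
def allocate_until(limit_mib, chunk_mib=16, page_size=4096):
    """Allocate chunk_mib-sized buffers up to limit_mib and touch every page."""
    chunks = []
    allocated = 0
    while allocated + chunk_mib <= limit_mib:
        buf = bytearray(chunk_mib * 1024 * 1024)
        # Write one byte per (assumed) page so the memory is really in use
        for offset in range(0, len(buf), page_size):
            buf[offset] = 1
        chunks.append(buf)  # keep a reference so nothing is freed early
        allocated += chunk_mib
    return chunks
```

Calling allocate_until() with a limit somewhat above the installed RAM should push the kernel into paging. On an affected machine this is expected to trigger the panic, so it should only be run on a test box.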

Actual Results:  kernel panic occurred

Expected Results:  kernel should have paged to disk

Additional info:

Comment 1 Frank Wittig 2005-05-19 16:09:39 UTC

I've tried to reproduce on another machine. It uses the SMP version of kernel
2.6.11-1_14_FC3 and didn't crash when paging to disk.
I will try to reproduce this with the single-CPU kernel tomorrow.

Comment 2 Dave Jones 2005-05-21 07:01:59 UTC

Can you get the details from the panic? Any backtrace?
Without it, there's not a lot to go on.

Comment 3 Dave Jones 2005-05-21 07:05:04 UTC

*** Bug 158170 has been marked as a duplicate of this bug. ***

Comment 4 Frank Wittig 2005-05-21 07:44:18 UTC

Can you please tell me where I can possibly find these backtraces?
There's been nothing logged by syslog.
My SMP system crashed on this kernel version after some hours, too. It also uses
an md RAID level 1 for swapping.
If there is a place where this has been logged, there must be a bunch of log
info, but I don't know where to search yet.

Comment 5 Dave Jones 2005-05-21 07:50:22 UTC

It should at least print them to the console when it panics.
Or is it just locking up with no output at all?

Does the test kernel at http://people.redhat.com/davej/kernels/Fedora/FC3/
work any better ?

Comment 6 Frank Wittig 2005-05-22 05:42:10 UTC

My SMP machine locks up with no output. The single-CPU kernel version panicked with
so much output that I couldn't read anything because of the scrolling.
I got new HDDs on Friday evening, so I'm now able to rebuild the original hardware
(on which the panic first occurred) for testing purposes. But this will have to
wait until Monday.

Comment 7 Frank Wittig 2005-05-23 12:03:20 UTC

o.k. - this is what I read on the screen:

Oops: 0000 [#2]
Modules linked in: xfs exportfs md5 ipv6 parport_pc lp parport autofs4 sunrpc
dm_mod uhci_hcd hw_random epic100 mii floppy ext3 jdb raid1 aic7xxx sd_mod scsi_mod
CPU: 0
EIP: 0060:[<c0103e3e>] Not tainted VLI
EFLAGS: 00010002 (2.6.11-1.14_FC3)
EIP is at show_trace+0x20/0x78
eax: 00010ffd ebx: 00010002 ecx: 000057dc edx: 000057dc
esi: 00010002 edi: 00010000 ebp: 00000001 esp: c7cb3fec
ds: 007b es: 007b ss:0068
Process (pid: -942923424, threadinfo=c7cb3000 task=c7cc2000)
Stack: c037710a c01001291 c7cb4200 00000018 00000000
Call Trace:
 [<c0101291>] kernel_thread_helper+0x5/0xb
 =======================
Unable to handle kernel paging request at virtual address 0010000f
 printing eip:
c0103e3e
*pde = 00000000
Recursive die() failure, output suppressed
 <0>Kernel panic - not syncing: Fatal exception in interrupt

Comment 8 Frank Wittig 2005-05-23 12:42:06 UTC

And again:

Oops: 0000 [#6]
Modules linked in: xfs exportfs md5 ipv6 parport_pc lp parport autofs4 sunrpc
dm_mod uhci_hcd hw_random epic100 mii floppy ext3 jdb raid1 aic7xxx sd_mod scsi_mod
CPU: 0
EIP: 0060:[<c0104036>] Not tainted VLI
EFLAGS: 00010086 (2.6.11-1.14_FC3)
EIP is at show_registers+0xf9/0x1ba
eax: c0447000 ebx: c04479d4 ecx: e0ffdac7 edx: e0ffd8ff
esi: c04479a0 edi: 00000068 ebp: 00000001 esp: c0447854
ds: 007b es: 007b ss:0068
Unable to handle kernel paging request at virtual address e0ffd98f
 printing eip:
c0104036
*pde = 00000000
Recursive die() failure, output suppressed
 <0>Kernel panic - not syncing: Fatal exception in interrupt

Now I'll try the new kernel version.

Comment 9 Dave Jones 2005-05-23 21:39:17 UTC

That seems to be just parsing garbage off (what it thinks is) the stack.

The Oops: 0000 [#2] in comment #7 means this is the 2nd oops. The first oops
likely contained more useful information.
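
For anyone sifting through captured console output, the header line can be picked apart mechanically. The sketch below is only illustrative: parse_oops_header is a made-up helper, and the meaning given to the error code bits assumes the oops came from an x86 page fault (bit 0 = protection fault rather than not-present page, bit 1 = write, bit 2 = user mode). For other exception types the code means something else.

```python
import re

# Matches e.g. "Oops: 0000 [#2]": a hex error code and the oops counter
OOPS_HEADER = re.compile(r"Oops:\s+([0-9a-fA-F]{4})\s+\[#(\d+)\]")


def parse_oops_header(line):
    """Return the oops counter and decoded error code bits, or None."""
    match = OOPS_HEADER.search(line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return {
        "count": int(match.group(2)),        # 1 would be the first oops
        "protection_fault": bool(code & 1),  # assumed x86 page-fault bits
        "write": bool(code & 2),
        "user_mode": bool(code & 4),
    }
```

Scanning a full log for the entry with a count of 1 is probably the quickest way to find the oops that matters. The later ones are often just fallout from the first.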

Any chance to hook this up to a serial console?
Does it still happen if you don't use XFS? That's been known to have issues with
4KB stacks, so we could be overflowing the stack.

Comment 10 Dave Jones 2005-07-15 18:53:16 UTC

An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 11 Dave Jones 2005-10-03 01:07:13 UTC

This bug has been automatically closed as part of a mass update.
It had been in NEEDINFO state since July 2005.
If this bug still exists in current errata kernels, please reopen this bug.

There are a large number of inactive bugs in the database, and this is the only
way to purge them.

Thank you.