|Summary:||panic in shrink|
|Product:||[Fedora] Fedora||Reporter:||JW <ohtmvyyn>|
|Component:||kernel||Assignee:||Kernel Maintainer List <kernel-maint>|
|Status:||CLOSED INSUFFICIENT_DATA||QA Contact:||Fedora Extras Quality Assurance <extras-qa>|
|Fixed In Version:||Doc Type:||Bug Fix|
|Last Closed:||2008-03-07 02:40:34 UTC||Type:||---|
Description JW 2008-03-05 11:32:49 UTC
Description of problem:
Kernel panics during modest disk activity

Version-Release number of selected component (if applicable):
kernel-188.8.131.52-49

How reproducible:
Always

Steps to Reproduce:
1. create ext3 journaled filesystem
2. create lots of files
3. wait for kernel to crash

Actual results:
kernel crashes in at least two possible ways

Expected results:
kernel must not crash! kernel must be stress tested before being released!!

Additional info:

Instance 1:
d_kill+0xe/0x32
prune_one_dentry+0x30/0xb9
prune_dcache+0xe6/0x125
shrink_dcache_memory+0x1c/0x34
shrink_slab+0xd5/0x134
kswapd+0x2a7/0x40a
autoremove_wake_function+0x0/0x35
kswapd+0x38/0x409
kthread+0x38/0x5e
kthread+0x0/0x5e
kernel_thread_helper+0x7/0x10
EIP: list_del+0x26/0x5d

Instance 2:
journal_try_to_free_buffer+0x5c/0x137 [jbd]
free_buffer_head+0x18/0x2c
ext3_realease_page+0x0/0x7b [ext3]
try_to_release_page+0x30/0x42
__invalidate_mapping_pages+0x79/0xec
invalidate_mapping_pages+0xf/0x11
shrink_icache_memory
shrink_slab
kswapd
autoremove_wake_function
kswapd
kswapd
EIP: journal_grab_journal_head+0xf/0x3e
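For readers unfamiliar with the trace notation above: each entry has the form symbol+0xOFFSET/0xSIZE, optionally followed by the owning module in square brackets. The offset is how far into the function the address lies, and the size is the function's length in bytes. The following is a minimal Python sketch of how such entries could be split into fields. The names Frame, FRAME_RE and parse_frames are hypothetical and not part of any kernel tooling, and entries without an offset (such as the bare "shrink_slab" in Instance 2) are simply skipped.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    """One stack-trace entry (hypothetical helper type for illustration)."""
    symbol: str            # function name, e.g. "d_kill"
    offset: int            # byte offset into the function
    size: int              # total size of the function in bytes
    module: Optional[str]  # owning module, e.g. "jbd", or None if built in


# Matches "symbol+0xOFF/0xSIZE" with an optional trailing "[module]".
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+0x(?P<offset>[0-9a-f]+)"
    r"/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(text: str) -> List[Frame]:
    """Return every symbol+offset/size entry found in text, in order.

    Note: an "EIP: list_del+0x26/0x5d" line is matched like any other
    entry, because it uses the same notation.
    """
    return [
        Frame(
            symbol=m["symbol"],
            offset=int(m["offset"], 16),
            size=int(m["size"], 16),
            module=m["module"],
        )
        for m in FRAME_RE.finditer(text)
    ]
```

Calling parse_frames on the text of Instance 1 would yield one Frame per line, which makes it easy to see at a glance which functions and modules are involved. It does not recover the registers, taint state or module list that the full oops would contain.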
Comment 1 Dave Jones 2008-03-05 17:42:18 UTC
Hmm, so you can repeat this on demand? If so, can you try the 2.6.24-based kernel from updates-testing? Adding Eric to Cc, as maybe he's seen something similar in ext3 development.
Comment 2 Eric Sandeen 2008-03-05 18:00:18 UTC
no, I don't *think* I've seen this. It looks vaguely like bug #428329. But then, it's not a lot to go on. Could you include the full oopses rather than the heavily-edited versions? Also, testing on a debug kernel (yum install kernel-debug) might yield clues.
Comment 3 JW 2008-03-05 22:14:36 UTC
I have gone back to kernel-184.108.40.206-21.fc7 because I have had this running on other hardware for several months now. Pity that it has disappeared from nearly every mirror though. Why do good rpms in updates repository get deleted and replaced by inferior ones (more patches != better patches)?
Comment 4 Eric Sandeen 2008-03-05 22:24:51 UTC
Hm, I'll be sure to suggest to the kernel maintainers that they discontinue their irrational, single-minded quest for more and more patches.... But anyway, you can usually find older rpms on koji.fedoraproject.org, for example http://koji.fedoraproject.org/packages/kernel/220.127.116.11/21.fc7/ If you'd like to see the problem resolved so that you don't have to stick with fc7 kernels, posting the full oops output, or reproducing the problem on the debug kernel as I suggested would be a great help. Otherwise I'm not sure we can hope for a resolution, with the limited information provided. Thanks, -Eric
Comment 5 Chuck Ebbert 2008-03-05 23:46:29 UTC
Please post the complete oops messages.
Comment 6 JW 2008-03-07 02:27:03 UTC
I have gone back to FC7 kernel (kernel-18.104.22.168-21) which is nice and stable. No problems whatsoever over last couple of days (FC8 kernel crashed at least once every day). Cannot vouch for current FC7 kernel update (kernel-22.214.171.124-80) because the numbering has gone from 126.96.36.199 to 188.8.131.52 which doesn't make a lot of sense. Somebody should make 184.108.40.206-21 available again because it is good.
Comment 7 Eric Sandeen 2008-03-07 02:40:34 UTC
Without more information, we cannot proceed on this bug. If you can provide the requested data (full oops output, preferably from a debug kernel), please re-open.
Comment 8 JW 2008-03-07 03:09:06 UTC
If you are comfortable ignoring the stack traces that I have provided then go ahead and close this bug and pretend that FC8 kernel is fantastic. That is your choice, not mine. It certainly is a clever way to eliminate kernel flaws.
Comment 9 Eric Sandeen 2008-03-07 03:37:04 UTC
It is clearly a bug (well, or perhaps bad memory or whatnot; so far we really cannot tell), and I'd very much like to fix it if possible. I've not seen it reported elsewhere, and you seem to be somewhat uniquely able to hit it.

However, the information you have provided is not enough to go on. "Something went wrong down this path" just isn't sufficient. Maybe it was a null pointer. Maybe it was a bad pointer. All I can do is guess.

There is a reason that the kernel provides copious amounts of information on an oops; it is so an attempt can be made at debugging. You have provided perhaps 20% of the oops info, and seem to be unwilling or unable to run this one more time to gather the information requested. I have no testcase; I have no oops output. I have a stacktrace, and that is all. I can't see registers, I don't know what modules were loaded, I don't know if your kernel was tainted, etc. I don't even know what the actual first line of the oops was.

If you don't want me to close this bug, I need just a bit more info from you, as the reporter. I cannot magically infer the missing information. I have a hunch that it may be related to another bug I've seen, which in turn may be related to suspend problems. But there is simply not enough here to be able to tell.

If you can provide the requested info by running the problematic kernel once more, preferably the debug variant, I am more than willing to spend time digging into this bug.
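The pieces of an oops listed in this comment (first line, registers, loaded modules, taint state) can be turned into a rough checklist. The Python sketch below is only an illustration: the marker patterns in CHECKS are assumptions about how these sections typically appear in x86 oops output, the names CHECKS and missing_oops_sections are hypothetical, and a real oops may word things differently.

```python
import re
from typing import List

# Assumed markers for each section of an oops. These patterns are a
# best-effort guess at typical x86 output, not an authoritative list.
CHECKS = {
    "first line": re.compile(
        r"(BUG:|Oops:|kernel BUG at|Unable to handle kernel|general protection fault)"
    ),
    "taint state": re.compile(r"(Not tainted|Tainted:)"),
    "loaded modules": re.compile(r"Modules linked in:"),
    "registers": re.compile(r"\b(eax|EAX|rax|RAX):\s*[0-9a-f]{8,16}"),
    "call trace": re.compile(r"Call Trace:"),
}


def missing_oops_sections(text: str) -> List[str]:
    """Return the names of checklist sections not found in text."""
    return [name for name, pattern in CHECKS.items() if not pattern.search(text)]
```

Run against a bare list of symbol+offset entries such as the one in the description, a check like this would be expected to report every section as missing, which is roughly the gap this comment describes.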