Bug 177427
| Field | Value |
|---|---|
| Summary | rhel3u5: kernel panic in fsync(2) while ~idle |
| Product | Red Hat Enterprise Linux 3 |
| Component | kernel |
| Version | 3.0 |
| Hardware | i686 |
| OS | Linux |
| Status | CLOSED WONTFIX |
| Severity | urgent |
| Priority | medium |
| Reporter | Francois-Xavier 'FiX' KOWALSKI <francois-xavier.kowalski> |
| Assignee | Dave Anderson <anderson> |
| QA Contact | Brian Brock <bbrock> |
| CC | dhoward, petrides |
| Doc Type | Bug Fix |
| Last Closed | 2009-06-02 21:51:30 UTC |

Description
Francois-Xavier 'FiX' KOWALSKI
2006-01-10 16:34:45 UTC
Please try to reproduce this on the latest officially released kernel, which is 2.4.21-37.EL (RHEL3 U6, released this past September). There was a post-U5 memory corruption fix that might have accounted for this. Thanks in advance.

Also, if it is reproducible with the latest kernel, please set up netdump and/or diskdump and forward us the vmcore.

We were unable to test with more recent kernels, due to a 3rd-party dependency (a kernel module). However, we have made a lot of progress. The problem arises only on machines that have very low commit-to-disk performance. For example, the machine that exhibited the bug (with the syslog backtrace) was only able to commit 19 syslog entries to disk per second. Once the commit-to-disk performance issue was fixed (it was a H/W RAID setup problem), the problem no longer arose at all.

Given the above, it is very likely that the long delays spent waiting for write completion were concurrency windows (the box has 8 processors) that exposed the memory corruption you pointed out. I will update this record when we have a chance to test with the rhel3u6 kernel.

A similar-looking problem was reproduced with the rhel3u6 kernel. The problem occurs when rebooting the machine. Here is the backtrace (it is a manual copy from a screenshot taken with a digital camera on the console [1]; the JPG is attached to this bug record):

    EIP is at ext3_get_inode_loc [ext3] 0xda (2.4.21.37.ELsmp/i686)
    eax: 00000000 ebx: c4de1c00 ecx: 0000000c edx: f791ed3c
    esi: 00000060 edi: 00000d00 ebp: 00000003 esp: f6843e30
    ds: 0060 es: 0060 ss: 0060
    Process reboot (pid: 1090, stackpage=f6843000)
    Stack: c017fbd0 f70cb1f8 00000000 c4de1c00 f78cb100 00000003 f78cb100 f78ed080
           f78cb100 f78f5400 f8850d7b f78cb100 f6843e84 00000000 c32e8140 00009a9b
           c4de1c00 c010152a c4de1c00 00009a9b c32e8140 00000000 00000000 f78cb100
    Call Trace:
    [<c017fbd8>] alloc_inode [kernel] 0xc0 (0xf6843e38)
    [<f8858d7b>] ext3_read_inode [ext3] 0x1b (0xf6843e60)
    [<c018152a>] iget4_locked [kernel] 0x10a (0xf6843e7c)
    [<f885a78b>] ext3_lookup [ext3] 0xbb (0xf6843ea4)
    [<c017338c>] real_lookup [kernel] 0xec (0xf6843ec8)
    [<c01739e7>] link_path_walk [kernel] 0x487 (0xf6843ee8)
    [<c0173f69>] path_lookup [kernel] 0x39 (0xf6843f28)
    [<c017452e>] open_namei [kernel] 0x7e (0xf6843f38)
    [<c0163813>] filp_open [kernel] 0x43 (0xf6843f68)
    [<c0163c53>] sys_open [kernel] 0x63 (0xf6843fa0)

[1] How could we get a text console other than this VGA stuff, BTW? We have no serial link available on this site...

About the diskdump/netdump setup: I have requested that it be set up. I do not know at this time whether it will be possible or not.

Created attachment 137288: console screen shot

Created attachment 137289: console screen shot
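
For anyone who wants to check whether a machine falls into the slow commit-to-disk range mentioned above (about 19 syslog entries per second), a rough measurement could look like the following. This is only a minimal sketch, not the method used by the reporter: the function name `measure_fsync_rate`, the 128-byte record size, and the one-second duration are all assumptions chosen for illustration.

```python
import os
import tempfile
import time


def measure_fsync_rate(directory, duration=1.0, record=b"x" * 128 + b"\n"):
    """Append small records to a scratch file, calling fsync after each one.

    Returns the number of synced appends completed per second.
    The record size and duration are illustrative assumptions.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    count = 0
    try:
        start = time.monotonic()
        while True:
            os.write(fd, record)
            os.fsync(fd)  # force the record to stable storage before continuing
            count += 1
            elapsed = time.monotonic() - start
            if elapsed >= duration:
                break
    finally:
        # Always clean up the scratch file, even if a write fails.
        os.close(fd)
        os.unlink(path)
    return count / elapsed


if __name__ == "__main__":
    rate = measure_fsync_rate(tempfile.gettempdir())
    print(f"{rate:.1f} synced appends per second")
```

Point `directory` at the filesystem that actually holds the log files. The default temporary directory may live on a memory-backed filesystem, where fsync returns almost immediately and the figure says nothing about the disk or RAID setup.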