Bug 177427 - rhel3u5: kernel panic in fsync(2) while ~idle
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Hardware: i686   OS: Linux
Priority: medium   Severity: urgent
Assigned To: Dave Anderson
QA Contact: Brian Brock
Reported: 2006-01-10 11:34 EST by Francois-Xavier 'FiX' KOWALSKI
Modified: 2009-06-02 17:51 EDT
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2009-06-02 17:51:30 EDT

Attachments
console screen shot (57.81 KB, image/jpeg)
2006-09-28 03:53 EDT, Francois-Xavier 'FiX' KOWALSKI
console screen shot (53.73 KB, image/jpeg)
2006-09-28 03:54 EDT, Francois-Xavier 'FiX' KOWALSKI

Description Francois-Xavier 'FiX' KOWALSKI 2006-01-10 11:34:45 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc3 Firefox/1.0.7

Description of problem:
After a fresh system boot, our Java-based application is started to perform some web-based configuration. The output of this JVM (Sun HotSpot JDK 1.4.2) is redirected to syslog via a pipe to logger(1).
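
The `java ... | logger` arrangement amounts to reading the JVM's output line by line and handing each line to syslog(3), after which syslogd writes it to the log file. A minimal Python sketch of that forwarding step, for illustration only: the function name `forward_to_syslog`, the default identifier `java-app` and the skipping of blank lines are assumptions, not logger(1)'s actual implementation.

```python
import syslog
from typing import Iterable


def forward_to_syslog(lines: Iterable[str], ident: str = "java-app",
                      priority: int = syslog.LOG_INFO) -> int:
    """Send each line to syslog, roughly as `cmd | logger` would.

    Hypothetical helper: `ident` and the blank-line skipping are
    assumptions of this sketch. Returns the number of messages sent.
    """
    syslog.openlog(ident=ident, facility=syslog.LOG_USER)
    sent = 0
    try:
        for line in lines:
            message = line.rstrip("\n")
            if not message:
                continue  # this sketch drops empty lines
            syslog.syslog(priority, message)
            sent += 1
    finally:
        syslog.closelog()
    return sent
```

Called as `forward_to_syslog(sys.stdin)`, it behaves like the tail end of the pipe: every message then reaches syslogd, which is the process that crashed in the backtrace.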

Here is the crash backtrace:

EIP is at __out_of_line_bug [kernel] 0x17 (2.4.21-32.ELsmp/i686)
eax: 00000026   ebx: f7465f90   ecx: c0383eb4   edx: 01fa7ed7
esi: f7fee000   edi: f7465f90   ebp: 00000009   esp: f7465f34
ds: 0068   es: 0068   ss: 0068
Process syslogd (pid: 766, stackpage=f7465000)
Stack: c02bd964 000000fe c0173c1c 000000fe c0172a90 f7465f90 f7fee000
      c0173a87 f7fee000 f7fee000 f7465f90 c0173df9 00252d88 006afb8a
      00000000 00000000 00000000 c0162c3b 00000000 bfff8e10 fffffeff
Call Trace:   [<c0173c1c>] path_init [kernel] 0x16c (0xf7465f3c)
[<c0172a90>] getname [kernel] 0xa0 (0xf7465f44)
[<c0173a87>] path_lookup [kernel] 0x17 (0xf7465f54)
[<c0173df9>] __user_walk [kernel] 0x49 (0xf7465f64)
[<c0162c3b>] sys_access [kernel] 0x7b (0xf7465f80)
[<c0166377>] sys_fsync [kernel] 0x47 (0xf7465f9c)

Code: 0f 0b 37 01 5f d0 2b c0 90 eb fe 8d b4 26 00 00 00 00 8d bc

Kernel panic: Fatal exception 
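
The "Code:" bytes start with 0f 0b (the i386 ud2 instruction), which is how BUG() traps, consistent with EIP being in __out_of_line_bug. Assuming this kernel uses the verbose BUG() layout (ud2, then a 16-bit line number, then a 32-bit pointer to the source file name string; an assumption, not confirmed by the report), the line number and file-name address can be decoded as in this sketch. The helper name `decode_i386_bug` is hypothetical.

```python
import struct
from typing import Tuple


def decode_i386_bug(code_hex: str) -> Tuple[int, int]:
    """Decode a ud2-based BUG() record from an i386 oops "Code:" line.

    Assumed layout: 0f 0b (ud2), a little-endian 16-bit line number,
    then a little-endian 32-bit pointer to the file name string.
    """
    raw = bytes.fromhex(code_hex)  # whitespace between byte pairs is ignored
    if raw[:2] != b"\x0f\x0b":
        raise ValueError("Code: line does not start with ud2 (0f 0b)")
    line, file_ptr = struct.unpack_from("<HI", raw, 2)
    return line, file_ptr
```

For the bytes above this yields line 311 (0x0137) and a file-name pointer of 0xc02bd05f; the string at that address can only be read from the matching 2.4.21-32.ELsmp vmlinux or a vmcore.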

Version-Release number of selected component (if applicable):
kernel; `uname -r` = 2.4.21-32.ELsmp

How reproducible:

Steps to Reproduce:
No simple scenario: The crash does not seem to be related to any specific user operation.  We are currently working to isolate this issue.


Additional info:
Comment 1 Ernie Petrides 2006-01-10 13:05:08 EST
Please try to reproduce this on the latest officially released kernel,
which is 2.4.21-37.EL (RHEL3 U6, released this past September).  There
was a post-U5 memory corruption fix that might have accounted for this.

Thanks in advance.
Comment 2 Dave Anderson 2006-01-10 14:22:55 EST
Also, if it is reproducible with the latest kernel, please set up 
netdump and/or diskdump and forward us the vmcore.  
Comment 3 Francois-Xavier 'FiX' KOWALSKI 2006-01-24 10:44:58 EST
We were unable to test with more recent kernels because of a third-party
dependency (a kernel module).  However, we have made significant progress.

The problem arises only on machines with very low commit-to-disk performance.
For example, the machine that exhibited the bug (with the syslogd backtrace)
was only able to commit 19 syslog entries per second to disk.
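
At 19 commits per second, each commit takes about 1/19 s, roughly 53 ms. sysklogd syncs the log file after every message unless the syslog.conf entry is prefixed with "-", which matches the sys_fsync frame in the syslogd backtrace. A rough way to measure such a rate is to append syslog-sized lines and fsync(2) after each one, as in this sketch; the function name `fsync_rate`, the 80-byte line and the default count of 50 are assumptions chosen for illustration.

```python
import os
import tempfile
import time


def fsync_rate(path: str, count: int = 50,
               line: bytes = b"x" * 80 + b"\n") -> float:
    """Append `count` lines to `path`, calling fsync() after each one,
    and return the achieved commits per second (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, line)
            os.fsync(fd)  # wait until the data is committed to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # A temporary directory may live on tmpfs; point `path` at the
    # filesystem holding /var/log to get a meaningful number.
    with tempfile.TemporaryDirectory() as tmp:
        rate = fsync_rate(os.path.join(tmp, "fsync-test.log"))
        print(f"{rate:.1f} commits/s, {1000 / rate:.1f} ms per commit")
```

A result near 19 commits/s on the log volume would reproduce the slow-commit condition described here.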

Now that the commit-to-disk performance issue (a hardware RAID setup problem)
has been fixed, the problem no longer arises at all.

Given the above, it is very likely that the long delays spent waiting for
write completion opened concurrency windows (the box has 8 processors) that
exposed the memory corruption you pointed out.

I will update this record when we have a chance to test with the rhel3u6 kernel.
Comment 4 Francois-Xavier 'FiX' KOWALSKI 2006-09-28 03:51:27 EDT
A similar-looking problem was reproduced with the rhel3u6 kernel.  The problem
occurs when rebooting the machine.  Here is the backtrace (a manual copy of a
screenshot of the console taken with a digital camera[1]; the JPGs are
attached to this bug record):

EIP is at ext3_get_inode_loc [ext3] 0xda (
eax: 00000000 ebx: c4de1c00 ecx: 0000000c edx: f791ed3c
esi: 00000060 edi: 00000d00 ebp: 00000003 esp: f6843e30
ds: 0060 es: 0060 ss: 0060
Process reboot (pid: 1090, stackpage=f6843000)
Stack: c017fbd0 f70cb1f8 00000000 c4de1c00 f78cb100 00000003 f78cb100 f78ed080
       f78cb100 f78f5400 f8850d7b f78cb100 f6843e84 00000000 c32e8140 00009a9b
       c4de1c00 c010152a c4de1c00 00009a9b c32e8140 00000000 00000000 f78cb100
Call Trace: [<c017fbd8>] alloc_inode [kernel] 0xc0 (0xf6843e38)
[<f8858d7b>] ext3_read_inode [ext3] 0x1b (0xf6843e60)
[<c018152a>] iget4_locked [kernel] 0x10a (0xf6843e7c)
[<f885a78b>] ext3_lookup [ext3] 0xbb (0xf6843ea4)
[<c017338c>] real_lookup [kernel] 0xec (0xf6843ec8)
[<c01739e7>] link_path_walk [kernel] 0x487 (0xf6843ee8)
[<c0173f69>] path_lookup [kernel] 0x39 (0xf6843f28)
[<c017452e>] open_namei [kernel] 0x7e (0xf6843f38)
[<c0163813>] filp_open [kernel] 0x43 (0xf6843f68)
[<c0163c53>] sys_open [kernel] 0x63 (0xf6843fa0)

[1] By the way, how could we capture a text console other than this VGA
output? We have no serial link available on this site.

Regarding the diskdump/netdump setup, I have requested that it be set up.  I do
not know at this time whether it will be possible.
Comment 5 Francois-Xavier 'FiX' KOWALSKI 2006-09-28 03:53:35 EDT
Created attachment 137288 [details]
console screen shot
Comment 6 Francois-Xavier 'FiX' KOWALSKI 2006-09-28 03:54:26 EDT
Created attachment 137289 [details]
console screen shot
