From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.2) Gecko/20040301

Description of problem:
Kernel 2.4.21-4.EL produced a similar crash. No known way to reproduce. Will attach sample oops. We have also experienced system hangs that may or may not be related.

ACNLIN80 - CRASH INFO 2/1/03 (Newer Kernel)

nfs_statfs: statfs error = 116
Unable to handle kernel paging request at virtual address 00080006
printing eip:
f8cde44f
*pde = 307a9001
*pte = 3f398067
Oops: 0000
mvfs vnode nfsd nfs lockd sunrpc lp parport autofs4 e1000 floppy sg microcode keybdev mousedev hid input usb-uhci usbcore ext3 jbd raid1 qla2300 qla2300_conf
CPU: 1
EFLAGS: 00010246
EIP is at mdki_memcmp [vnode] 0x11 (2.4.21-20.0.1.ELsmp/i686)
eax: e994190c ebx: c9f1d834 ecx: 00000034 edx: 00000cb4
esi: e994190c edi: 00080006 ebp: d8087e44 esp: d8087e3c
ds: 0068 es: 0068 ss: 0068
Process cp (pid: 12441, stackpage=d8087000)
Stack: e9941908 000aadcc d8087e60 f8cf73c2 e994190c 00080006 00000034 00000000
       e82c3c00 d8087e84 f8cf746f c9f1d834 e9941908 f648122c e9941908 00000000
       e82c3c00 00000000 d8087ebc f8cf6d13 e82c3c00 e9941908 d8087ea8 00000001
Code: f3 a6 0f 97 c0 0f 92 c2 5e 28 d0 0f be c0 5f c9 c3 55 89 e5

ACNLIN80 - CRASH INFO 12/3/04 (Old Kernel)

Unable to handle kernel paging request at virtual address 01000017
printing eip:
0217d137
*pde = 00005001
*pte = 7e0000e3
Oops: 0000
ide-cd cdrom sg mvfs vnode nfsd nfs lockd sunrpc lp parport autofs e1000 floppy microcode keybdev mousedev hid input usb-uhci usbcore ext3 jbd raid1 qla2300 q
CPU: 2
EIP: 0060:[<0217d137>] Tainted: PF
EFLAGS: 00010206
EIP is at iput [kernel] 0x37 (2.4.21-4.ELcustom)
eax: 00ffffff ebx: ea0d8b00 ecx: ea0d8b10 edx: 159f1900
esi: 00ffffff edi: e5f08800 ebp: 00000146 esp: b8393f24
ds: 0068 es: 0068 ss: 0068
Process umount (pid: 9707, stackpage=b8393000)
Stack: 00000000 0217a010 f8c9bac7 159f1918 159f1900 ea0d8b00 0217a4fa ea0d8b00
       ea0d8b00 dd73c180 dd73c180 f8cb21a0 f8cb2390 0217a854 000001e3 c74bec00
       02168df4 dd73c180 0239ff68 00000000 b8393f8c 0804def8 feffb858 0217fe3f
Call Trace:
[<0217a010>] dput [kernel] 0x30 (0xb8393f28)
[<f8c9bac7>] nfs_dentry_iput [nfs] 0x57 (0xb8393f2c)
[<0217a4fa>] prune_dcache [kernel] 0x17a (0xb8393f3c)
[<f8cb21a0>] nfs_sops [nfs] 0x0 (0xb8393f50)
[<f8cb2390>] nfs_fs_type [nfs] 0x0 (0xb8393f54)
[<0217a854>] shrink_dcache_parent [kernel] 0x24 (0xb8393f58)
[<02168df4>] kill_super [kernel] 0x94 (0xb8393f64)
[<0217fe3f>] sys_umount [kernel] 0x3f (0xb8393f80)
[<021606ae>] filp_close [kernel] 0x8e (0xb8393f94)
[<0217feb7>] sys_oldumount [kernel] 0x17 (0xb8393fb4)
Code: Bad EIP value.
Kernel panic: Fatal exception

Version-Release number of selected component (if applicable):
kernel-2.4.21-20.0.1.EL

How reproducible:
Couldn't Reproduce

Steps to Reproduce:
1. Will crash about once a month, but can't reproduce at will.
2.
3.

Additional info:
This crash occurred in mdki_memcmp(), which is part of a non-RHEL3 module (presumably one that tainted your kernel). Thus, we expect the problem to be within that module, and so you'd need to file a bug with whoever provides/supports it. Please feel free to reopen this bugzilla report if you can reproduce the problem with an untainted kernel.
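The triage step described in this comment, reading which module owns the faulting function, can be sketched in a few lines of Python. This is only an illustration: the names `faulting_module` and `in_core_kernel` and the regular expression are assumptions derived from the 2.4-style `EIP is at SYMBOL [MODULE] OFFSET` line shown in this report, not part of any Red Hat tool.

```python
import re

# Matches the 2.4-style line, e.g. "EIP is at mdki_memcmp [vnode] 0x11 (...)".
# The pattern is an assumption based on the oops text in this report.
EIP_LINE = re.compile(r"EIP is at (\S+) \[([^\]]+)\] (0x[0-9a-fA-F]+)")


def faulting_module(oops_text):
    """Return (symbol, module, offset) for the faulting instruction, or None."""
    match = EIP_LINE.search(oops_text)
    if match is None:
        return None
    symbol, module, offset = match.groups()
    return symbol, module, int(offset, 16)


def in_core_kernel(oops_text):
    """True when the oops reports the fault inside the core kernel image."""
    result = faulting_module(oops_text)
    return result is not None and result[1] == "kernel"
```

Applied to the two oopses in this report, the newer crash resolves to the `vnode` module, while the older one resolves to `kernel`.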
(In reply to comment #1)
> This crash occurred in mdki_memcmp(), which is part of a non-RHEL3
> module (presumably one that tainted your kernel). Thus, we expect
> the problem to be within that module, and so you'd need to file a
> bug with whoever provides/supports it.
>
> Please feel free to reopen this bugzilla report if you can reproduce
> the problem with an untainted kernel.
>

That doesn't appear to be the case in the earlier crash. What is the policy? Does Red Hat only investigate problems when no 3rd-party modules are installed?
Sev, I don't understand why the line between "CPU:" and "EFLAGS:" is missing in the output from your first crash. But basically, if we don't even have access to the source code that crashed, we aren't able to debug the problem. If the 3rd-party vendor can point us to a bug in the core RHEL3 code, then we'd certainly be happy to fix it.
For completeness I include the output again. For some reason the output going to syslogd from our digi box did not contain all the output present on the console. I will contact IBM/Rational about their ClearCase module. Putting aside this crash, can you tell me anything at all about the other crash I also included?

Unable to handle kernel paging request at virtual address 00080006
printing eip:
f8cde44f
*pde = 307a9001
*pte = 3f398067
Oops: 0000
mvfs vnode nfsd nfs lockd sunrpc lp parport autofs4 e1000 floppy sg microcode keybdev mousedev hid input usb-uhci usbcore ext3 jbd raid1 qla2300 qla2300_conf
CPU: 1
EIP: 0060:[<f8cde44f>] Tainted: PF
EFLAGS: 00010246
EIP is at mdki_memcmp [vnode] 0x11 (2.4.21-20.0.1.ELsmp/i686)
eax: e994190c ebx: c9f1d834 ecx: 00000034 edx: 00000cb4
esi: e994190c edi: 00080006 ebp: d8087e44 esp: d8087e3c
ds: 0068 es: 0068 ss: 0068
Process cp (pid: 12441, stackpage=d8087000)
Stack: e9941908 000aadcc d8087e60 f8cf73c2 e994190c 00080006 00000034 00000000
       e82c3c00 d8087e84 f8cf746f c9f1d834 e9941908 f648122c e9941908 00000000
       e82c3c00 00000000 d8087ebc f8cf6d13 e82c3c00 e9941908 d8087ea8 00000001
Call Trace:
[<f8cf73c2>] mvfs_find_cred [mvfs] 0x32 (0xd8087e48)
[<f8cf746f>] mvfs_record_cred [mvfs] 0x8f (0xd8087e64)
[<f8cf6d13>] mfs_getcleartext [mvfs] 0x573 (0xd8087e88)
[<f8ceeacc>] mvfs_openv_ctx [mvfs] 0x2ec (0xd8087ec0)
[<f8d30060>] mvfs_vnodeops [mvfs] 0x0 (0xd8087f00)
[<f8d16dd6>] mvfs_linux_open_wrapper [mvfs] 0x16 (0xd8087f10)
[<f8cd6a3f>] vnode_fop_open [vnode] 0xb3 (0xd8087f2c)
[<c0162490>] dentry_open [kernel] 0x110 (0xd8087f54)
[<c0162378>] filp_open [kernel] 0x68 (0xd8087f70)
[<c0162783>] sys_open [kernel] 0x53 (0xd8087fa8)
Code: f3 a6 0f 97 c0 0f 92 c2 5e 28 d0 0f be c0 5f c9 c3 55 89 e5
Kernel panic: Fatal exception
> can you tell me anything at all about the other crash I also included ?

It looks like the "s_op" field of the (struct super_block) or the "i_sb" field of the (struct inode) was bad while executing inside iput() under NFS unmount handling. This might be related to inode handling problems within MVFS or an invalid attempt to unmount an NFS f/s being used by MVFS.

I haven't seen any reports similar to that, nor have there been any related changes to fs/inode.c or fs/dcache.c since 2.4.21-4.EL that would address such an issue. But that release is well over a year old (there have been 4 updates released since then), so I would advise using a recent kernel.
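One way to see why a bad pointer is suspected is to compare the faulting address with the register dump. The following is a minimal sketch, assuming the 2.4 i386 oops layout shown in this report; the function name `registers_near_fault` and the `max_offset` cutoff are hypothetical choices for illustration.

```python
import re

# Register dump entries such as "eax: 00ffffff" (2.4 i386 oops format).
REG = re.compile(r"\b(eax|ebx|ecx|edx|esi|edi|ebp|esp): ([0-9a-f]{8})")
# The "Unable to handle kernel paging request at virtual address ..." line.
FAULT = re.compile(r"virtual address ([0-9a-f]{8})")


def registers_near_fault(oops_text, max_offset=0x100):
    """Map register name -> (fault address - register value) for registers
    that sit at or just below the faulting address, i.e. likely base
    pointers that were dereferenced at a small offset."""
    fault = FAULT.search(oops_text)
    if fault is None:
        return {}
    address = int(fault.group(1), 16)
    near = {}
    for name, value in REG.findall(oops_text):
        delta = address - int(value, 16)
        if 0 <= delta <= max_offset:
            near[name] = delta
    return near
```

For the 2.4.21-4.EL oops this reports eax and esi (both 00ffffff) at offset 0x18 below the faulting address 01000017, which is consistent with a garbage pointer being dereferenced at a small structure offset. Which structure field that offset corresponds to would have to be checked against the kernel's actual struct layout.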
Thanks for the info. Will upgrade just in case.