From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.1) Gecko/20020827

Description of problem:
NFS file server has been crashing with kernel panics and oops' during normal usage by users. Because it seems to occur when users are doing moderate read/write operations, I am using two shell scripts to intensely test disk read/writes and try to recreate the problem.

From dmesg:

Jan 31 15:15:42 mom kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000028
Jan 31 15:15:42 mom kernel: printing eip:
Jan 31 15:15:42 mom kernel: f880e8b2
Jan 31 15:15:42 mom kernel: *pde = 00000000
Jan 31 15:15:42 mom kernel: Oops: 0000
Jan 31 15:15:42 mom kernel: nfs lockd sunrpc autofs 3c59x ide-cd cdrom loop lvm-mod ext3 jbd
Jan 31 15:15:42 mom kernel: CPU: 0
Jan 31 15:15:42 mom kernel: EIP: 0010:[<f880e8b2>] Not tainted
Jan 31 15:15:42 mom kernel: EFLAGS: 00010207
Jan 31 15:15:42 mom kernel:
Jan 31 15:15:42 mom kernel: EIP is at journal_try_to_free_buffers_R6069dd2f [jbd] 0x52 (2.4.18-19.7.x)
Jan 31 15:15:42 mom kernel: eax: 00000001 ebx: 00000000 ecx: 000001d0 edx: 00000000
Jan 31 15:15:42 mom kernel: esi: e699df40 edi: 00000001 ebp: 00000000 esp: c36bff2c
Jan 31 15:15:42 mom kernel: ds: 0018 es: 0018 ss: 0018
Jan 31 15:15:42 mom kernel: Process kswapd (pid:5,stackpage=c36bf000)
Jan 31 15:15:42 mom kernel: Stack: 00000000 c199de10 000001d0 e699df40 c199e2c4 f881e672 f6feae00 c199de10
Jan 31 15:15:42 mom kernel: 000001d0 c013b49f c199de10 000001d0 c199de10 000001d0 c01302f9 c199de10
Jan 31 15:15:42 mom kernel: 000001d0 00001d80 00001d80 00000c38 0002f8ba c02d4864 00000cb3 00001d80
Jan 31 15:15:42 mom kernel: Call Trace: [<f881e672>] ext3_releasepage [ext3] 0x22 (0xc36bff40))
Jan 31 15:15:42 mom kernel: [<c013b49f>] try_to_release_page [kernel] 0x2f (0xc36bff50))
Jan 31 15:15:42 mom kernel: [<c01302f9>] page_launder_zone [kernel]0x519 (0xc36 bff64))
Jan 31 15:15:42 mom kernel: [<c01306f8>] page_launder [kernel] 0x168 (0xc36bff90))
Jan 31 15:15:42 mom kernel: [<c0130fa2>] do_try_to_free_pages [kernel] 0x12 (0xc36bffb0))
Jan 31 15:15:42 mom kernel: [<c01312c1>] kswapd [kernel] 0x121 (0xc36bffd4))
Jan 31 15:15:42 mom kernel: [<c0105000>] stext [kernel] 0x0 (0xc36bffe8))
Jan 31 15:15:42 mom kernel: [<c0107146>] kernel_thread [kernel] 0x26 (0xc36bfff0))
Jan 31 15:15:42 mom kernel: [<c01311a0>] kswapd [kernel] 0x0 (0xc36bfff8))
Jan 31 15:15:42 mom kernel:
Jan 31 15:15:42 mom kernel:
Jan 31 15:15:42 mom kernel: Code: 8b 5b 28 f6 42 19 02 74 10 89 e0 50 52 e8 fc fe ff ff 5a 85

Version-Release number of selected component (if applicable):
2.4.18-19.7.x

How reproducible:
Always

Steps to Reproduce:
1. Execute this shell script on a single disk partition:

    dd if=/dev/zero of=testfile bs=16384 count=131072
    while true
    do
        time cat testfile >/dev/null
    done

2. Execute this script on a different partition on the same disk:

    while true
    do
        dd if=/dev/zero of=largefile bs=16384 count=131072
    done

Actual Results: System crashed and produced kernel oops.

Expected Results: Normal system operation, with intense disk activity.

Additional info:
System consists of 2 120 GB Maxtor IDE disks attached to a promise Ultra133 TX2 ide controller card, 1 120GB & 1 20 GB Maxtor disk attached to motherboard, with an Athlon 2200 CPU and 1GB DDR.

ksymoops output:

[root@mom log]# ksymoops /tmp/mom_oops.txt
ksymoops 2.4.4 on i686 2.4.18-19.7.x. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.18-19.7.x/ (default)
-m /boot/System.map-2.4.18-19.7.x (default)

Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options.

Error (expand_objects): cannot stat(/lib/ext3.o) for ext3
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/jbd.o) for jbd
ksymoops: No such file or directory
Warning (map_ksym_to_module): cannot match loaded module ext3 to a unique module object. Trace may not be reliable.
Jan 31 15:15:42 mom kernel: Unable to handle kernel NULL pointer dereference at
Jan 31 15:15:42 mom kernel: f880e8b2
Jan 31 15:15:42 mom kernel: *pde = 00000000
Jan 31 15:15:42 mom kernel: Oops: 0000
Jan 31 15:15:42 mom kernel: CPU: 0
Jan 31 15:15:42 mom kernel: EIP: 0010:[<f880e8b2>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Jan 31 15:15:42 mom kernel: EFLAGS: 00010207
Jan 31 15:15:42 mom kernel: eax: 00000001 ebx: 00000000 ecx: 000001d0 edx: 00000000
Jan 31 15:15:42 mom kernel: esi: e699df40 edi: 00000001 ebp: 00000000 esp: c36bff2c
Jan 31 15:15:42 mom kernel: ds: 0018 es: 0018 ss: 0018
Jan 31 15:15:42 mom kernel: Process kswapd (pid: 5, stackpage=c36bf000)
Jan 31 15:15:42 mom kernel: Stack: 00000000 c199de10 000001d0 e699df40 c199e2c4 f881e672 f6feae00 c199de10
Jan 31 15:15:42 mom kernel: 000001d0 c013b49f c199de10 000001d0 c199de10 000001d0 c01302f9 c199de10
Jan 31 15:15:42 mom kernel: 000001d0 00001d80 00001d80 00000c38 0002f8ba c02d4864 00000cb3 00001d80
Jan 31 15:15:42 mom kernel: Call Trace: [<f881e672>] ext3_releasepage [ext3] 0x2
Jan 31 15:15:42 mom kernel: [<c013b49f>] try_to_release_page [kernel] 0x2f (0xc3 6bff50))
Jan 31 15:15:42 mom kernel: [<c01302f9>] page_launder_zone [kernel] 0x519 (0xc36 bff64))
Jan 31 15:15:42 mom kernel: [<c01306f8>] page_launder [kernel] 0x168 (0xc36bff90
Jan 31 15:15:42 mom kernel: [<c0130fa2>] do_try_to_free_pages [kernel] 0x12 (0xc 36bffb0))
Jan 31 15:15:42 mom kernel: [<c01312c1>] kswapd [kernel] 0x121 (0xc36bffd4))
Jan 31 15:15:42 mom kernel: [<c0105000>] stext [kernel] 0x0 (0xc36bffe8))
Jan 31 15:15:42 mom kernel: [<c0107146>] kernel_thread [kernel] 0x26 (0xc36bfff0
Jan 31 15:15:42 mom kernel: [<c01311a0>] kswapd [kernel] 0x0 (0xc36bfff8))
Jan 31 15:15:42 mom kernel: Code: 8b 5b 28 f6 42 19 02 74 10 89 e0 50 52 e8 fc f
Error (Oops_code_values): invalid value 0xf in Code line, must be 2, 4, 8 or 16 digits, value ignored

>>EIP; f880e8b2 <[jbd]journal_try_to_free_buffers+52/90> <=====
Trace; f881e672 <[ext3].text.start+4612/ab8f>
Trace; c013b49f <try_to_release_page+2f/50>
Code; f880e8b2 <[jbd]journal_try_to_free_buffers+52/90>
00000000 <_EIP>:
Code; f880e8b2 <[jbd]journal_try_to_free_buffers+52/90> <=====
   0: 8b 5b 28 mov 0x28(%ebx),%ebx <=====
Code; f880e8b5 <[jbd]journal_try_to_free_buffers+55/90>
   3: f6 42 19 02 testb $0x2,0x19(%edx)
Code; f880e8b9 <[jbd]journal_try_to_free_buffers+59/90>
   7: 74 10 je 19 <_EIP+0x19> f880e8cb <[jbd]journal_try_to_free_buffers+6b/90>
Code; f880e8bb <[jbd]journal_try_to_free_buffers+5b/90>
   9: 89 e0 mov %esp,%eax
Code; f880e8bd <[jbd]journal_try_to_free_buffers+5d/90>
   b: 50 push %eax
Code; f880e8be <[jbd]journal_try_to_free_buffers+5e/90>
   c: 52 push %edx
Code; f880e8bf <[jbd]journal_try_to_free_buffers+5f/90>
   d: e8 fc 00 00 00 call 10e <_EIP+0x10e> f880e9c0 <[jbd]journal_unmap_buffer+70/1d0>

2 warnings and 3 errors issued. Results may not be reliable.
[root@mom log]#
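
As a cross-check on the oops above: the faulting instruction is decoded as mov 0x28(%ebx),%ebx, the register dump shows ebx: 00000000, and the kernel reports the fault at virtual address 00000028. Those three values agree if the code loaded a field at offset 0x28 through a NULL pointer. The following is only a sketch of that arithmetic in POSIX shell; the helper name effective_address is made up for illustration, and the inputs are copied from the report.

```shell
#!/bin/sh
# Effective address of a base+displacement memory operand,
# e.g. "mov 0x28(%ebx),%ebx" reads from ebx + 0x28.
# effective_address is a hypothetical helper, not part of ksymoops.
effective_address() {
    base=$1   # register value in hex, without the 0x prefix
    disp=$2   # displacement in hex, without the 0x prefix
    printf '%08x\n' $(( 0x$base + 0x$disp ))
}

# Values from the oops: ebx: 00000000, displacement 0x28
effective_address 00000000 28
```

With the values from the report this prints 00000028, the same address the kernel gave for the NULL pointer dereference.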
I installed the latest errata kernel (2.4.18-24.7.x) on this system and ran the same tests for 24 hours without experiencing any of the kernel panics & oops mentioned in this bug report. The server was returned to production and appears to be functioning normally.
Looks like I spoke too soon. This oops occurred during an rsync of 30GB from a remote server.

[root@mom tmp]# ksymoops 0207_oops.txt
ksymoops 2.4.4 on i686 2.4.18-24.7.x. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.18-24.7.x/ (default)
-m /boot/System.map-2.4.18-24.7.x (default)

Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options.

Error (expand_objects): cannot stat(/lib/ext3.o) for ext3
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/jbd.o) for jbd
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/lvm-mod.o) for lvm-mod
ksymoops: No such file or directory
/usr/bin/find: /lib/modules/2.4.18-24.7.x/build: No such file or directory
Error (pclose_local): find_objects pclose failed 0x100
Warning (map_ksym_to_module): cannot match loaded module ext3 to a unique module object. Trace may not be reliable.
Feb 7 11:10:41 mom kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000028
Feb 7 11:10:41 mom kernel: f881f8b2
Feb 7 11:10:41 mom kernel: *pde = 00000000
Feb 7 11:10:41 mom kernel: Oops: 0000
Feb 7 11:10:41 mom kernel: CPU: 0
Feb 7 11:10:41 mom kernel: EIP: 0010:[<f881f8b2>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Feb 7 11:10:41 mom kernel: EFLAGS: 00010207
Feb 7 11:10:41 mom kernel: eax: 00000001 ebx: 00000000 ecx: 000001d0 edx: 00000000
Feb 7 11:10:41 mom kernel: esi: f597e740 edi: 00000001 ebp: 00000000 esp: c34b3f2c
Feb 7 11:10:42 mom kernel: ds: 0018 es: 0018 ss: 0018
Feb 7 11:10:42 mom kernel: Process kswapd (pid: 5, stackpage=c34b3000)
Feb 7 11:10:42 mom kernel: Stack: 00000000 c12f4810 000001d0 f597e740 c19b202c f882f672 f6d91c00 c12f4810
Feb 7 11:10:42 mom kernel: 000001d0 c013b4df c12f4810 000001d0 c12f4810 000001d0 c0130329 c12f4810
Feb 7 11:10:42 mom kernel: 000001d0 000016a2 000014a9 000005ad 00012b1a c02d4a24 00001342 000016a2
Feb 7 11:10:42 mom kernel: Call Trace: [<f882f672>] ext3_releasepage [ext3] 0x22 (0xc34b3f40))
Feb 7 11:10:42 mom kernel: [<c013b4df>] try_to_release_page [kernel] 0x2f (0xc34b3f50))
Feb 7 11:10:42 mom kernel: [<c0130329>] page_launder_zone [kernel] 0x519 (0xc34b3f64))
Feb 7 11:10:42 mom kernel: [<c0130728>] page_launder [kernel] 0x168 (0xc34b3f90))
Feb 7 11:10:42 mom kernel: [<c0130fd2>] do_try_to_free_pages [kernel] 0x12 (0xc34b3fb0))
Feb 7 11:10:42 mom kernel: [<c01312f1>] kswapd [kernel] 0x121 (0xc34b3fd4))
Feb 7 11:10:42 mom kernel: [<c0105000>] stext [kernel] 0x0 (0xc34b3fe8))
Feb 7 11:10:42 mom kernel: [<c0107166>] kernel_thread [kernel] 0x26 (0xc34b3ff0))
Feb 7 11:10:42 mom kernel: [<c01311d0>] kswapd [kernel] 0x0 (0xc34b3ff8))
Feb 7 11:10:42 mom kernel: Code: 8b 5b 28 f6 42 19 02 74 10 89 e0 50 52 e8 fc fe ff ff 5a 85

>>EIP; f881f8b2 <[jbd]journal_try_to_free_buffers+52/90> <=====
Trace; f882f672 <[ext3].text.start+4612/ab8f>
Trace; c013b4df <try_to_release_page+2f/50>
Trace; c0130329 <page_launder_zone+519/7b0>
Trace; c0130728 <page_launder+168/2f0>
Trace; c0130fd2 <do_try_to_free_pages+12/180>
Trace; c01312f1 <kswapd+121/330>
Trace; c0105000 <_stext+0/0>
Trace; c0107166 <kernel_thread+26/30>
Trace; c01311d0 <kswapd+0/330>
Code; f881f8b2 <[jbd]journal_try_to_free_buffers+52/90>
00000000 <_EIP>:
Code; f881f8b2 <[jbd]journal_try_to_free_buffers+52/90> <=====
   0: 8b 5b 28 mov 0x28(%ebx),%ebx <=====
Code; f881f8b5 <[jbd]journal_try_to_free_buffers+55/90>
   3: f6 42 19 02 testb $0x2,0x19(%edx)
Code; f881f8b9 <[jbd]journal_try_to_free_buffers+59/90>
   7: 74 10 je 19 <_EIP+0x19> f881f8cb <[jbd]journal_try_to_free_buffers+6b/90>
Code; f881f8bb <[jbd]journal_try_to_free_buffers+5b/90>
   9: 89 e0 mov %esp,%eax
Code; f881f8bd <[jbd]journal_try_to_free_buffers+5d/90>
   b: 50 push %eax
Code; f881f8be <[jbd]journal_try_to_free_buffers+5e/90>
   c: 52 push %edx
Code; f881f8bf <[jbd]journal_try_to_free_buffers+5f/90>
   d: e8 fc fe ff ff call ffffff0e <_EIP+0xffffff0e> f881f7c0 <[jbd]__journal_try_to_free_buffer+0/a0>
Code; f881f8c4 <[jbd]journal_try_to_free_buffers+64/90>
  12: 5a pop %edx
Code; f881f8c5 <[jbd]journal_try_to_free_buffers+65/90>
  13: 85 00 test %eax,(%eax)

2 warnings and 4 errors issued. Results may not be reliable.
[root@mom tmp]#
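
The two ksymoops runs in this report resolve the call at journal_try_to_free_buffers+5f to different targets. That appears to follow from the Code line in the first oops file being cut off after "e8 fc f", so the displacement bytes were incomplete. The sketch below redoes the relative-call arithmetic for both byte sequences. It assumes a shell with 64-bit arithmetic, and call_target is a hypothetical helper name, not something ksymoops provides.

```shell
#!/bin/sh
# Target of an x86 "call rel32" (opcode e8): the address of the next
# instruction plus the 32-bit little-endian displacement, wrapped to 32 bits.
# call_target is a hypothetical helper; assumes 64-bit shell arithmetic.
call_target() {
    insn_addr=$1                 # address of the e8 opcode, hex without 0x
    b0=$2; b1=$3; b2=$4; b3=$5   # displacement bytes in memory order
    disp=$(( 0x$b3$b2$b1$b0 ))   # little-endian: last byte is most significant
    # The call instruction is 5 bytes long; masking to 32 bits makes a
    # large displacement such as fffffefc act as a negative offset.
    printf '%08x\n' $(( (0x$insn_addr + 5 + disp) & 0xffffffff ))
}

# Second oops, complete Code line: "e8 fc fe ff ff" at f881f8bf
call_target f881f8bf fc fe ff ff

# First oops, truncated Code line decoded as "e8 fc 00 00 00" at f880e8bf
call_target f880e8bf fc 00 00 00
```

The first call prints f881f7c0, which ksymoops labels __journal_try_to_free_buffer+0. The second prints f880e9c0, the journal_unmap_buffer+70 address shown in the first run, so that earlier target is probably a side effect of the truncated input and not a real difference between the two kernels.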
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/