Bug 73526
Summary: | kernel 2.4.18-10smp crashed; problem in smbfs or VFS | | |
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Joel Votaw <joel> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Priority: | medium |
Version: | 7.3 | CC: | joel, pascal |
Hardware: | i686 | OS: | Linux |
Doc Type: | Bug Fix | Last Closed: | 2004-09-30 15:39:54 UTC |
I just had another one of these crashes. I have rewritten my script to only
handle one filesystem at a time and am still running into the problem. I
haven't had problems with these scripts on single-processor systems, so I think
this may be a race condition somehow related to SMP.
Here is the ksymoops info for this crash:
[root@files2 root]# ksymoops -V -k /proc/ksyms -l /proc/modules -o /lib/modules/2.4.18-10smp/ -m /boot/System.map-2.4.18-10smp oops2.log
ksymoops 2.4.4 on i686 2.4.18-10smp. Options used
-V (specified)
-k /proc/ksyms (specified)
-l /proc/modules (specified)
-o /lib/modules/2.4.18-10smp/ (specified)
-m /boot/System.map-2.4.18-10smp (specified)
Error (expand_objects): cannot stat(/lib/ext3.o) for ext3
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/jbd.o) for jbd
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/raid5.o) for raid5
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/xor.o) for xor
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/aic7xxx.o) for aic7xxx
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/sd_mod.o) for sd_mod
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/scsi_mod.o) for scsi_mod
ksymoops: No such file or directory
/usr/bin/find: /lib/modules/2.4.18-10smp/build: No such file or directory
Error (pclose_local): find_objects pclose failed 0x100
Warning (map_ksym_to_module): cannot match loaded module ext3 to a unique
module object. Trace may not be reliable.
kernel BUG at inode.c:1066!
invalid operand: 0000
CPU: 1
EIP: 0010:[<c01588bf>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 0000001c ebx: d94fe420 ecx: c02f25e0 edx: 00008f60
esi: f7d1c000 edi: 00000000 ebp: 00000051 esp: f7ed3f54
ds: 0018 es: 0018 ss: 0018
Process kswapd (pid: 7, stackpage=f7ed3000)
Stack: c024a65f 0000042a ddd49e58 ddd49e40 d94fe420 c0155fa6 d94fe420 0000038e
00000098 ffffffff c02f3c28 0000038e 0000016c 000003cd c01381e8 000001d0
00000465 00000000 0000016c c01563d0 000020a0 c0138b3c 00000006 000001d0
Call Trace: [<c0155fa6>] prune_dcache [kernel] 0xe6
[<c01381e8>] page_launder [kernel] 0x168
[<c01563d0>] shrink_dcache_memory [kernel] 0x20
[<c0138b3c>] do_try_to_free_pages [kernel] 0x1c
[<c0138e31>] kswapd [kernel] 0x101
[<c0105000>] stext [kernel] 0x0
[<c0107286>] kernel_thread [kernel] 0x26
[<c0138d30>] kswapd [kernel] 0x0
Code: 0f 0b 5a 59 85 f6 74 08 8b 46 20 85 c0 0f 45 f8 85 ff 74 0b
>>EIP; c01588bf <iput+2f/290> <=====
Trace; c0155fa6 <prune_dcache+e6/1c0>
Trace; c01381e8 <page_launder+168/300>
Trace; c01563d0 <shrink_dcache_memory+20/30>
Trace; c0138b3c <do_try_to_free_pages+1c/180>
Trace; c0138e31 <kswapd+101/2d0>
Trace; c0105000 <_stext+0/0>
Trace; c0107286 <kernel_thread+26/30>
Trace; c0138d30 <kswapd+0/2d0>
Code; c01588bf <iput+2f/290>
00000000 <_EIP>:
Code; c01588bf <iput+2f/290> <=====
0: 0f 0b ud2a <=====
Code; c01588c1 <iput+31/290>
2: 5a pop %edx
Code; c01588c2 <iput+32/290>
3: 59 pop %ecx
Code; c01588c3 <iput+33/290>
4: 85 f6 test %esi,%esi
Code; c01588c5 <iput+35/290>
6: 74 08 je 10 <_EIP+0x10> c01588cf <iput+3f/290>
Code; c01588c7 <iput+37/290>
8: 8b 46 20 mov 0x20(%esi),%eax
Code; c01588ca <iput+3a/290>
b: 85 c0 test %eax,%eax
Code; c01588cc <iput+3c/290>
d: 0f 45 f8 cmovne %eax,%edi
Code; c01588cf <iput+3f/290>
10: 85 ff test %edi,%edi
Code; c01588d1 <iput+41/290>
12: 74 0b je 1f <_EIP+0x1f> c01588de <iput+4e/290>
1 warning and 8 errors issued. Results may not be reliable.
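The `>>EIP;` and `Trace;` lines in the ksymoops output above each carry an address, a symbol, an offset into that symbol, and the symbol's size. A minimal sketch of pulling those fields apart, for example to compare two oopses from the same kernel, might look like the following. The names `Frame`, `parse_symbols`, and `function_start` are hypothetical helpers invented for this illustration; they are not part of ksymoops.

```python
import re
from collections import namedtuple

# Hypothetical record type for one decoded ksymoops line.
Frame = namedtuple("Frame", "kind address symbol offset size")

# Matches lines such as:
#   >>EIP; c01588bf <iput+2f/290> <=====
#   Trace; c0155fa6 <prune_dcache+e6/1c0>
SYMBOL_LINE = re.compile(
    r"^(?P<kind>Trace|>>EIP);\s+(?P<addr>[0-9a-f]{8})\s+"
    r"<(?P<sym>\w+)\+(?P<off>[0-9a-f]+)/(?P<size>[0-9a-f]+)>"
)


def parse_symbols(text):
    """Return a Frame for every >>EIP; or Trace; line found in text."""
    frames = []
    for line in text.splitlines():
        m = SYMBOL_LINE.match(line.strip())
        if m:
            frames.append(
                Frame(
                    kind=m.group("kind"),
                    address=int(m.group("addr"), 16),
                    symbol=m.group("sym"),
                    offset=int(m.group("off"), 16),
                    size=int(m.group("size"), 16),
                )
            )
    return frames


def function_start(frame):
    """Address of the symbol itself: the reported address minus the offset."""
    return frame.address - frame.offset


# Three lines copied from the ksymoops output in this comment.
SAMPLE = """\
>>EIP; c01588bf <iput+2f/290> <=====
Trace; c0155fa6 <prune_dcache+e6/1c0>
Trace; c01381e8 <page_launder+168/300>
"""

if __name__ == "__main__":
    for f in parse_symbols(SAMPLE):
        print(f"{f.kind:6} {f.symbol}+0x{f.offset:x} starts at 0x{function_start(f):08x}")
```

Subtracting the offset from the EIP gives the start address of `iput`, which makes it easy to check whether two oopses from the same kernel build fail inside the same function.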
I have a similar problem: the kernel crashes about once a month. I am using rsync with smbfs over an ADSL link, but this is on a single-processor system. Here is what is in the message log:
May 15 23:11:21 s01 kernel: kernel BUG at inode.c:1066!
May 15 23:11:21 s01 kernel: invalid operand: 0000
May 15 23:11:21 s01 kernel: smbfs loop ide-cd cdrom soundcore autofs eepro100 st usb-uhci usbcore ext3 jbd
May 15 23:11:21 s01 kernel: CPU: 0
May 15 23:11:21 s01 kernel: EIP: 0010:[<c014c32f>] Not tainted
May 15 23:11:21 s01 kernel: EFLAGS: 00010286
May 15 23:11:21 s01 kernel:
May 15 23:11:21 s01 kernel: EIP is at iput [kernel] 0x2f (2.4.18-10)
May 15 23:11:21 s01 kernel: eax: 0000001c ebx: c19b89e0 ecx: 00000001 edx: 00005870
May 15 23:11:21 s01 kernel: esi: cfa6a800 edi: 00000000 ebp: 0000004b esp: c13b3f50
May 15 23:11:21 s01 kernel: ds: 0018 es: 0018 ss: 0018
May 15 23:11:21 s01 kernel: Process kswapd (pid: 5, stackpage=c13b3000)
May 15 23:11:21 s01 kernel: Stack: c022968c 0000042a c825a878 c825a860 c19b89e0 c014a0ad c19b89e0 c13b2000
May 15 23:11:21 s01 kernel: 00000000 00000000 ffffffff c02c7ae8 00000000 00000000 0000018f c0130133
May 15 23:11:21 s01 kernel: 000001d0 0000018f 00000000 00000000 c014a410 00000062 c013090c 00000006
May 15 23:11:21 s01 kernel: Call Trace: [<c014a0ad>] prune_dcache [kernel] 0x10d
May 15 23:11:21 s01 kernel: [<c0130133>] page_launder [kernel] 0x2b3
May 15 23:11:21 s01 kernel: [<c014a410>] shrink_dcache_memory [kernel] 0x20
May 15 23:11:21 s01 kernel: [<c013090c>] do_try_to_free_pages [kernel] 0x1c
May 15 23:11:21 s01 kernel: [<c0130c01>] kswapd [kernel] 0x101
May 15 23:11:21 s01 kernel: [<c0105000>] stext [kernel] 0x0
May 15 23:11:21 s01 kernel: [<c0107136>] kernel_thread [kernel] 0x26
May 15 23:11:21 s01 kernel: [<c0130b00>] kswapd [kernel] 0x0
May 15 23:11:21 s01 kernel:
May 15 23:11:21 s01 kernel:
May 15 23:11:21 s01 kernel: Code: 0f 0b 5a 59 85 f6 74 08 8b 46 20 85 c0 0f 45 f8 85 ff 74 0b
May 15 23:11:22 s01 samba(pam_unix)[18311]: session closed for user Carlton

My work-around for this bug has been to use a custom-compiled 2.4.19 kernel. This has generally worked very well: I've been rsyncing about 50 machines in parallel over a fast local network; the machines are still mounted using smbfs. However, even on this kernel I recently had problems:
VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
kswapd crashed after this. (I won't include the full ksymoops here since this was not with the RedHat kernel in question.)
There is some mention of problems like this on the Linux Kernel Mailing List, but nothing conclusive that I found: mixed reports of success with 2.4.18 and 2.4.19, and some people suggesting a move to 2.4.20.
Anyway, after this happened I upgraded to the latest RedHat kernel at the time (2.4.18-27.7.xsmp). For the past 3 days it has served me well with no crashes, under the same load as my custom kernel. This hasn't been long enough to be conclusive, but with 2.4.18-10 I would've expected a crash by now.
I'm going to upgrade to the very recent RedHat 2.4.20-* kernel, see how that works, and report back in a few weeks. However, if you are having a lot of crashes with 2.4.18-10 you might try 2.4.18-27.7.x, since it seems to be better.
-Joel

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists.
The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)

Description of problem:
I was running a script on a heavily loaded system which mounts and unmounts a lot of smbfs filesystems. After a while, the kernel crashed and printed out an Oops message (details below). The system was still responding to pings, but otherwise was dead.

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
1. Load system down with lots of IO and CPU usage: software RAID 5, copying a lot of data using rsync over SSH, etc.
2. Use shell script to mount and unmount about 12 smbfs filesystems.

Actual Results:
System hung with an oops.

Additional info:
Below is the oops and what ksymoops had to say. Let me know if you need additional information.

Before the Oops, I saw the following messages:
raid5: multiple 0 requests for sector 256464264
smb_retry: no connection process
last message repeated 2 times
smb_delete_inode: could not close inode 2
VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
Unable to handle kernel paging request at virtual address 740a6279
printing eip:
c01588d3
*pde = 00000000

------------ Oops -----------
# ksymoops -V -k /proc/ksyms -l /proc/modules -o /lib/modules/2.4.18-10smp/ -m /boot/System.map-2.4.18-10smp oops.log
ksymoops 2.4.4 on i686 2.4.18-10smp. Options used
-V (specified)
-k /proc/ksyms (specified)
-l /proc/modules (specified)
-o /lib/modules/2.4.18-10smp/ (specified)
-m /boot/System.map-2.4.18-10smp (specified)
Error (expand_objects): cannot stat(/lib/ext3.o) for ext3
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/jbd.o) for jbd
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/raid5.o) for raid5
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/xor.o) for xor
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/aic7xxx.o) for aic7xxx
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/sd_mod.o) for sd_mod
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/scsi_mod.o) for scsi_mod
ksymoops: No such file or directory
/usr/bin/find: /lib/modules/2.4.18-10smp/build: No such file or directory
Error (pclose_local): find_objects pclose failed 0x100
Warning (map_ksym_to_module): cannot match loaded module ext3 to a unique module object. Trace may not be reliable.
Oops: 0000
CPU: 0
EIP: 0010:[<c01588d3>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: 740a6269 ebx: c9111da0 ecx: f4bfcbd0 edx: c9111db0
esi: c26cc400 edi: 740a6269 ebp: 00000041 esp: f7ed3f5c
ds: 0018 es: 0018 ss: 0018
Process kswapd (pid: 7, stackpage=f7ed3000)
Stack: f4bfcbb8 f4bfcba0 c9111da0 c0155fa6 c9111da0 00000375 000000c3 ffffffff
c02f3c28 00000375 00000185 00000493 c01381e8 000001d0 00000556 00000000
00000185 c01563d0 0000230c c0138b3c 00000006 000001d0 000001d0 f7ed2000
Call Trace: [<c0155fa6>] prune_dcache [kernel] 0xe6
[<c01381e8>] page_launder [kernel] 0x168
[<c01563d0>] shrink_dcache_memory [kernel] 0x20
[<c0138b3c>] do_try_to_free_pages [kernel] 0x1c
[<c0138e31>] kswapd [kernel] 0x101
[<c0105000>] stext [kernel] 0x0
[<c0107286>] kernel_thread [kernel] 0x26
[<c0138d30>] kswapd [kernel] 0x0
Code: 8b 47 10 85 c0 74 04 53 ff d0 58 68 bc 49 2f c0 8d 43 2c 50
>>EIP; c01588d3 <iput+43/290> <=====
Trace; c0155fa6 <prune_dcache+e6/1c0>
Trace; c01381e8 <page_launder+168/300>
Trace; c01563d0 <shrink_dcache_memory+20/30>
Trace; c0138b3c <do_try_to_free_pages+1c/180>
Trace; c0138e31 <kswapd+101/2d0>
Trace; c0105000 <_stext+0/0>
Trace; c0107286 <kernel_thread+26/30>
Trace; c0138d30 <kswapd+0/2d0>
Code; c01588d3 <iput+43/290>
00000000 <_EIP>:
Code; c01588d3 <iput+43/290> <=====
0: 8b 47 10 mov 0x10(%edi),%eax <=====
Code; c01588d6 <iput+46/290>
3: 85 c0 test %eax,%eax
Code; c01588d8 <iput+48/290>
5: 74 04 je b <_EIP+0xb> c01588de <iput+4e/290>
Code; c01588da <iput+4a/290>
7: 53 push %ebx
Code; c01588db <iput+4b/290>
8: ff d0 call *%eax
Code; c01588dd <iput+4d/290>
a: 58 pop %eax
Code; c01588de <iput+4e/290>
b: 68 bc 49 2f c0 push $0xc02f49bc
Code; c01588e3 <iput+53/290>
10: 8d 43 2c lea 0x2c(%ebx),%eax
Code; c01588e6 <iput+56/290>
13: 50 push %eax
1 warning and 8 errors issued. Results may not be reliable.
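
In this oops the faulting instruction is `mov 0x10(%edi),%eax` with `edi: 740a6269`, and the kernel reported a paging request at virtual address `740a6279`. The short sketch below only checks that arithmetic and shows what the register value looks like when read as bytes. The helper names `effective_address` and `as_ascii` are made up for this illustration. That the value decodes as printable text is an observation, not a diagnosis; it would be consistent with a pointer having been overwritten by string data, but the report does not confirm that.

```python
# Values copied from the oops: base register, displacement from the
# faulting instruction "mov 0x10(%edi),%eax", and the reported fault address.
EDI = 0x740A6269
DISPLACEMENT = 0x10
REPORTED_FAULT = 0x740A6279


def effective_address(base, displacement):
    """Compute base + displacement, wrapped to 32 bits as on i386."""
    return (base + displacement) & 0xFFFFFFFF


def as_ascii(word):
    """Interpret a 32-bit value as four little-endian bytes of ASCII text."""
    return word.to_bytes(4, "little").decode("ascii", errors="replace")


if __name__ == "__main__":
    addr = effective_address(EDI, DISPLACEMENT)
    print(f"computed fault address: {addr:08x}")
    print(f"matches reported address: {addr == REPORTED_FAULT}")
    print(f"edi as bytes: {as_ascii(EDI)!r}")
```

The computed address matches the one in the "Unable to handle kernel paging request" message, so the fault comes from dereferencing whatever was in `edi` at that point in `iput`.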