Description of problem:
On a VIA EPIA CL10K mini-ITX board, the kernel has been hanging regularly. The hangs appear to occur during reads from the hard drive.

Version-Release number of selected component (if applicable):

How reproducible:
It's random, but it has consistently left me with uptimes of less than 3 days.
Created attachment 108911 kernel oops

I noticed these oops messages as well.
I was using Fedora Core 1 on this motherboard and it was working fine with the last FC1 kernel. I skipped over FC2 due to the difficulty of installing on the EPIA boards, and went to FC3. The kernel oops may or may not be related to my problem, but I have seen it at least twice on the EPIA board, and have not seen it on the other x86 hardware I have. I'm using EXT3 partitions and the last FC3 kernel.
Are you running the cpuspeed service? If so, do the hangs go away when you disable it? (Adding Stephen to Cc for insight into the ext3-related oops.)
I can't make any sense of the oops. We oopsed with

Unable to handle kernel paging request at virtual address a8d8f0e0

in ext3_block_to_path:0x35ed:

000035db <ext3_block_to_path>:
    35db: 55                   push %ebp
    35dc: 57                   push %edi
    35dd: 56                   push %esi
    35de: 53                   push %ebx
    35df: 83 ec 0c             sub $0xc,%esp
    35e2: 89 d3                mov %edx,%ebx
    35e4: 89 0c 24             mov %ecx,(%esp,1)
    35e7: 8b 90 d4 00 00 00    mov 0xd4(%eax),%edx
    35ed: 8b 82 e0 01 00 00    mov 0x1e0(%edx),%eax

Now, %edx actually holds 0x1fcc0400, so I can't see where the oops address 0xa8d8f0e0 can come from.

Here we're dereferencing inode->i_sb->s_fs_info. So we actually got an oops looking up a field within inode->i_sb, which the filesystem never ever touches. So if there's a software problem here, it looks more like a core VFS one than an ext3 one. But that's still hard to square with the oops trace showing an apparently bogus %edx.

It looks more like a hardware or hardware-related problem at first glance based on the symptoms so far. Do you have any other oopses captured that might shed more light on this?
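To make the arithmetic behind that observation explicit, here is a minimal sketch that recomputes the address the faulting instruction should have touched. It only uses the values quoted above (%edx, the 0x1e0 displacement, and the reported fault address); the function name is made up for illustration.

```python
# Sketch only: recompute the effective address of `mov 0x1e0(%edx),%eax`
# from the values quoted in this comment and compare it to the oops address.
EDX = 0x1FCC0400          # %edx as reported in the attached oops
DISPLACEMENT = 0x1E0      # displacement in the faulting instruction
FAULT_ADDR = 0xA8D8F0E0   # "Unable to handle kernel paging request" address


def effective_address(base: int, disp: int) -> int:
    """Base + displacement, wrapped to 32 bits as on i386."""
    return (base + disp) & 0xFFFFFFFF


expected = effective_address(EDX, DISPLACEMENT)
print(f"expected access: {expected:#010x}")
print(f"reported fault:  {FAULT_ADDR:#010x}")
print("match" if expected == FAULT_ADDR else "mismatch")
```

With the register value shown in the oops, the instruction should have read from 0x1fcc05e0, which is nowhere near the reported fault address. That is the inconsistency described above.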
Hey guys, thanks for responding. I went away on Christmas and left it on for about 9 days without problems. Another lockup happened yesterday after I got back. I left it up on the console 1 but forgot to shut off the screen blanker. Sooooo.... I'll do that again and see what happens.

I did have the cpuspeed service running. I turned it off tonight. I'll see if this makes a difference over time.

Min uptime: 10 minutes. Max uptime: 9 days.

One more oops at apparently the same point:

Dec 16 04:07:58 crow kernel: Unable to handle kernel paging request at virtual address a74aece0
Dec 16 04:07:58 crow kernel: printing eip:
Dec 16 04:07:58 crow kernel: 208615ed
Dec 16 04:07:58 crow kernel: *pde = 00000000
Dec 16 04:07:58 crow kernel: Oops: 0000 [#1]
Dec 16 04:07:58 crow kernel: Modules linked in: nfsd exportfs lockd sch_ingress cls_u32 sch_sfq sch_cbq parport_pc lp parport autofs4 sunrpc ipt_multiport ipt_state iptable_filter ipt_MASQUERADE iptable_nat ip_conntrack ip_tables dm_mod button battery ac md5 ipv6 joydev uhci_hcd ehci_hcd tuner msp3400 bttv video_buf i2c_algo_bit v4l2_common btcx_risc i2c_core videodev snd_bt87x snd_via82xx snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc gameport snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore via_rhine mii ext3 jbd
Dec 16 04:07:58 crow kernel: CPU: 0
Dec 16 04:07:58 crow kernel: EIP: 0060:[<208615ed>] Not tainted VLI
Dec 16 04:07:58 crow kernel: EFLAGS: 00010212 (2.6.9-1.681_FC3)
Dec 16 04:07:58 crow kernel: EIP is at ext3_block_to_path+0x12/0x12e [ext3]
Dec 16 04:07:58 crow kernel: eax: 0d130418 ebx: 00000000 ecx: 1c564d44 edx: 1e3e0000
Dec 16 04:07:58 crow kernel: esi: 00000000 edi: 0d130318 ebp: 0d130418 esp: 1c564cdc
Dec 16 04:07:58 crow kernel: ds: 007b es: 007b ss: 0068
Dec 16 04:07:58 crow kernel: Process updatedb (pid: 7766, threadinfo=1c564000 task=1fa691f0)
Dec 16 04:07:58 crow kernel: Stack: 1c564d44 00000000 00000000 00000000 00000000 0d130318 0d130418 20861c68
Dec 16 04:07:58 crow kernel: 1c564d10 00000246 00000000 09ed686c fffffffb 00000000 00000000 00000000
Dec 16 04:07:58 crow kernel: 1fa691f0 0211d26f 1c564d44 1c564d44 00000026 00000000 00000246 1c564d94
Dec 16 04:07:58 crow kernel: Call Trace:
Dec 16 04:07:58 crow kernel: [<20861c68>] ext3_get_block_handle+0x37/0x276 [ext3]
Dec 16 04:07:58 crow kernel: [<0211d26f>] autoremove_wake_function+0x0/0x2d
Dec 16 04:07:58 crow kernel: [<021c2a2a>] avc_node_populate+0x23/0x25
Dec 16 04:07:58 crow kernel: [<20862038>] ext3_getblk+0x7b/0x1fa [ext3]
Dec 16 04:07:58 crow kernel: [<021c3f3a>] avc_has_perm_noaudit+0x8d/0xda
Dec 16 04:07:58 crow kernel: [<208655a8>] ext3_find_entry+0x150/0x363 [ext3]
Dec 16 04:07:58 crow kernel: [<2086596e>] ext3_lookup+0x1f/0x87 [ext3]
Dec 16 04:07:58 crow kernel: [<02175487>] real_lookup+0x73/0xde
Dec 16 04:07:58 crow kernel: [<02175802>] do_lookup+0x56/0x8f
Dec 16 04:07:58 crow kernel: [<021764e5>] link_path_walk+0xcaa/0x1009
Dec 16 04:07:58 crow kernel: [<0215ef01>] copy_str_fromuser_size+0x3d/0x56
Dec 16 04:07:58 crow kernel: [<02176abf>] path_lookup+0xff/0x12f
Dec 16 04:07:59 crow kernel: [<02176c03>] __user_walk+0x21/0x51
Dec 16 04:07:59 crow kernel: [<02170c37>] vfs_lstat+0x11/0x37
Dec 16 04:07:59 crow kernel: [<2084cab6>] journal_stop+0x4de/0x4e8 [jbd]
Dec 16 04:07:59 crow kernel: [<0215222e>] follow_page_pte+0xec/0xfd
Dec 16 04:07:59 crow kernel: [<0215e907>] rw_vm+0x3ef/0x47a
Dec 16 04:07:59 crow kernel: [<02171195>] sys_lstat64+0xf/0x23
Dec 16 04:07:59 crow kernel: [<0215eec0>] put_user_size+0x29/0x2d
Dec 16 04:07:59 crow kernel: [<0217ad76>] sys_getdents64+0x9f/0xa9
Dec 16 04:07:59 crow kernel: Code: <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
Dec 16 04:07:59 crow kernel: in_atomic():0[expected: 0], irqs_disabled():1
Dec 16 04:07:59 crow kernel: [<0211cbcb>] __might_sleep+0x7d/0x8a
Dec 16 04:07:59 crow kernel: [<0215e726>] rw_vm+0x20e/0x47a
Dec 16 04:07:59 crow kernel: [<208615c2>] ext3_delete_inode+0x9c/0xaa [ext3]
Dec 16 04:07:59 crow kernel: [<208615c2>] ext3_delete_inode+0x9c/0xaa [ext3]
Dec 16 04:07:59 crow kernel: [<0215ee70>] get_user_size+0x30/0x57
Dec 16 04:07:59 crow kernel: [<208615c2>] ext3_delete_inode+0x9c/0xaa [ext3]
Dec 16 04:07:59 crow kernel: [<0210682b>] show_registers+0x109/0x15e
Dec 16 04:07:59 crow kernel: [<02106a2f>] die+0x14a/0x241
Dec 16 04:07:59 crow kernel: [<0211937e>] do_page_fault+0x0/0x511
Dec 16 04:07:59 crow kernel: [<0211937e>] do_page_fault+0x0/0x511
Dec 16 04:07:59 crow kernel: [<02119733>] do_page_fault+0x3b5/0x511
Dec 16 04:07:59 crow kernel: [<208615ed>] ext3_block_to_path+0x12/0x12e [ext3]
Dec 16 04:07:59 crow kernel: [<0211d26f>] autoremove_wake_function+0x0/0x2d
Dec 16 04:07:59 crow kernel: [<02250749>] __cfq_get_queue+0xf8/0x206
Dec 16 04:07:59 crow kernel: [<0211d26f>] autoremove_wake_function+0x0/0x2d
Dec 16 04:07:59 crow kernel: [<021de48d>] __delay+0x9/0xa
Dec 16 04:07:59 crow kernel: [<0211937e>] do_page_fault+0x0/0x511
Dec 16 04:07:59 crow kernel: [<208615ed>] ext3_block_to_path+0x12/0x12e [ext3]
Dec 16 04:07:59 crow kernel: [<20861c68>] ext3_get_block_handle+0x37/0x276 [ext3]
Dec 16 04:07:59 crow kernel: [<0211d26f>] autoremove_wake_function+0x0/0x2d
Dec 16 04:07:59 crow kernel: [<021c2a2a>] avc_node_populate+0x23/0x25
Dec 16 04:07:59 crow kernel: [<20862038>] ext3_getblk+0x7b/0x1fa [ext3]
Dec 16 04:07:59 crow kernel: [<021c3f3a>] avc_has_perm_noaudit+0x8d/0xda
Dec 16 04:07:59 crow kernel: [<208655a8>] ext3_find_entry+0x150/0x363 [ext3]
Dec 16 04:07:59 crow kernel: [<2086596e>] ext3_lookup+0x1f/0x87 [ext3]
Dec 16 04:07:59 crow kernel: [<02175487>] real_lookup+0x73/0xde
Dec 16 04:07:59 crow kernel: [<02175802>] do_lookup+0x56/0x8f
Dec 16 04:07:59 crow kernel: [<021764e5>] link_path_walk+0xcaa/0x1009
Dec 16 04:07:59 crow kernel: [<0215ef01>] copy_str_fromuser_size+0x3d/0x56
Dec 16 04:07:59 crow kernel: [<02176abf>] path_lookup+0xff/0x12f
Dec 16 04:07:59 crow kernel: [<02176c03>] __user_walk+0x21/0x51
Dec 16 04:07:59 crow kernel: [<02170c37>] vfs_lstat+0x11/0x37
Dec 16 04:07:59 crow kernel: [<2084cab6>] journal_stop+0x4de/0x4e8 [jbd]
Dec 16 04:07:59 crow kernel: [<0215222e>] follow_page_pte+0xec/0xfd
Dec 16 04:07:59 crow kernel: [<0215e907>] rw_vm+0x3ef/0x47a
Dec 16 04:07:59 crow kernel: [<02171195>] sys_lstat64+0xf/0x23
Dec 16 04:07:59 crow kernel: [<0215eec0>] put_user_size+0x29/0x2d
Dec 16 04:07:59 crow kernel: [<0217ad76>] sys_getdents64+0x9f/0xa9
Dec 16 04:07:59 crow kernel: Bad EIP value.
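For anyone wanting to check the "same point" claim, the sketch below maps the reported symbol offset in this oops onto the ext3_block_to_path disassembly quoted earlier in this bug, then repeats the effective-address check with this oops's %edx. All constants are copied from the thread; the variable names are illustrative, and it assumes the earlier disassembly matches this kernel build.

```python
# Sketch only: constants come from this oops and from the disassembly quoted
# earlier in the bug; it is assumed both refer to the same ext3 module build.
FUNC_START_IN_DISASM = 0x35DB   # 000035db <ext3_block_to_path>
EIP_OFFSET = 0x12               # "EIP is at ext3_block_to_path+0x12/0x12e"
EDX = 0x1E3E0000                # edx from this oops
DISPLACEMENT = 0x1E0            # from `mov 0x1e0(%edx),%eax` at 35ed
FAULT_ADDR = 0xA74AECE0         # faulting virtual address in this oops

# Where the faulting instruction sits in the disassembly listing.
faulting_insn = FUNC_START_IN_DISASM + EIP_OFFSET

# Address that instruction should have read, given the reported %edx.
expected_access = (EDX + DISPLACEMENT) & 0xFFFFFFFF

print(f"faulting instruction offset: {faulting_insn:#x}")
print(f"expected access:             {expected_access:#010x}")
print(f"reported fault address:      {FAULT_ADDR:#010x}")
```

The offset lands on 0x35ed, the same `mov 0x1e0(%edx),%eax` as in the first oops, and once again the reported fault address does not follow from the %edx value shown in the register dump.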
I have observed the same behaviour on a few EPIA CL10K boards as well. We have a dozen of these running FC2 with no problems, but I haven't been able to get FC3 to run stably on the same hardware yet. There's no X installed, just a minimal installation. They often freeze right after installation, while the soft RAID1 disks are syncing and no other activity is going on. I haven't seen any of the oopses Rob mentioned.
*** Bug 145185 has been marked as a duplicate of this bug. ***
Disabling cpuspeed has fixed the problem for me. 5 machines have been running without any problem for a week. I used to get a couple of hangs per day.
I had the same problem with a Via Ezra C3 system. It was easily reproducible by going from an idle state to a high-cpu usage state. Running "cat /dev/random > /dev/null" would usually cause it to fail in minutes. Sometimes, but not always, this would appear in the messages log: "phyrewall kernel: Warning: CPU frequency is 798000, cpufreq assumed 731000 kHz" (the exact values would vary). As above, disabling cpuspeed resolved the problem. The system has been up for over 30 hours under various load conditions, where before it always hung within 2 hours.
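If it helps others check their own logs for the same symptom, here is a rough sketch that pulls the two frequencies out of that warning. The regular expression is written against the single sample line quoted above, so the exact message format on other kernels is an assumption, and the function name is hypothetical.

```python
import re

# Pattern based only on the one sample line quoted in this comment;
# other kernel versions may word the warning differently.
WARNING_RE = re.compile(
    r"Warning: CPU frequency is (\d+), cpufreq assumed (\d+) kHz"
)


def parse_mismatch(line):
    """Return (measured_khz, assumed_khz), or None if the line does not match."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = ("phyrewall kernel: Warning: CPU frequency is 798000, "
          "cpufreq assumed 731000 kHz")
measured, assumed = parse_mismatch(sample)
print(f"measured {measured} kHz, assumed {assumed} kHz, "
      f"difference {measured - assumed} kHz")
```

Feeding each line of the messages log through `parse_mismatch` and counting the non-None results would show how often the warning appears before a hang.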
Okay. It's been well over two weeks, and things are looking good. "cpuspeed" appears to have been the problem. My system has been much more stable now. Uptimes have been good. No further oops have been observed. Thanks guys!
This driver got disabled in an errata kernel. I'll re-enable it once I've debugged it, though no promises on how soon that'll be.