143424 – Kernel hangs during reading on VIA EPIA CL10K

Summary: Kernel hangs during reading on VIA EPIA CL10K

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	3
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	145185
Depends On:
Blocks:

Reported:	2004-12-20 17:50 UTC by Rob Ludwick
Modified:	2015-01-04 22:14 UTC
CC List:	6 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2005-02-13 04:26:50 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
kernel oops (3.90 KB, text/plain) 2004-12-20 17:59 UTC, Rob Ludwick

Description Rob Ludwick 2004-12-20 17:50:46 UTC

Description of problem:

On a VIA EPIA CL10K mini-ITX board, the kernel hangs regularly.
The hangs appear to happen during reads from the hard drive.

Version-Release number of selected component (if applicable):


How reproducible:
It's random, but has consistently left me with uptimes of less than 3 days.

Comment 1 Rob Ludwick 2004-12-20 17:59:05 UTC

Created attachment 108911
kernel oops

I noticed this oops message as well.

Comment 2 Rob Ludwick 2004-12-20 18:15:39 UTC

I was using Fedora Core 1 on this motherboard and it was working fine
with the last FC1 kernel.  I skipped over FC2 because it was difficult
to install on the EPIA boards, and went straight to FC3.

The kernel oops may or may not be related to my problem, but I have
seen it at least twice on the EPIA board, and have not seen it on the
other x86 hardware I have.

I'm using EXT3 partitions and the last FC3 kernel.

Comment 3 Dave Jones 2004-12-22 05:41:20 UTC

Are you running the cpuspeed service?  If so, do the hangs go away when you
disable it?

(adding Stephen to Cc for insight into the ext3 related oops)

Comment 4 Stephen Tweedie 2004-12-22 14:46:18 UTC

I can't make any sense of the oops.  We oopsed with

Unable to handle kernel paging request at virtual address a8d8f0e0

in ext3_block_to_path:0x35ed:

000035db <ext3_block_to_path>:
    35db:       55                      push   %ebp
    35dc:       57                      push   %edi
    35dd:       56                      push   %esi
    35de:       53                      push   %ebx
    35df:       83 ec 0c                sub    $0xc,%esp
    35e2:       89 d3                   mov    %edx,%ebx
    35e4:       89 0c 24                mov    %ecx,(%esp,1)
    35e7:       8b 90 d4 00 00 00       mov    0xd4(%eax),%edx
    35ed:       8b 82 e0 01 00 00       mov    0x1e0(%edx),%eax

Now, %edx actually holds 0x1fcc0400, so I can't see where the oops address
0xa8d8f0e0 can come from.
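
As a quick check of the arithmetic behind that statement, here is a minimal sketch (not part of the original analysis) that computes the address the faulting instruction `mov 0x1e0(%edx),%eax` should have touched, using the %edx value and the oops address quoted above. The helper name `effective_address` is made up for illustration.

```python
def effective_address(base: int, displacement: int) -> int:
    """Return base + displacement wrapped to 32 bits, as on i386."""
    return (base + displacement) & 0xFFFFFFFF


edx = 0x1FCC0400    # %edx value quoted in the comment
fault = 0xA8D8F0E0  # faulting virtual address reported by the oops

# Address that `mov 0x1e0(%edx),%eax` would access with this %edx.
expected = effective_address(edx, 0x1E0)

print(hex(expected))      # address implied by the register dump
print(expected == fault)  # False: the dump does not explain the fault address
```

The computed address is 0x1fcc05e0, which is nowhere near 0xa8d8f0e0, so the register dump and the reported fault address disagree.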

Here we're dereferencing inode->i_sb->s_fs_info.  So we actually got an oops
looking up a field within inode->i_sb, which the filesystem never ever touches.
So if there's a software problem here, it looks more like a core VFS one than
an ext3 one.  But that's still hard to square with the oops trace showing an
apparently bogus %edx.

It looks more like a hardware or hardware-related problem at first glance based
on the symptoms so far.

Do you have any other oopses captured that might shed more light on this?

Comment 5 Rob Ludwick 2004-12-29 02:31:47 UTC

Hey guys, thanks for responding.  I went away over Christmas and left it on for
about 9 days without problems.  Another lockup happened yesterday after I got
back.  I had left it on console 1 but forgot to shut off the screen blanker,
so I'll do that again and see what happens.

I did have the cpuspeed service running.  I turned it off tonight.  I'll see if
this makes a difference over time.  

Min uptime: 10 minutes.
Max uptime: 9 days.

One more oops at apparently the same point:

Dec 16 04:07:58 crow kernel: Unable to handle kernel paging request at virtual
address a74aece0
Dec 16 04:07:58 crow kernel:  printing eip:
Dec 16 04:07:58 crow kernel: 208615ed
Dec 16 04:07:58 crow kernel: *pde = 00000000
Dec 16 04:07:58 crow kernel: Oops: 0000 [#1]
Dec 16 04:07:58 crow kernel: Modules linked in: nfsd exportfs lockd sch_ingress
cls_u32 sch_sfq sch_cbq parport_pc lp parport autofs4 sunrpc ipt_multiport
ipt_state iptable_filter ipt_MASQUERADE iptable_nat ip_conntrack ip_tables
dm_mod button battery ac md5 ipv6 joydev uhci_hcd ehci_hcd tuner msp3400 bttv
video_buf i2c_algo_bit v4l2_common btcx_risc i2c_core videodev snd_bt87x
snd_via82xx snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer
snd_page_alloc gameport snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore
via_rhine mii ext3 jbd
Dec 16 04:07:58 crow kernel: CPU:    0
Dec 16 04:07:58 crow kernel: EIP:    0060:[<208615ed>]    Not tainted VLI
Dec 16 04:07:58 crow kernel: EFLAGS: 00010212   (2.6.9-1.681_FC3)
Dec 16 04:07:58 crow kernel: EIP is at ext3_block_to_path+0x12/0x12e [ext3]
Dec 16 04:07:58 crow kernel: eax: 0d130418   ebx: 00000000   ecx: 1c564d44  
edx: 1e3e0000
Dec 16 04:07:58 crow kernel: esi: 00000000   edi: 0d130318   ebp: 0d130418  
esp: 1c564cdc
Dec 16 04:07:58 crow kernel: ds: 007b   es: 007b   ss: 0068
Dec 16 04:07:58 crow kernel: Process updatedb (pid: 7766, threadinfo=1c564000
task=1fa691f0)
Dec 16 04:07:58 crow kernel: Stack: 1c564d44 00000000 00000000 00000000 00000000
0d130318 0d130418 20861c68
Dec 16 04:07:58 crow kernel:        1c564d10 00000246 00000000 09ed686c fffffffb
00000000 00000000 00000000
Dec 16 04:07:58 crow kernel:        1fa691f0 0211d26f 1c564d44 1c564d44 00000026
00000000 00000246 1c564d94
Dec 16 04:07:58 crow kernel: Call Trace:
Dec 16 04:07:58 crow kernel:  [<20861c68>] ext3_get_block_handle+0x37/0x276 [ext3]
Dec 16 04:07:58 crow kernel:  [<0211d26f>] autoremove_wake_function+0x0/0x2d
Dec 16 04:07:58 crow kernel:  [<021c2a2a>] avc_node_populate+0x23/0x25
Dec 16 04:07:58 crow kernel:  [<20862038>] ext3_getblk+0x7b/0x1fa [ext3]
Dec 16 04:07:58 crow kernel:  [<021c3f3a>] avc_has_perm_noaudit+0x8d/0xda
Dec 16 04:07:58 crow kernel:  [<208655a8>] ext3_find_entry+0x150/0x363 [ext3]
Dec 16 04:07:58 crow kernel:  [<2086596e>] ext3_lookup+0x1f/0x87 [ext3]
Dec 16 04:07:58 crow kernel:  [<02175487>] real_lookup+0x73/0xde
Dec 16 04:07:58 crow kernel:  [<02175802>] do_lookup+0x56/0x8f
Dec 16 04:07:58 crow kernel:  [<021764e5>] link_path_walk+0xcaa/0x1009
Dec 16 04:07:58 crow kernel:  [<0215ef01>] copy_str_fromuser_size+0x3d/0x56
Dec 16 04:07:58 crow kernel:  [<02176abf>] path_lookup+0xff/0x12f
Dec 16 04:07:59 crow kernel:  [<02176c03>] __user_walk+0x21/0x51
Dec 16 04:07:59 crow kernel:  [<02170c37>] vfs_lstat+0x11/0x37
Dec 16 04:07:59 crow kernel:  [<2084cab6>] journal_stop+0x4de/0x4e8 [jbd]
Dec 16 04:07:59 crow kernel:  [<0215222e>] follow_page_pte+0xec/0xfd
Dec 16 04:07:59 crow kernel:  [<0215e907>] rw_vm+0x3ef/0x47a
Dec 16 04:07:59 crow kernel:  [<02171195>] sys_lstat64+0xf/0x23
Dec 16 04:07:59 crow kernel:  [<0215eec0>] put_user_size+0x29/0x2d
Dec 16 04:07:59 crow kernel:  [<0217ad76>] sys_getdents64+0x9f/0xa9
Dec 16 04:07:59 crow kernel: Code: <3>Debug: sleeping function called from
invalid context at include/linux/rwsem.h:43
Dec 16 04:07:59 crow kernel: in_atomic():0[expected: 0], irqs_disabled():1
Dec 16 04:07:59 crow kernel:  [<0211cbcb>] __might_sleep+0x7d/0x8a
Dec 16 04:07:59 crow kernel:  [<0215e726>] rw_vm+0x20e/0x47a
Dec 16 04:07:59 crow kernel:  [<208615c2>] ext3_delete_inode+0x9c/0xaa [ext3]
Dec 16 04:07:59 crow kernel:  [<208615c2>] ext3_delete_inode+0x9c/0xaa [ext3]
Dec 16 04:07:59 crow kernel:  [<0215ee70>] get_user_size+0x30/0x57
Dec 16 04:07:59 crow kernel:  [<208615c2>] ext3_delete_inode+0x9c/0xaa [ext3]
Dec 16 04:07:59 crow kernel:  [<0210682b>] show_registers+0x109/0x15e
Dec 16 04:07:59 crow kernel:  [<02106a2f>] die+0x14a/0x241
Dec 16 04:07:59 crow kernel:  [<0211937e>] do_page_fault+0x0/0x511
Dec 16 04:07:59 crow kernel:  [<0211937e>] do_page_fault+0x0/0x511
Dec 16 04:07:59 crow kernel:  [<02119733>] do_page_fault+0x3b5/0x511
Dec 16 04:07:59 crow kernel:  [<208615ed>] ext3_block_to_path+0x12/0x12e [ext3]
Dec 16 04:07:59 crow kernel:  [<0211d26f>] autoremove_wake_function+0x0/0x2d
Dec 16 04:07:59 crow kernel:  [<02250749>] __cfq_get_queue+0xf8/0x206
Dec 16 04:07:59 crow kernel:  [<0211d26f>] autoremove_wake_function+0x0/0x2d
Dec 16 04:07:59 crow kernel:  [<021de48d>] __delay+0x9/0xa
Dec 16 04:07:59 crow kernel:  [<0211937e>] do_page_fault+0x0/0x511
Dec 16 04:07:59 crow kernel:  [<208615ed>] ext3_block_to_path+0x12/0x12e [ext3]
Dec 16 04:07:59 crow kernel:  [<20861c68>] ext3_get_block_handle+0x37/0x276 [ext3]
Dec 16 04:07:59 crow kernel:  [<0211d26f>] autoremove_wake_function+0x0/0x2d
Dec 16 04:07:59 crow kernel:  [<021c2a2a>] avc_node_populate+0x23/0x25
Dec 16 04:07:59 crow kernel:  [<20862038>] ext3_getblk+0x7b/0x1fa [ext3]
Dec 16 04:07:59 crow kernel:  [<021c3f3a>] avc_has_perm_noaudit+0x8d/0xda
Dec 16 04:07:59 crow kernel:  [<208655a8>] ext3_find_entry+0x150/0x363 [ext3]
Dec 16 04:07:59 crow kernel:  [<2086596e>] ext3_lookup+0x1f/0x87 [ext3]
Dec 16 04:07:59 crow kernel:  [<02175487>] real_lookup+0x73/0xde
Dec 16 04:07:59 crow kernel:  [<02175802>] do_lookup+0x56/0x8f
Dec 16 04:07:59 crow kernel:  [<021764e5>] link_path_walk+0xcaa/0x1009
Dec 16 04:07:59 crow kernel:  [<0215ef01>] copy_str_fromuser_size+0x3d/0x56
Dec 16 04:07:59 crow kernel:  [<02176abf>] path_lookup+0xff/0x12f
Dec 16 04:07:59 crow kernel:  [<02176c03>] __user_walk+0x21/0x51
Dec 16 04:07:59 crow kernel:  [<02170c37>] vfs_lstat+0x11/0x37
Dec 16 04:07:59 crow kernel:  [<2084cab6>] journal_stop+0x4de/0x4e8 [jbd]
Dec 16 04:07:59 crow kernel:  [<0215222e>] follow_page_pte+0xec/0xfd
Dec 16 04:07:59 crow kernel:  [<0215e907>] rw_vm+0x3ef/0x47a
Dec 16 04:07:59 crow kernel:  [<02171195>] sys_lstat64+0xf/0x23
Dec 16 04:07:59 crow kernel:  [<0215eec0>] put_user_size+0x29/0x2d
Dec 16 04:07:59 crow kernel:  [<0217ad76>] sys_getdents64+0x9f/0xa9
Dec 16 04:07:59 crow kernel:  Bad EIP value.
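
The same sanity check can be applied to this second oops. The sketch below is illustrative only: it parses a short excerpt of the log above (syslog prefix dropped, wrapped lines rejoined by hand), maps the `+0x12` offset onto the disassembly from comment 4, and compares %edx + 0x1e0 with the fault address. It assumes the disassembly in comment 4 matches this kernel build (2.6.9-1.681_FC3), which the thread does not confirm; `parse_oops` and `FUNC_START` are names invented for the example.

```python
import re

# Excerpt of the oops above, syslog prefix removed and wrapped lines rejoined.
OOPS = """\
Unable to handle kernel paging request at virtual address a74aece0
EIP is at ext3_block_to_path+0x12/0x12e [ext3]
eax: 0d130418   ebx: 00000000   ecx: 1c564d44   edx: 1e3e0000
esi: 00000000   edi: 0d130318   ebp: 0d130418   esp: 1c564cdc
"""

# Start of ext3_block_to_path in the comment 4 listing (assumed same build).
FUNC_START = 0x35DB


def parse_oops(text):
    """Return (fault address, symbol, offset into symbol, register dict)."""
    fault = int(re.search(r"virtual address ([0-9a-f]{8})", text).group(1), 16)
    eip = re.search(r"EIP is at (\w+)\+0x([0-9a-f]+)/0x[0-9a-f]+", text)
    regs = {
        name: int(value, 16)
        for name, value in re.findall(
            r"\b(e(?:ax|bx|cx|dx|si|di|bp|sp)):\s+([0-9a-f]{8})", text
        )
    }
    return fault, eip.group(1), int(eip.group(2), 16), regs


fault, symbol, offset, regs = parse_oops(OOPS)

# Offset 0x12 lands on the `mov 0x1e0(%edx),%eax` line of the listing.
print(symbol, hex(FUNC_START + offset))

# Address implied by %edx versus the address the kernel reported.
implied = (regs["edx"] + 0x1E0) & 0xFFFFFFFF
print(hex(implied), hex(fault), implied == fault)
```

Under that assumption the fault is at the same instruction (0x35ed), and again the address implied by %edx (0x1e3e01e0) does not match the reported fault address.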

Comment 6 Charles Lopes 2005-01-11 16:32:41 UTC

I have observed the same behaviour on a few EPIA CL10K boards as well. We
have a dozen of these running FC2 with no problem, but I haven't been
able to get FC3 to run stably on the same hardware yet. There's no X
installed, just a minimal installation.
They often freeze right after the installation, while the soft RAID1
disks are syncing with no other activity going on. I haven't seen any
of the oopses Rob mentioned.

Comment 7 Dave Jones 2005-01-15 04:08:29 UTC

*** Bug 145185 has been marked as a duplicate of this bug. ***

Comment 8 Charles Lopes 2005-01-17 11:28:47 UTC

Disabling cpuspeed has fixed the problem for me. 5 machines have been
running without any problem for a week. I used to get a couple of
hangs per day.

Comment 9 Andrew Taylor 2005-01-25 21:17:12 UTC

I had the same problem with a Via Ezra C3 system.  It was easily
reproducible by going from an idle state to a high-cpu usage state. 
Running "cat /dev/random > /dev/null" would usually cause it to fail
in minutes.  Sometimes, but not always, this would appear in the
messages log: "phyrewall kernel: Warning: CPU frequency is 798000,
cpufreq assumed 731000 kHz" (the exact values would vary).
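
To put a number on the mismatch in that warning, here is a small sketch that pulls the two frequencies out of the quoted log line and reports the relative difference. The function name `cpufreq_mismatch` is hypothetical, and the regular expression is written against the single message quoted above, not against any documented kernel log format.

```python
import re

LINE = ("phyrewall kernel: Warning: CPU frequency is 798000, "
        "cpufreq assumed 731000 kHz")


def cpufreq_mismatch(line):
    """Return (actual_khz, assumed_khz, relative_error), or None if no match."""
    m = re.search(r"CPU frequency is (\d+), cpufreq assumed (\d+) kHz", line)
    if m is None:
        return None
    actual, assumed = int(m.group(1)), int(m.group(2))
    return actual, assumed, (actual - assumed) / assumed


actual, assumed, rel = cpufreq_mismatch(LINE)
print(f"{actual - assumed} kHz difference ({rel:.1%} of the assumed value)")
```

For the values quoted here the CPU was running 67000 kHz faster than cpufreq believed, roughly 9% off.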

As above, disabling cpuspeed resolved the problem.  The system has
been up for over 30 hours under various load conditions, where before
it always hung within 2 hours.

Comment 10 Rob Ludwick 2005-01-29 05:54:44 UTC

Okay.  It's been well over two weeks, and things are looking good.
"cpuspeed" appears to have been the problem.  My system has been much
more stable since I disabled it, and uptimes have been good.

No further oopses have been observed.

Thanks guys!

Comment 11 Dave Jones 2005-02-13 04:26:50 UTC

This driver was disabled in an errata kernel.
I'll re-enable it once I've debugged it, though no promises on how
soon that'll be.
