Bug 121496 - Kernel freeze at hard disk work
Summary: Kernel freeze at hard disk work
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Stephen Tweedie
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2004-04-22 04:48 UTC by Alex Lyashkov
Modified: 2007-11-30 22:07 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-10-15 00:19:07 UTC
Target Upstream Version:
Embargoed:



Description Alex Lyashkov 2004-04-22 04:48:05 UTC
Description of problem:
Many of my company's clients have problems with servers freezing after
starting a log analyzer script or other hard disk activity on ext3
partitions with quotas.

Version-Release number of selected component (if applicable):
2.4.21-9.0.1 from up2date (so that you cannot say it is an unsupported
version).

How reproducible:
n/a

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
The same freeze was found on RH 7.3 with all 2.4.18 kernels (that version
has a bug in the ext3 journal), but I think it was fixed in the 2.4.20
kernels, and I can't replicate this bug there (running 'sync' and 'find
... -print' in separate sessions). Possibly 2.4.21-9.0.1.EL has the same bug.
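
The reproduction attempt described here (a 'sync' and a 'find ... -print'
running in separate sessions) could be approximated with a small script.
This is only a sketch under assumptions: the function names are
hypothetical, the directory to walk is chosen by the caller, and nothing
in this report confirms that this load triggers the freeze.

```python
import os
import threading


def walk_tree(root):
    """Visit every entry below root, roughly like 'find root -print', and count them."""
    count = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        count += len(dirnames) + len(filenames)
    return count


def sync_repeatedly(times):
    """Call sync() the given number of times, like running 'sync' in a loop."""
    for _ in range(times):
        os.sync()
    return times


def run_concurrently(root, sync_times=5):
    """Run the tree walk and the sync loop at the same time; return the entry count."""
    result = {}
    walker = threading.Thread(target=lambda: result.update(entries=walk_tree(root)))
    syncer = threading.Thread(target=sync_repeatedly, args=(sync_times,))
    walker.start()
    syncer.start()
    walker.join()
    syncer.join()
    return result["entries"]
```

Calling run_concurrently() on a directory that lives on the ext3 partition
with quotas enabled would exercise the same two kinds of activity at once.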

Comment 1 Alex Lyashkov 2004-04-22 05:31:00 UTC
Please update your kernel with this patch!
http://www.sfu.ca/~siegert/linux-security/msg00047.html

Comment 2 Dynamic Net, Inc. 2004-04-22 08:01:39 UTC
We are having the same problem.

Simple cron processing to rotate logs and run stats programs freezes
the server.

We are on kernel 2.4.21-9.0.1.ELsmp

The same scripts ran without a hitch for over 18 months on RedHat 7.3.


Comment 3 Stephen Tweedie 2004-04-22 21:43:23 UTC
I don't think the patch is correct.  All it conceivably affects is
data=journal mode.  In the default data=ordered mode, or in
data=writeback, it will have no effect; and in data=journal, I can't
see it having any deadlock effect.
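
To check which journaling mode an affected partition uses, the mount
options can be inspected. The sketch below parses text in the
/proc/mounts format. It assumes that a mount with no data= option listed
is in the default data=ordered mode, and whether this particular kernel
lists the option in /proc/mounts at all is also an assumption. The
function name is hypothetical.

```python
def ext3_data_modes(mounts_text):
    """Map each ext3 mount point to its data= mode from /proc/mounts-style text."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] != "ext3":
            continue
        mount_point = fields[1]
        mode = "ordered"  # assumption: no data= option means the default mode
        for option in fields[3].split(","):
            if option.startswith("data="):
                mode = option[len("data="):]
        modes[mount_point] = mode
    return modes
```

Passing the contents of /proc/mounts (or of /etc/fstab entries in the same
column layout) returns a dictionary such as {"/home": "journal"}.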

What is the backtrace of the stuck processes? Alt-sysrq-t will emit
that (you can capture it via serial console, netconsole or on syslog
if the root fs is still alive.)
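
If the keyboard is not reachable, the same task dump can be requested
from a script on kernels that provide /proc/sysrq-trigger. Its presence
on this kernel is an assumption, the function name is hypothetical, and
the write needs root. This only works while user space is still able to
run, so it is a sketch for capturing state just before a full freeze.

```python
def request_task_dump(trigger_path="/proc/sysrq-trigger"):
    """Write 't' to the SysRq trigger file, asking the kernel to log all task backtraces."""
    with open(trigger_path, "w") as trigger:
        trigger.write("t")
```

The output goes to the kernel log, so it is captured by whichever of the
serial console, netconsole or syslog is set up.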

Comment 4 Alex Lyashkov 2004-04-26 08:25:54 UTC
If I can get a task list I will attach it to this PR, but I can't
replicate the freeze on my SMP test box. We set up remote logging via
syslog and see that the kernel always locks up after starting a script
that gets disk quota stats for a selected uid. Could it be a bug with
disk quota and a sleep in dqput? A patch for that was sent by Jan Kara
to linux-kernel@ on 22 Apr 2004.

Comment 5 Stephen Tweedie 2004-04-26 10:12:01 UTC
OK, thanks for getting the remote logging set up; please update once
you can capture some diagnostics.

Comment 6 Alex Lyashkov 2004-04-29 11:01:58 UTC
I got one freeze on my SMP test box, but not with the default kernel. It
is a kernel with a minimal config for booting, with all of the needed
modules statically linked into the kernel.

SysRq : Crashing the kernel by request
Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c0224543
*pde = 00000000
Oops: 0002

CPU:    1
EIP:    0060:[<c0224543>]    Not tainted
EFLAGS: 00010082

EIP is at sysrq_handle_crash [kernel] 0x3 (2.4.21pre1/i686)
eax: d8877eb8   ebx: c0457640   ecx: 00000001   edx: c0410d14
esi: 00000063   edi: 00000006   ebp: d8877da4   esp: d8877da4
ds: 0068   es: 0068   ss: 0068
Process basename (pid: 5430, stackpage=d8877000)
Stack: d8877dc8 c0224dfa 00000063 d8877eb8 c0575de0 dd7df000 dd7df000 c0575de0
       d8877eb8 d8877df0 c0224d60 00000063 d8877eb8 c0575de0 dd7df000 00000063
       0000002e 0000270e 00000000 d8877e1c c0222279 00000063 d8877eb8 c0575de0
Call Trace:   [<c0224dfa>] __handle_sysrq_nolock [kernel] 0x7a (0xd8877da8)
[<c0224d60>] handle_sysrq [kernel] 0x50 (0xd8877dcc)
[<c0222279>] handle_scancode [kernel] 0x2a9 (0xd8877df4)
[<c0223508>] handle_kbd_event [kernel] 0xb8 (0xd8877e20)
[<c0223559>] keyboard_interrupt [kernel] 0x49 (0xd8877e40)
[<c013426c>] update_process_times_statistical [kernel] 0x8c (0xd8877e48)
[<c010e06b>] handle_IRQ_event [kernel] 0x6b (0xd8877e58)
[<c010e3e3>] do_IRQ [kernel] 0x133 (0xd8877e78)
[<c01a3510>] ext3_release_file [kernel] 0x0 (0xd8877eac)
[<c010e2b0>] do_IRQ [kernel] 0x0 (0xd8877eb0)
[<c01a3510>] ext3_release_file [kernel] 0x0 (0xd8877ec8)
[<c0165433>] .text.lock.file_table [kernel] 0x69 (0xd8877ee0)
[<c0145abc>] exit_mmap [kernel] 0x1dc (0xd8877f20)
[<c012428e>] mmput [kernel] 0x9e (0xd8877f4c)
[<c012c220>] do_exit [kernel] 0x160 (0xd8877f60)
[<c012c70f>] do_group_exit [kernel] 0x11f (0xd8877f94)
[<c01454da>] sys_munmap [kernel] 0x4a (0xd8877fa0)

Code: c6 05 00 00 00 00 00 5d c3 8d 74 26 00 55 89 e5 0f b6 55 09
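
For reading the "Oops: 0002" line above: on i386 the number is the
page-fault error code, and its low three bits say what kind of access
faulted. The decoder below is a small sketch with a hypothetical function
name. For 0002 it reports a kernel-mode write to a page that is not
present, which fits the "Crashing the kernel by request" line and the
fault in sysrq_handle_crash.

```python
def decode_oops_code(code_text):
    """Decode the low three bits of an i386 page-fault error code such as '0002'."""
    code = int(code_text, 16)
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 2 else "read",
        # bit 2: 0 = kernel mode, 1 = user mode
        "mode": "user" if code & 4 else "kernel",
    }
```

Calling decode_oops_code("0002") gives the cause, access type and CPU mode
for the fault shown in this trace.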


Comment 7 Stephen Tweedie 2004-05-25 13:16:36 UTC
Please let us know if you can reproduce with a Red Hat kernel.

