Bug 116335

Summary: kscand panicked in the page_referenced function
Product: Red Hat Enterprise Linux 3
Reporter: anand suvernkar <suvernkar>
Component: kernel
Assignee: Rik van Riel <riel>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 3.0
CC: marty, petrides, riel, tao
Target Milestone: ---
Target Release: ---
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-09-02 04:31:05 UTC
Attachments:
patch of Rik's that was committed to U3 (flags: none)

Description anand suvernkar 2004-02-20 07:08:20 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2)
Gecko/20030208 Netscape/7.02

Description of problem:

The kscand thread panicked in the page_referenced function.

The stack trace follows. VM structures might have been corrupted,
leading to the panic.

<1>Unable to handle kernel NULL pointer dereference at virtual address
00000084
<4> printing eip:
<4>0215d2c4
<1>*pde = 00003001
<1>*pte = 00000000
<4>Oops: 0000
<4> parport_pc lp parport autofs e100 keybdev mousedev hid
input usb-ohci usbcore ext3 jbd aic7xxx sd_mod scs
<4>CPU:    1
<4>EIP:    0060:[<0215d2c4>]    Tainted: P
<4>EFLAGS: 00010212
<4>
<4>EIP is at page_referenced [kernel] 0x334 (2.4.21-4.ELhugemem)
<4>eax: 038a3698   ebx: 26db5000   ecx: 00000000   edx: 00000000
<4>esi: 038a3698   edi: 0300002c   ebp: 03f2df9c   esp: 03f2df68
<4>Rejecting the heartbeat due to 0 local reference count
<4>ds: 0068   es: 0068   ss: 0068
<4>Process kscand (pid: 8, stackpage=03f2d000)
<4>Stack: 3bc3f380 00000000 0000000e 00000000 00000be8 38556b80
00000000 00000002
<4>       03f2dfac 03dc71ec 03dc71ec 038bd604 023c20d8 03f2dfbc
02153491 02134960
<4>       00000000 00000001 023c1000 023c20d8 00000003 03f2dfec
02154a70 023c1000
<4>Call Trace:   [<02153491>] scan_active_list [kernel] 0xa1 (0x03f2dfa0)
[1]more>
<4>[<02134960>] process_timeout [kernel] 0x0 (0x03f2dfa4)
<4>[<02154a70>] kscand [kernel] 0xa0 (0x03f2dfc0)
<4>[<021549d0>] kscand [kernel] 0x0 (0x03f2dfe0)
<4>[<02109799>] kernel_thread_helper [kernel] 0x5 (0x03f2dff0)
<4>
<4>Code: 8b 81 84 00 00 00 42 39 41 70 89 d9 0f 43 55 e4 81 e1 00 f0
<4>
<4>
[1]kdb>
[1]kdb> bt
03f2c000        8        1   1*  R  03f2c580 *kscand
EBP      EIP      Function (args)
03f2df9c 0215d2c4 page_referenced+0x334 (2134960 0 1 23c1000 23c20d8)
03f2dfbc 02153491 scan_active_list+0xa1 (23c1000 3 23c20d8 3f2c000
3f2c000)
03f2dfec 02154a70 kscand+0xa0
         02109799 kernel_thread_helper+0x5

###########################################################3


To be exact, the error was in the ptep_to_mm inline function.
The code at which the panic occurred was:
0x0215d2c4 page_referenced+0x334:   mov    0x84(%ecx),%eax
0x0215d2ca page_referenced+0x33a:   inc    %edx
0x0215d2cb page_referenced+0x33b:   cmp    %eax,0x70(%ecx)
0x0215d2ce page_referenced+0x33e:   mov    %ebx,%ecx
0x0215d2d0 page_referenced+0x340:   cmovae 0xffffffe4(%ebp),%edx
0x0215d2d4 page_referenced+0x344:   and    $0xfffff000,%ecx
0x0215d2da page_referenced+0x34a:   cmp    $0xffb93fff,%ecx
0x0215d2e0 page_referenced+0x350:   mov    %edx,0xffffffe4(%ebp)
0x0215d2e3 page_referenced+0x353:   jbe    0x0215d32a page_referenced+0x39a
0x0215d2e5 page_referenced+0x355:   mov    $0xffffe000,%eax
0x0215d2ea page_referenced+0x35a:   mov    $0xffffe000,%ebx
0x0215d2ef page_referenced+0x35f:   and    %esp,%eax
0x0215d2f1 page_referenced+0x361:   mov    0x20(%eax),%edx
0x0215d2f4 page_referenced+0x364:   mov    %edx,%eax
0x0215d2f6 page_referenced+0x366:   shl    $0x4,%eax
0x0215d2f9 page_referenced+0x369:   add    %edx,%eax



static inline struct mm_struct * ptep_to_mm(pte_t * ptep)
{
        struct page * page = kmap_atomic_to_page(ptep);
 37e:   8b 48 08                mov    0x8(%eax),%ecx
/lhome/salil/linux-2.4.21-4.ELhugemem/mm/rmap.c:184
 381:   8b 55 e4                mov    0xffffffe4(%ebp),%edx
 384:   8b 81 84 00 00 00       mov    0x84(%ecx),%eax
 38a:   42                      inc    %edx
 38b:   39 41 70                cmp    %eax,0x70(%ecx)
/lhome/salil/linux-2.4.21-4.ELhugemem/include/asm/highmem.h:96



Thanks
Anand

Version-Release number of selected component (if applicable):
2.4.21-4.ELhugemem

How reproducible:
Couldn't Reproduce

Steps to Reproduce:
1. Random
    

Actual Results:  panic

Expected Results:  normal

Additional info:

Comment 1 Arjan van de Ven 2004-02-20 07:55:30 UTC
Tainted: P

Which modules are you using?

Comment 2 anand suvernkar 2004-02-20 08:54:32 UTC

parport_pc
lp
parport
autofs
e100
floppy
keybdev
mousedev
hid
input
usb-ohci
usbcore
ext3
jbd
aic7xxx
sd_mod
scsi_mod

Thanks
Anand

Comment 3 Arjan van de Ven 2004-02-20 09:00:27 UTC
Which module tainted your kernel, then?
(Did you unload one?)

Comment 4 Rik van Riel 2004-03-13 14:56:09 UTC
Hugh Dickins found a bug in the 2.6 kernel that could be related and
sent in a patch to fix it. I'm submitting this patch for RHEL3 Update 3.

Comment 5 Zou Pengcheng 2004-05-17 03:28:15 UTC
Could you please tell me where I can find this patch?
Thanks.

Comment 6 Rik van Riel 2004-05-17 12:35:20 UTC
The patch has been applied to the RHEL3 code base and should be in
Update 3.

Comment 7 Kevin Krafthefer 2004-08-16 21:39:02 UTC
Can someone please elaborate on the nature of the patch? Does it
prevent the VM from getting corrupted? Does it handle other panics
that result in EFLAGS of 00010212?

Comment 9 Robert Perkins 2004-08-20 18:40:38 UTC
Rik, please answer the last couple of questions. We need confirmation
that it is in U3, etc.

Comment 11 Ernie Petrides 2004-08-20 23:30:26 UTC
Created attachment 102943
patch of Rik's that was committed to U3

Comment 12 Ernie Petrides 2004-08-20 23:32:19 UTC
The patch in comment #11 was committed to the RHEL3 U3 patch
pool in kernel version 2.4.21-15.1.EL.  I'm reverting this
bug's state to MODIFIED.


Comment 13 Rik van Riel 2004-08-21 00:06:21 UTC
Ernie, you will also need the patch to do_wp_page, otherwise you leave
a (very) small window for data corruption.

Comment 14 Ernie Petrides 2004-08-21 00:16:57 UTC
Okay, but this bugzilla (oops in page_referenced) is resolved.


Comment 15 John Flanagan 2004-09-02 04:31:05 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-433.html