Bug 539328 - kernel BUG at arch/i386/mm/highmem.c:43! when creating a snapshot
Keywords:
Status: CLOSED DUPLICATE of bug 541956
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Cong Wang
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-11-19 21:30 UTC by Bill Braswell
Modified: 2013-09-30 02:11 UTC (History)

Fixed In Version: t
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-12-10 17:22:33 UTC
Target Upstream Version:
Embargoed:



Description Bill Braswell 2009-11-19 21:30:06 UTC
While creating a snapshot, the customer's system consistently crashes with:

	kernel BUG at arch/i386/mm/highmem.c:43!

     29 static void *__kmap_atomic(struct page *page, enum km_type type, pgprot_t prot)
...
     42         if (!pte_none(*(kmap_pte-idx)))
     43                 BUG();


Although which pte is involved varies with the application running at the time of the crash, its value is always 0x2.
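
To illustrate why a value of 0x2 trips the check at lines 42-43, here is a minimal user-space sketch in C. It is not the kernel code: every `model_*` name is hypothetical, and it assumes the standard non-PAE x86 flag layout (bit 0 = present, bit 1 = writable) and an all-zero test for an empty entry. Under those assumptions, 0x2 is a non-empty entry without the present bit, so the slot does not look free and the BUG() path is taken.

```c
#include <assert.h>
#include <stdint.h>

/* Low x86 page-table flag bits (assumed non-PAE layout). */
#define MODEL_PAGE_PRESENT 0x001u
#define MODEL_PAGE_RW      0x002u

/* Simplified stand-in for pte_none(): "none" only when all bits are zero. */
int model_pte_none(uint32_t pte) { return pte == 0; }

int model_pte_present(uint32_t pte) { return (pte & MODEL_PAGE_PRESENT) != 0; }
int model_pte_writable(uint32_t pte) { return (pte & MODEL_PAGE_RW) != 0; }

/* Same shape as the check at highmem.c lines 42-43.
 * Returns 1 where the kernel would call BUG(), 0 otherwise. */
int model_slot_would_bug(uint32_t slot_pte)
{
    return !model_pte_none(slot_pte);
}

/* Usage example: an empty slot passes, a slot holding 0x2 does not. */
void model_self_check(void)
{
    assert(model_slot_would_bug(0x0u) == 0);
    assert(model_slot_would_bug(0x2u) == 1);
    assert(model_pte_present(0x2u) == 0);
}
```

The point of the sketch is only that the check tests for "not empty" rather than "present", so any leftover bits in the slot are enough to trigger it.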

The customer says, “Running the snapshots with the oracle app up would cause a kernel panic every time. However, running the snapshots with the oracle app down wouldn't cause a panic at all.”

crash> !hostname; pwd
core-i386.gsslab.rdu.redhat.com
/cores/20091013154019

PID: 13289  TASK: f68f7550  CPU: 2   COMMAND: "hpetfe"
 #0 [f291cd38] crash_kexec at c04411df
 #1 [f291cd7c] die at c040649f
 #2 [f291cdac] do_invalid_op at c0406bf4
 #3 [f291ce5c] error_code (via invalid_op) at c0405a87
    EAX: 00000002  EBX: f2848bcc  ECX: cecf7e40  EDX: 00000228  EBP: c0014bb0
    DS:  007b      ESI: 0000000f  ES:  007b      EDI: f7b7fe40
    CS:  0060      EIP: c041cfb9  ERR: ffffffff  EFLAGS: 00210286
 #4 [f291ce90] kmap_atomic at c041cfb9
 #5 [f291cea8] __handle_mm_fault at c0461762
 #6 [f291cfa4] sys_waitpid at c0427208
 #7 [f291cfb8] error_code at c0405a87
    EAX: 00000000  EBX: 00000000  ECX: 00000000  EDX: 088de088
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 00000000
    SS:  007b      ESP: bf92e0fc  EBP: bf92e228
    CS:  0073      EIP: 0807b560  ERR: ffffffff  EFLAGS: 00210202

Other dumps are located at /cores/20091112110807 and /cores/20091116183135

Comment 2 Dave Anderson 2009-12-03 13:44:20 UTC
OK thanks, I'll reassign this to Eric Sandeen.

Comment 3 Eric Sandeen 2009-12-03 23:27:21 UTC
OK, this part was not upstream, but we put it in hoping to prevent memory starvation via mmap when the filesystem was frozen. We did not anticipate this situation. Thanks for the report; I'll look for a solution.

-Eric

Comment 4 Eric Sandeen 2009-12-04 16:15:41 UTC
Related also to bug #541956

Comment 5 Eric Sandeen 2009-12-04 16:45:31 UTC
I think the patch for 541956 might address this; if we do:

pte_unmap_unlock(page_table, ptl);
pte_unmap(page_table);
kunmap_atomic(pte, KM_PTE0);

before we sleep on the frozen fs, will that get us out of the atomic kmap?
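
The ordering question above can be sketched with a small user-space model in C. This is only one reading of the discussion, not the kernel implementation: all `model_*` names, the slot counts, and the placeholder pte values are hypothetical, and the slot index formula (type plus slots-per-CPU times CPU number) is an assumption about how per-CPU atomic kmap slots are laid out. Task A maps a page table with the KM_PTE0 slot and then "sleeps" (as it would on a frozen filesystem); task B then faults on the same CPU and needs the same slot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes for the model; the real kernel values differ. */
#define MODEL_KM_TYPE_NR 2
#define MODEL_NR_CPUS    2

enum model_km_type { MODEL_KM_PTE0 = 0, MODEL_KM_PTE1 = 1 };

/* One slot per (type, cpu) pair; 0 means the slot is empty. */
uint32_t model_kmap_pte[MODEL_KM_TYPE_NR * MODEL_NR_CPUS];

/* Returns 0 on success, -1 where the kernel would hit BUG(). */
int model_kmap_atomic(uint32_t pte, enum model_km_type type, int cpu)
{
    int idx = (int)type + MODEL_KM_TYPE_NR * cpu;

    assert(idx >= 0 && idx < MODEL_KM_TYPE_NR * MODEL_NR_CPUS);
    if (model_kmap_pte[idx] != 0)
        return -1;              /* slot still in use */
    model_kmap_pte[idx] = pte;
    return 0;
}

void model_kunmap_atomic(enum model_km_type type, int cpu)
{
    int idx = (int)type + MODEL_KM_TYPE_NR * cpu;

    assert(idx >= 0 && idx < MODEL_KM_TYPE_NR * MODEL_NR_CPUS);
    model_kmap_pte[idx] = 0;    /* release the slot */
}

/* Task A maps, optionally unmaps, then sleeps; task B faults on the
 * same CPU. Returns the result of task B's mapping attempt. */
int model_fault_while_other_task_sleeps(int unmap_before_sleep)
{
    int cpu = 0;
    int rc;
    int i;

    for (i = 0; i < MODEL_KM_TYPE_NR * MODEL_NR_CPUS; i++)
        model_kmap_pte[i] = 0;

    rc = model_kmap_atomic(0x1001u, MODEL_KM_PTE0, cpu);   /* task A */
    assert(rc == 0);
    (void)rc;

    if (unmap_before_sleep)
        model_kunmap_atomic(MODEL_KM_PTE0, cpu);

    /* Task A sleeps here; the scheduler runs task B on the same CPU. */
    return model_kmap_atomic(0x2001u, MODEL_KM_PTE0, cpu); /* task B */
}
```

In this model, `model_fault_while_other_task_sleeps(0)` returns -1 (the slot is still occupied when task B faults), while `model_fault_while_other_task_sleeps(1)` returns 0, which matches the intent of unmapping before sleeping.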

Comment 6 Cong Wang 2009-12-06 15:50:41 UTC
(In reply to comment #5)
> I think the patch for 541956 might address this; if we do:
> 
> pte_unmap_unlock(page_table, ptl);
> pte_unmap(page_table);
> kunmap_atomic(pte, KM_PTE0);
> 
> before we sleep on the frozen fs, will that get us out of the atomic kmap?  

I think so. We'd better test whether this problem is gone once that patch is applied.

Thanks.

Comment 7 Eric Sandeen 2009-12-07 17:08:30 UTC
Amerigo, do you have a test kernel built already that we could provide to the customer?  Bill/Fabio - can we ask the customer to test this?

Thanks,
-Eric

Comment 10 Cong Wang 2009-12-08 02:19:15 UTC
(In reply to comment #7)
> Amerigo, do you have a test kernel built already that we could provide to the
> customer?  Bill/Fabio - can we ask the customer to test this?
> 

Yes, please download this one asap.
https://brewweb.devel.redhat.com/taskinfo?taskID=2111628

Comment 12 Stuart D Gathman 2009-12-10 15:06:35 UTC
Oracle is not needed for the crash.  When the root filesystem is in LVM (and /etc/lvm is stored in the root filesystem LV), taking a snapshot of the root filesystem LV works once.  The second attempt crashes the system.  After rebooting from the crash, it will again work once.

Comment 13 Stuart D Gathman 2009-12-10 15:11:04 UTC
I should clarify: when the root filesystem of *dom0* is in LVM, snapshots of the dom0 root filesystem work once and then crash.  Root filesystems of virtual machines seem to work, although the problems with Oracle suggest that they might also fail under heavy updates.

Comment 16 Linda Wang 2009-12-10 17:22:33 UTC

*** This bug has been marked as a duplicate of bug 541956 ***

