While creating a snapshot, the customer's system consistently crashes with:
kernel BUG at arch/i386/mm/highmem.c:43!
29 static void *__kmap_atomic(struct page *page, enum km_type type, pgprot_t prot)
42 if (!pte_none(*(kmap_pte-idx)))
Though the affected pte varies with the application running at the time of the crash, it always holds the value 0x2.
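For reference, here is a small userspace sketch of why a leftover value of 0x2 trips the check on line 42. It is not kernel code. The helper names (model_pte_none, has_present_bit, slot_would_bug, demo_reported_value) are made up for illustration, and the "none" test is simplified to a single 32-bit word, so the PAE or Xen definitions may differ. The two flag values are the standard low x86 PTE bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low x86 PTE flag bits. */
#define MODEL_PAGE_PRESENT 0x001u
#define MODEL_PAGE_RW      0x002u

/* Simplified stand-in for pte_none(): true only for an all-zero entry. */
static bool model_pte_none(uint32_t pte)
{
    return pte == 0;
}

/* True if the hardware "present" bit is set in the entry. */
static bool has_present_bit(uint32_t pte)
{
    return (pte & MODEL_PAGE_PRESENT) != 0;
}

/* Mirrors the condition on line 42: a slot that is not empty is a bug. */
static bool slot_would_bug(uint32_t pte)
{
    return !model_pte_none(pte);
}

/* Usage sketch with the value reported in this bug. */
void demo_reported_value(void)
{
    uint32_t leftover = 0x2;                 /* value seen in every dump */

    assert(leftover == MODEL_PAGE_RW);       /* only the RW bit is set */
    assert(!has_present_bit(leftover));      /* not a present mapping... */
    assert(slot_would_bug(leftover));        /* ...but not empty either */
}
```

In this model, 0x2 is an entry with only the read/write bit set. It does not map a present page, yet it is non-zero, so an "is the slot empty" test fails on it.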
The customer says, “Running the snapshots with the oracle app up would cause a kernel panic every time. However, running the snapshots with the oracle app down wouldn't cause a panic at all.”
crash> !hostname; pwd
PID: 13289 TASK: f68f7550 CPU: 2 COMMAND: "hpetfe"
#0 [f291cd38] crash_kexec at c04411df
#1 [f291cd7c] die at c040649f
#2 [f291cdac] do_invalid_op at c0406bf4
#3 [f291ce5c] error_code (via invalid_op) at c0405a87
EAX: 00000002 EBX: f2848bcc ECX: cecf7e40 EDX: 00000228 EBP: c0014bb0
DS: 007b ESI: 0000000f ES: 007b EDI: f7b7fe40
CS: 0060 EIP: c041cfb9 ERR: ffffffff EFLAGS: 00210286
#4 [f291ce90] kmap_atomic at c041cfb9
#5 [f291cea8] __handle_mm_fault at c0461762
#6 [f291cfa4] sys_waitpid at c0427208
#7 [f291cfb8] error_code at c0405a87
EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 088de088
DS: 007b ESI: 00000000 ES: 007b EDI: 00000000
SS: 007b ESP: bf92e0fc EBP: bf92e228
CS: 0073 EIP: 0807b560 ERR: ffffffff EFLAGS: 00210202
Other dumps are located at /cores/20091112110807 and /cores/20091116183135
OK thanks, I'll reassign this to Eric Sandeen.
Ok, this part was not upstream, but we put it in hoping to prevent memory starvation via mmap when the filesystem was frozen... did not anticipate this situation, thanks for the report, I'll look for a solution.
Related also to bug #541956
I think the patch for 541956 might address this; if we do:
pte_unmap_unlock(page_table, ptl);
kunmap_atomic(pte, KM_PTE0);
before we sleep on the frozen fs, will that get us out of the atomic kmap?
(In reply to comment #5)
> I think the patch for 541956 might address this; if we do:
> pte_unmap_unlock(page_table, ptl);
> kunmap_atomic(pte, KM_PTE0);
> before we sleep on the frozen fs, will that get us out of the atomic kmap?
I think so. We'd better test whether this problem is gone once that patch is applied.
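To illustrate the ordering question in the quoted proposal, here is a toy userspace model of a single CPU's atomic kmap slot. It is only a sketch under stated assumptions: the names (MODEL_KM_PTE0, model_kmap_atomic, model_kunmap_atomic, second_fault_succeeds) are hypothetical, the slot is a plain pointer rather than a fixmap pte, and the "sleep" is just a comment marking where another task would run on the same CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MODEL_KM_PTE0, MODEL_KM_NR };      /* hypothetical slot ids */

static const void *slots[MODEL_KM_NR];    /* one CPU's slots; NULL = empty */

/* Returns false where the real kernel would hit BUG(): slot still in use. */
static bool model_kmap_atomic(const void *page, int type)
{
    assert(type >= 0 && type < MODEL_KM_NR);
    if (slots[type] != NULL)
        return false;
    slots[type] = page;
    return true;
}

/* Empties the slot so the next atomic mapping can use it. */
static void model_kunmap_atomic(int type)
{
    assert(type >= 0 && type < MODEL_KM_NR);
    slots[type] = NULL;
}

/*
 * One task maps a page, then has to wait on a frozen filesystem. While it
 * waits, a second task faults on the same CPU and needs the same slot.
 * Returns whether the second task's mapping succeeded.
 */
bool second_fault_succeeds(bool unmap_before_sleep)
{
    static const int page_a, page_b;      /* stand-ins for two pages */
    bool ok;

    model_kunmap_atomic(MODEL_KM_PTE0);   /* start from an empty slot */
    ok = model_kmap_atomic(&page_a, MODEL_KM_PTE0);
    assert(ok);

    if (unmap_before_sleep)
        model_kunmap_atomic(MODEL_KM_PTE0);   /* step proposed in the thread */

    /* ...first task sleeps here; another task runs on this CPU... */

    ok = model_kmap_atomic(&page_b, MODEL_KM_PTE0);
    model_kunmap_atomic(MODEL_KM_PTE0);   /* leave the model clean */
    return ok;
}
```

In this model, second_fault_succeeds(false) fails because the slot is still occupied when the second task maps, while second_fault_succeeds(true) succeeds. Whether the real patch behaves the same way still needs the test kernel to confirm.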
Amerigo, do you have a test kernel built already that we could provide to the customer? Bill/Fabio - can we ask the customer to test this?
(In reply to comment #7)
> Amerigo, do you have a test kernel built already that we could provide to the
> customer? Bill/Fabio - can we ask the customer to test this?
Yes, please download this one asap.
Oracle is not needed for the crash. When the root filesystem is in LVM (and /etc/lvm is stored in the root filesystem LV), taking a snapshot of the root filesystem LV works once. The second attempt crashes. After rebooting from the crash, it will again work once.
I should clarify: when the root filesystem of *dom0* is in LVM, snapshots of the dom0 root filesystem work once and then crash. Root filesystems of virtual machines seem to work, although the problems with Oracle suggest that they might also fail under heavy updates.
*** This bug has been marked as a duplicate of bug 541956 ***