Bug 224227

Summary:           [RHEL5] Fully virt install of RHEL-4 can reboot dom0
Product:           Red Hat Enterprise Linux 5
Component:         kernel-xen
Version:           5.0
Hardware:          All
OS:                Linux
Status:            CLOSED CURRENTRELEASE
Severity:          medium
Priority:          medium
Reporter:          Chris Lalancette <clalance>
Assignee:          Rik van Riel <riel>
CC:                dzickus, herbert.xu, xen-maint
Fixed In Version:  5.0.0
Doc Type:          Bug Fix
Last Closed:       2007-02-13 17:03:02 UTC
Description
Chris Lalancette
2007-01-24 18:12:39 UTC
Do we know if this is a recent regression? It's important that we find out if things were reliable on other recent kernels. Thanks!

Stephen,
Well, I was having good luck with 2961 earlier, but I tried it again with 2961 and got a different stack trace:

(XEN) ----[ Xen-3.0.3-rc5-1.2961.el5  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e010:[<ffff830000158528>] sh_page_fault__shadow_4_guest_4+0x5f8/0x1080
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
(XEN) rax: ffff8140c00aa788   rbx: ffff8300001a2080   rcx: 000000003416e000
(XEN) rdx: ffff8140c0000000   rsi: ffff8140a0503000   rdi: ffff8300001a2080
(XEN) rbp: ffff8300001af080   rsp: ffff8300001bfcb8   r8:  0000000000000002
(XEN) r9:  0000000000000000   r10: 000000003416e000   r11: 00000000000007e7
(XEN) r12: 0000000000000006   r13: 000000000001c85e   r14: ffff8300001bff28
(XEN) r15: ffff8140a0600550   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 000000003416f000   cr2: ffff8140c00aa788
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e010
(XEN) Xen stack trace from rsp=ffff8300001bfcb8:
(XEN)    0000002a9e2c0000 ffff81800002c6d8 00000000001a2080 ffff8140c00aa788
(XEN)    0000000000001c95 000000000003416d 000000000003416e 0000000000005800
(XEN)    00000001001a2080 ffff8300001af080 0000000080000000 0000000080000000
(XEN)    0000000080000000 0000000080000000 0000009070441b8b ffffffff804bca20
(XEN)    000000008f0f0048 ffff83000013bd10 ffff8300001a36b0 ffff83000014e7d1
(XEN)    0000000000000000 ffff83000013bd8f 0000000000000002 ffff83000012f69a
(XEN)    ffff8300001a36b0 ffff83000014e7d1 0000000000000002 0000000000000001
(XEN)    0000000000000002 ffff830000140348 0000000000000001 ffff830000116ba1
(XEN)    ffff8300001b4080 ffff830000116ba1 0000000000000001 ffff830010d60000
(XEN)    0000000000000001 ffff83000013fbbc 0000009094189ab7 0000000000000292
(XEN)    0000002a9e2c0000 ffff830007c5e000 ffff8300384e0550 ffff8300384b9788
(XEN)    ffff83001aa88600 8000000003901067 0000000000007c5e 00000000000384e0
(XEN)    00000000000384b9 000000000001aa88 ffff8300001a2080 ffff83000014b1f5
(XEN)    0000000000000000 ffff83000011e949 000000003416d667 0000000001c9c380
(XEN)    0000000000000000 ffff8300001bff28 ffff83000019c640 ffff8300001a2080
(XEN)    ffff8300001bff28 00000000000003e4 0000000000000c1c ffff83000014be4b
(XEN)    0000002a9e2c0000 ffff830000150959 ffff830000142baa ffff8300001af080
(XEN)    ffff8300001a2080 ffff83000014a4f8 ffff8300001bff28 ffff8300001af080
(XEN)    00000100010c8520 0000000000000c1c 0000010017ce3d68 0000000000008000
(XEN)    00000000000003e4 ffff830000151548 0000000000000c1c 00000000000003e4
(XEN) Xen call trace:
(XEN)    [<ffff830000158528>] sh_page_fault__shadow_4_guest_4+0x5f8/0x1080
(XEN)    [<ffff83000013bd10>] pit_get_count+0x30/0xa0
(XEN)    [<ffff83000014e7d1>] vmx_load_cpu_guest_regs+0x11/0x300
(XEN)    [<ffff83000013bd8f>] pit_latch_count+0xf/0x20
(XEN)    [<ffff83000012f69a>] smp_send_event_check_mask+0x3a/0x40
(XEN)    [<ffff83000014e7d1>] vmx_load_cpu_guest_regs+0x11/0x300
(XEN)    [<ffff830000140348>] send_pio_req+0x1c8/0x240
(XEN)    [<ffff830000116ba1>] add_entry+0xe1/0x110
(XEN)    [<ffff830000116ba1>] add_entry+0xe1/0x110
(XEN)    [<ffff83000013fbbc>] hvm_io_assist+0x89c/0x960
(XEN)    [<ffff83000014b1f5>] arch_vmx_do_resume+0x55/0x70
(XEN)    [<ffff83000011e949>] context_switch+0x639/0x650
(XEN)    [<ffff83000014be4b>] vmx_do_page_fault+0x2b/0x50
(XEN)    [<ffff830000150959>] vmx_vmexit_handler+0x339/0xf00
(XEN)    [<ffff830000142baa>] cpu_has_pending_irq+0x2a/0x50
(XEN)    [<ffff83000014a4f8>] vmx_intr_assist+0xf8/0x400
(XEN)    [<ffff830000151548>] vmx_asm_vmexit_handler+0x28/0x30
(XEN)
(XEN) Pagetable walk from ffff8140c00aa788:

Chris Lalancette

I've done a couple of installs of x86_64 RHEL-4-AS U4 with 500MB myself now, one manual, one kickstart, and it has worked fine. Will try on file-backed next; it's been on LVM so far. What sort of hardware are you running on, by the way?

Created attachment 146536 [details]
[XEN] Stricter TLB-flush discipline when unshadowing pagetables

This is a backport of upstream changeset 11852, which may fix issues like
this one. Please check whether it makes the problem go away. Thanks!
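For context, the general idea behind the "stricter TLB-flush discipline" fix can be sketched as follows. This is illustrative pseudocode in C syntax, not the actual Xen 3.0.3 patch; the function names here are hypothetical (though `flush_tlb_mask` and `domain_dirty_cpumask` are real Xen identifiers of that era):

/* Illustrative pseudocode only -- not the actual changeset 11852.
 * When the shadow code unshadows a guest pagetable, any physical CPU
 * that may still hold stale translations for it must flush its TLB
 * BEFORE the old shadow frame is reused; otherwise a vcpu can keep
 * writing through a mapping that no longer belongs to it, corrupting
 * whatever the frame is recycled for.
 */
void unshadow_page(struct domain *d, struct page_info *pg)  /* hypothetical */
{
    remove_shadow_mappings(d, pg);            /* detach shadow entries      */
    flush_tlb_mask(d->domain_dirty_cpumask);  /* flush every CPU that may   */
                                              /* have cached them...        */
    free_shadow_page(d, pg);                  /* ...only then recycle frame */
}

With a laxer discipline (flushing lazily, after the frame is freed), the window between unshadowing and flushing is exactly where the stray write through a stale translation can land, which matches the seemingly random hypervisor crashes seen here.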
Good, I can finally reproduce this problem. I haven't been able to see it using a blkback virtual disk, but using a blktap file-backed disk I was able to reproduce it on the first try. That does make sense given the suggested patch: blkback is in-kernel, so it doesn't task-switch and is less likely to be disturbed by missing TLB flushes. Will try again with the patch.

Except, of course, only PV uses blktap for file-backed domains; FV does not. So that theory goes out the window. FV file-backed domains will still put more pressure on the VM, though, which might well make a difference in this case (O_DIRECT still uses the cache for metadata and for filling in holes in the backing file).

Three successful installs in a row: initial testing with this patch looks good. Will continue to repeat.

I've now had 10 successful FV installs in a row on 3 different machines with the patch, including 5 successful installs on a machine that would reproduce the behavior without the patch 50% of the time. I'm building a kernel now for further testing by partners.

Chris Lalancette

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

QE ack for RHEL5.

In 2.6.18-7.el5.

Closing out.
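For reference, the blkback-vs-blktap distinction discussed above maps onto the `disk =` line of a Xen 3.x domain config file. The paths and domain layout below are illustrative, not taken from this bug:

# Xen domU config fragment (illustrative paths)

# LVM-backed disk, served by the in-kernel blkback driver:
disk = [ 'phy:/dev/VolGroup00/rhel4,hda,w' ]

# File-backed disk via the userspace blktap driver (PV domains only):
disk = [ 'tap:aio:/var/lib/xen/images/rhel4.img,hda,w' ]

# File-backed disk via a loopback device (works for FV domains):
disk = [ 'file:/var/lib/xen/images/rhel4.img,hda,w' ]

The reproduction pattern reported above (LVM fine, file-backed crashes) is consistent with file-backed FV installs simply generating more memory pressure in dom0, not with any blktap-specific code path.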