Bug 507550

Summary:

[RHEL5.4 KVM]: Instant reboot when kexec'ing on AMD

Product:

Red Hat Enterprise Linux 5

Reporter:

Chris Lalancette <clalance>

Component:

kvm

Assignee:

Chris Lalancette <clalance>

Status:

CLOSED DUPLICATE

QA Contact:

Lawrence Lim <llim>

Severity:

medium

Priority:

low

Version:

5.4

CC:

gleb, lihuang, qcai, tburke, tools-bugs, virt-maint, ykaul

Hardware:

All

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2009-12-07 09:15:40 UTC

Bug Blocks:

507548, 527955

Attachments:

Description	Flags
log	none

Description Chris Lalancette 2009-06-23 09:43:37 UTC

Description of problem:
I'm running a RHEL-5.4 x86_64 guest under a RHEL-5.4 x86_64 KVM host on AMD hardware.  When trying to kexec into a new kernel inside the guest, the guest instantly reboots instead of booting the new kernel.  On Intel, this doesn't happen (although kexec doesn't complete for other, unrelated reasons).  I've partially tracked it down; when doing relocate_new_kernel inside the guest, at some point we load cr3 with the temporary page tables for the new kernel:

Dump of assembler code for function relocate_new_kernel:
0xffffffff8006211f <relocate_new_kernel+0>:	pushq  $0x0
0xffffffff80062121 <relocate_new_kernel+2>:	popfq  
0xffffffff80062122 <relocate_new_kernel+3>:	mov    (%rsi),%r8
0xffffffff80062125 <relocate_new_kernel+6>:	mov    0x80(%rsi),%rcx
0xffffffff8006212c <relocate_new_kernel+13>:	mov    0x10(%rsi),%r9
0xffffffff80062130 <relocate_new_kernel+17>:	mov    %r9,%cr3

The reboot occurs right at the last instruction shown, the mov to %cr3.  Looking at the dmesg on the host from this time, we see:

kvm: inject_page_fault: double fault 0xffff810037c97010
kvm: inject_page_fault: double fault 0xffff810037ca6010

So KVM is upset because it's trying to inject a page fault while an exception is already pending.  I still need to track down which exception is already pending and why.
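For readers unfamiliar with the escalation involved, here is a minimal user-space sketch of the architectural rule: a page fault raised while another page fault is still pending becomes a double fault, and a further fault on top of a double fault becomes a triple fault, which resets the machine. This is a model only, not the KVM source; `struct vcpu_model` and `model_inject_page_fault()` are hypothetical names, and any other pending exception is simply left in place as a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Architectural x86 exception vectors. */
enum { DF_VECTOR = 8, PF_VECTOR = 14 };

/* Hypothetical, simplified stand-in for a vCPU's pending-exception state. */
struct vcpu_model {
    bool pending;       /* an exception is queued but not yet delivered */
    int  nr;            /* vector of the queued exception */
    bool triple_fault;  /* set once the guest has to be reset */
};

/*
 * Model of queueing a page fault on a vCPU.
 * Returns the vector that is pending afterwards, or -1 on a triple fault.
 */
static int model_inject_page_fault(struct vcpu_model *v)
{
    assert(v != NULL);

    if (!v->pending) {
        /* Nothing queued yet: the page fault is queued as-is. */
        v->pending = true;
        v->nr = PF_VECTOR;
        return PF_VECTOR;
    }
    if (v->nr == PF_VECTOR) {
        /* Page fault on top of a page fault: promote to double fault. */
        v->nr = DF_VECTOR;
        return DF_VECTOR;
    }
    if (v->nr == DF_VECTOR) {
        /* Fault on top of a double fault: triple fault, guest resets. */
        v->triple_fault = true;
        return -1;
    }
    /* Simplification: any other pending exception is left untouched. */
    return v->nr;
}
```

Calling `model_inject_page_fault()` three times on a zero-initialized `struct vcpu_model` walks through page fault, double fault, and triple fault in that order; the last step corresponds to the instant reboot seen in the guest.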

Comment 1 Chris Lalancette 2009-06-23 15:06:42 UTC

Ugh, this is actually worse than I thought.  During a crash, we go through:

kernel/kexec.c:crash_kexec() -> arch/x86_64/kernel/crash.c:machine_crash_shutdown().  In machine_crash_shutdown(), one of the things we do is call nmi_shootdown_cpus(); this is supposed to deliver an NMI IPI to all of the other (non-crashing) CPUs in the system and basically make them spin in a loop.  The problem is that this IPI doesn't seem to be getting delivered to the other CPUs *at all*, meaning that they are still running around doing other things, and when we go to switch out the page tables, they then fault, double fault, and triple fault trying to access their text pages (I think).  So the next thing to find out is why no NMI IPIs are being delivered to these CPUs, even though they should be.
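The effect of a dropped NMI IPI can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel's nmi_shootdown_cpus(): `model_nmi_shootdown()` and `MODEL_MAX_CPUS` are hypothetical names, and NMI delivery is reduced to a single boolean that applies to every target CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_CPUS 64  /* arbitrary bound for this model */

/*
 * Hypothetical model of the shootdown step: the crashing CPU sends an
 * NMI to every other CPU, and each CPU that receives it parks itself.
 * Fills halted[] per CPU and returns how many non-crashing CPUs are
 * still running afterwards.
 */
static int model_nmi_shootdown(int ncpus, int crashing_cpu,
                               bool nmi_delivered, bool halted[])
{
    assert(ncpus > 0 && ncpus <= MODEL_MAX_CPUS);
    assert(crashing_cpu >= 0 && crashing_cpu < ncpus);

    int still_running = 0;
    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (cpu == crashing_cpu) {
            /* The crashing CPU keeps going; it performs the kexec. */
            halted[cpu] = false;
            continue;
        }
        /* A CPU parks only if the NMI actually reaches it. */
        halted[cpu] = nmi_delivered;
        if (!halted[cpu])
            still_running++;
    }
    return still_running;
}
```

With `nmi_delivered` set to false, every non-crashing CPU is reported as still running, which is the state described above: those CPUs keep executing while the page tables are switched underneath them.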

Chris Lalancette

Comment 2 Chris Lalancette 2009-06-24 11:35:20 UTC

It keeps getting worse.  The reason NMI IPIs aren't being delivered is that in RHEL-5, KVM on AMD has no NMI delivery support.  None at all.  What this means is that kdump in the guest kernel goes to deliver an NMI IPI, but the underlying KVM implementation more-or-less just completely discards it.  So the other CPUs continue on their merry way, until the page tables get ripped out from under them and they triple fault.

Now, SVM NMI support has recently (April) been added to the upstream kernel.  The problem is that it depends on a rewrite of the IRQ injection code, so a backport is not really possible.  I'm going to look at essentially a re-implementation of that support in the RHEL-5 sources, to see if I can get something that works.  Apparently NMI support is also required to pass some WHQL tests, so it will be a good thing to have working.

Chris Lalancette

Comment 3 Dor Laor 2009-06-28 13:57:53 UTC

Hope it is not too late; this is a dangerous change at this stage.
Gleb, any comments, pointers?

Comment 4 Chris Lalancette 2009-06-29 08:02:50 UTC

(In reply to comment #3)
> Hope it is not too late; this is a dangerous change at this stage.

Right, but it is more-or-less required functionality.

> Gleb, any comments, pointers?  

I talked to Gleb about this on IRC; basically, the patches that went upstream cannot be backported to RHEL-5.4 since they depend on the interrupt re-working.  I've been looking at doing a different implementation of it, based on Gleb's work, but obviously fairly different.  Once I have a patch, I'll send it to Gleb + company, and we can see if it is too dangerous and risky to take.  Then we can decide on when and where to put it in.

Chris Lalancette

Comment 5 Dor Laor 2009-10-08 12:26:18 UTC

Can QE test the latest code in the Z stream? Gleb added NMI support, so it should work.

Comment 6 lihuang 2009-10-08 15:18:22 UTC

Hi Chris,

   I tried this with kvm-83-105.el5_4.9.
   
   The guest kernel panics:
   http://pastebin.test.redhat.com/16120

   Could you have a look? What might have gone wrong?

   The oops also occurs on an Intel host (does this mean the original issue is gone?)

Thanks 
Lijun Huang

Comment 7 lihuang 2009-10-08 15:21:05 UTC

Created attachment 364138 [details]
log

Saved the log as an attachment.

Comment 8 Qian Cai 2009-10-08 15:37:47 UTC

This does not look like the original problem; with that, a guest on an AMD host would reset immediately. I see that you are loading the original initramfs instead of reserving memory and using kdump. Can you try kdump to see whether you hit the same problem? If so, that is something we probably need to fix. Ideally, try it both with a dump target specified in kdump.conf (copying VMCores from the kdump initramfs) and without one (copying VMCores from the kdump daemon by running INIT in the second kernel).

Comment 9 Chris Lalancette 2009-10-08 15:40:40 UTC

(In reply to comment #6)
> Hi Chris,
> 
>    I tried this with kvm-83-105.el5_4.9.
> 
>    The guest kernel panics:
>    http://pastebin.test.redhat.com/16120
> 
>    Could you have a look? What might have gone wrong?
> 
>    The oops also occurs on an Intel host (does this mean the original issue is
> gone?)

Previous to this fix, you wouldn't get nearly this far; as soon as you executed
the kexec command, the machine would reboot.  That means that the issue listed
in this particular BZ is indeed fixed.  Please open up a new BZ about the
secondary crash you are seeing, since that is something new (and should be
tracked separately).

Note that if you do try Cai's suggestion on an SMP guest, you have a 50% chance of hitting another bug that I am working on, BZ 505527.  To be absolutely certain, make sure that you test with a UP guest to avoid that bug.

Chris Lalancette