507550 – [RHEL5.4 KVM]: Instant reboot when kexec'ing on AMD

Keywords:
Status:	CLOSED DUPLICATE of bug 492290
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kvm
Sub Component:
Version:	5.4
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Chris Lalancette
QA Contact:	Lawrence Lim
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	507548 527955

Reported:	2009-06-23 09:43 UTC by Chris Lalancette
Modified:	2014-03-26 00:58 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-12-07 09:15:40 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
log (12.78 KB, text/plain) 2009-10-08 15:21 UTC, lihuang

Description Chris Lalancette 2009-06-23 09:43:37 UTC

Description of problem:
I'm running a RHEL-5.4 x86_64 guest under a RHEL-5.4 x86_64 KVM host on AMD.  When trying to kexec into a new kernel inside the guest, the guest instantly reboots instead of booting the new kernel.  Under Intel, this doesn't happen (although kexec doesn't complete for other, unrelated reasons).  I've partially tracked it down: during relocate_kernel inside the guest, at some point we try to load cr3 with the temporary page tables for the new kernel:

Dump of assembler code for function relocate_new_kernel:
0xffffffff8006211f <relocate_new_kernel+0>:	pushq  $0x0
0xffffffff80062121 <relocate_new_kernel+2>:	popfq  
0xffffffff80062122 <relocate_new_kernel+3>:	mov    (%rsi),%r8
0xffffffff80062125 <relocate_new_kernel+6>:	mov    0x80(%rsi),%rcx
0xffffffff8006212c <relocate_new_kernel+13>:	mov    0x10(%rsi),%r9
0xffffffff80062130 <relocate_new_kernel+17>:	mov    %r9,%cr3

The reboot occurs right at the last instruction, the mov to %cr3.  Looking at the dmesg on the host from this time, we see:

kvm: inject_page_fault: double fault 0xffff810037c97010
kvm: inject_page_fault: double fault 0xffff810037ca6010

So KVM is upset because it's trying to inject a page fault while an exception is already pending.  I still need to track down which exception is already pending and why.
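
As a rough illustration of why these messages precede a reset, here is a minimal sketch of x86 exception escalation as a hypervisor might apply it when queueing exceptions for a guest. The class and method names are hypothetical and only loosely mirror the log message above; the vector numbers (14 for page fault, 8 for double fault) are architectural. The third fault is added for illustration and is not taken from the log.

```python
# Simplified model of exception escalation while queueing guest exceptions.
# VcpuModel and its fields are illustrative names, not KVM's actual code.

PF_VECTOR = 14  # x86 page fault
DF_VECTOR = 8   # x86 double fault


class VcpuModel:
    def __init__(self):
        self.pending = None        # vector of the exception awaiting delivery
        self.triple_fault = False  # set when the guest has to be reset
        self.log = []

    def inject_page_fault(self, addr):
        if self.pending == PF_VECTOR:
            # A page fault arrived while one was still pending: escalate.
            self.log.append("double fault 0x%x" % addr)
            self.pending = DF_VECTOR
        elif self.pending == DF_VECTOR:
            # A fault on top of a double fault is a triple fault: reset.
            self.triple_fault = True
        else:
            # Simplification: any other state just queues the page fault.
            self.pending = PF_VECTOR


vcpu = VcpuModel()
for addr in (0xffff810037c97010,) * 3:
    vcpu.inject_page_fault(addr)

print(vcpu.log)
print("guest reset:", vcpu.triple_fault)
```

In this model the second fault produces the "double fault" log line and the third one resets the guest, which matches the instant reboot seen from inside the guest.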

Comment 1 Chris Lalancette 2009-06-23 15:06:42 UTC

Ugh, this is actually worse than I thought.  During a crash, we go through:

kernel/kexec.c:crash_kexec() -> arch/x86_64/kernel/crash.c:machine_crash_shutdown().  In machine_crash_shutdown(), one of the things we do is call nmi_shootdown_cpus(); this is supposed to deliver an NMI IPI to all of the other (non-crashing) CPUs in the system and basically make them spin in a loop.  The problem is that this IPI doesn't seem to be getting delivered to the other CPUs *at all*, meaning that they are still running around doing other things, and when we go to switch out the page tables, they then fault, double fault, and triple fault trying to access their text pages (I think).  So the next thing to find out is why no NMI IPIs are being delivered to these CPUs, even though they should be.
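
The shootdown sequence described above can be sketched as a toy model. This is not the kernel code: the function name, its parameters, and the bounded polling loop are assumptions made for illustration. It only shows that when the NMI is never delivered, the crashing CPU carries on while every other CPU is still running.

```python
# Toy model of the crash-time NMI shootdown. All names are illustrative.

def shootdown(num_cpus, crashing_cpu, nmi_delivered, max_polls=1000):
    """Return the set of CPUs still running after the shootdown attempt."""
    running = set(range(num_cpus))
    waiting = num_cpus - 1  # CPUs that still have to acknowledge the NMI

    for cpu in range(num_cpus):
        if cpu == crashing_cpu:
            continue
        if nmi_delivered:
            # The NMI handler acknowledges and parks the CPU in a loop.
            running.discard(cpu)
            waiting -= 1

    # Assumption of this sketch: the crashing CPU polls for a bounded
    # time and then proceeds whether or not the others acknowledged.
    polls = 0
    while waiting > 0 and polls < max_polls:
        polls += 1

    return running


print(shootdown(4, 0, nmi_delivered=True))   # only the crashing CPU remains
print(shootdown(4, 0, nmi_delivered=False))  # the other CPUs never stopped
```

With `nmi_delivered=False`, CPUs 1 to 3 are still executing when the page tables are switched, which is the situation suspected in this comment.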

Chris Lalancette

Comment 2 Chris Lalancette 2009-06-24 11:35:20 UTC

It keeps getting worse.  The reason the NMI IPIs aren't being delivered is that in RHEL-5, KVM on AMD has no NMI delivery support.  None at all.  This means that kdump in the guest kernel tries to deliver an NMI IPI, but the underlying KVM implementation more or less just discards it.  So the other CPUs continue on their merry way, until the page tables get ripped out from under them and they triple fault.

Now, SVM NMI support was recently (April) added to the upstream kernel.  The problem is that it depends on a rewrite of the IRQ injection code, so a backport is not really possible.  I'm going to look at essentially re-implementing that support in the RHEL-5 sources, to see if I can get something that works.  Apparently NMI support is also required to pass some WHQL tests, so it will be a good thing to have working.

Chris Lalancette

Comment 3 Dor Laor 2009-06-28 13:57:53 UTC

Hope it is not too late; this is a dangerous change at this stage.
Gleb, any comments, pointers?

Comment 4 Chris Lalancette 2009-06-29 08:02:50 UTC

(In reply to comment #3)
> Hope it is not too late; this is a dangerous change at this stage.

Right, but it is more-or-less required functionality.

> Gleb, any comments, pointers?  

I talked to Gleb about this on IRC; basically, the patches that went upstream cannot be backported to RHEL-5.4 since they depend on the interrupt re-working.  I've been looking at doing a different implementation of it, based on Gleb's work, but obviously fairly different.  Once I have a patch, I'll send it to Gleb + company, and we can see if it is too dangerous and risky to take.  Then we can decide on when and where to put it in.

Chris Lalancette

Comment 5 Dor Laor 2009-10-08 12:26:18 UTC

Can QE test the latest code in the Z-stream? Gleb added NMI support, so it should work.

Comment 6 lihuang 2009-10-08 15:18:22 UTC

Hi Chris

   I made an attempt with kvm-83-105.el5_4.9.

   The guest kernel panics:
   http://pastebin.test.redhat.com/16120

   Could you have a look? What might have gone wrong?

   The oops also occurs on an Intel host (does this mean the original issue is gone?)

Thanks 
Lijun Huang

Comment 7 lihuang 2009-10-08 15:21:05 UTC

Created attachment 364138
log

Saved the log as an attachment.

Comment 8 Qian Cai 2009-10-08 15:37:47 UTC

This does not look like the original problem, where you would get an immediate reset on an AMD guest. I see that you are loading the original initramfs instead of reserving memory and using kdump. Can you try kdump to see whether you hit the same problem? If so, that is something we probably need to fix. Ideally, try it both with a dump target specified in kdump.conf (the vmcore is copied from the kdump initramfs) and without one (the vmcore is copied by the kdump daemon after running init in the second kernel).

Comment 9 Chris Lalancette 2009-10-08 15:40:40 UTC

(In reply to comment #6)
> Hi Chris
> 
>    I made an attempt with kvm-83-105.el5_4.9.
> 
>    The guest kernel panics:
>    http://pastebin.test.redhat.com/16120
> 
>    Could you have a look? What might have gone wrong?
> 
>    The oops also occurs on an Intel host (does this mean the original issue is
> gone?)

Previous to this fix, you wouldn't get nearly this far; as soon as you executed
the kexec command, the machine would reboot.  That means that the issue listed
in this particular BZ is indeed fixed.  Please open up a new BZ about the
secondary crash you are seeing, since that is something new (and should be
tracked separately).

Note that if you do try Cai's suggestion on an SMP guest, you have a 50% chance of hitting another bug that I am working on, BZ 505527.  To be absolutely certain, make sure that you test with a UP guest to avoid that bug.

Chris Lalancette
