Bug 511151

Summary: hung virtual machine, spinning qemu-kvm process
Product: Red Hat Enterprise Linux 5
Component: kvm
Version: 5.4
Status: CLOSED DUPLICATE
Severity: high
Priority: high
Reporter: Aron Griffis <aron.griffis>
Assignee: Gleb Natapov <gleb>
QA Contact: Lawrence Lim <llim>
CC: adaora.onyia, alex_williamson, dwa, jjarvis, knoel, linda.knippers, llim, martine.silbermann, mra, mtosatti, oramraz, rick.hester, rpacheco, shengliang.lv, stillwell, tburke, tools-bugs, virt-maint, yeylon, ykaul
Target Milestone: rc
Hardware: All
OS: Linux
Whiteboard: hp:dl785solblk
Doc Type: Bug Fix
Last Closed: 2009-08-05 13:23:04 UTC
Attachments (all flags: none):
  kvm-tile15-idle1.qemu-kvm-spinning.strace
  kvm-tile1-webserver1-console.log
  kvm-tile1-aim1.log
  kvm-tile1-aim1.log
  kvm-tile1-aim1.log
  kvm-tile16-postfix1-console.log
  kvm-tile17-specweb1-console.log
  kvm-tile20-mysql1-console.log
  kvm-tile32-idle1-console.log
  kvm-tile17-webclient1-console.log (NOT virtio)

Description Aron Griffis 2009-07-13 21:55:25 UTC
Created attachment 351530 [details]
kvm-tile15-idle1.qemu-kvm-spinning.strace

Description of problem:
One of my (supposedly idle) RHEL 5.4 snapshot 1 KVM guests is hung.  Running
top on the host shows that the associated qemu-kvm process is spinning.

$ ps -Fwwp 17004
UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
root     17004     1 45 185378 270324 24 17:03 ?       00:17:10 /usr/libexec/qemu-kvm -S -M pc -m 512 -smp 1 -name kvm-tile15-idle1 -uuid cdf00302-4df4-2717-2b6e-8bcad1c0d99c -nographic -monitor pty -pidfile /var/run/libvirt/qemu//kvm-tile15-idle1.pid -boot c -drive file=/dev/msa15/kvm-tile15-idle1-root,if=virtio,index=0,boot=on -drive file=/dev/msa15/kvm-tile15-idle1-usr,if=virtio,index=1 -net nic,macaddr=00:01:01:11:08:0f,vlan=0,model=virtio -net tap,fd=243,script=,vlan=0,ifname=vnet113 -serial pty -parallel none -usb

$ cat /var/log/libvirt/qemu/kvm-tile15-idle1.log
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin HOME=/ /usr/libexec/qemu-kvm -S -M pc -m 512 -smp 1 -name kvm-tile15-idle1 -uuid cdf00302-4df4-2717-2b6e-8bcad1c0d99c -nographic -monitor pty -pidfile /var/run/libvirt/qemu//kvm-tile15-idle1.pid -boot c -drive file=/dev/msa15/kvm-tile15-idle1-root,if=virtio,index=0,boot=on -drive file=/dev/msa15/kvm-tile15-idle1-usr,if=virtio,index=1 -net nic,macaddr=00:01:01:11:08:0f,vlan=0,model=virtio -net tap,fd=243,script=,vlan=0,ifname=vnet113 -serial pty -parallel none -usb 
char device redirected to /dev/pts/229
char device redirected to /dev/pts/230

$ ping kvm-tile15-idle1
PING kvm-tile15-idle1.nashua (10.202.8.15) 56(84) bytes of data.
From octagon.nashua (10.202.2.120) icmp_seq=2 Destination Host Unreachable
From octagon.nashua (10.202.2.120) icmp_seq=3 Destination Host Unreachable
From octagon.nashua (10.202.2.120) icmp_seq=4 Destination Host Unreachable

I'll also attach an strace capture of the qemu-kvm process.
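
For reference, a trace like the attached one can be gathered with something along
these lines (a hypothetical invocation; the exact options used are not recorded here):

    # attach to the spinning qemu-kvm process, follow its threads, write to a file
    strace -f -p 17004 -o kvm-tile15-idle1.qemu-kvm-spinning.strace
    # interrupt with Ctrl-C after a few seconds of output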

Version-Release number of selected component (if applicable):
RHEL 5.4 snapshot 1
kernel-2.6.18-156.el5.x86_64
kmod-kvm-83-82.el5.x86_64
kvm-83-82.el5.x86_64
kvm-tools-83-82.el5.x86_64
kvm-qemu-img-83-82.el5.x86_64
python-virtinst-0.400.3-4.el5.noarch
libvirt-0.6.3-13.el5.x86_64
virt-manager-0.6.1-5.el5.x86_64
virt-viewer-0.0.2-3.el5.x86_64
libvirt-python-0.6.3-13.el5.x86_64

How reproducible:
unknown

Additional info:
This is on an HP DL785 with 32 cores, 256G RAM, lots of storage.  I'm running
256 guests simultaneously, started in sequence, all idle after boot.  The bad
one is in the middle of the pack.  All the rest of the guests are running
normally.

Comment 1 Aron Griffis 2009-07-13 22:02:32 UTC
I'll leave this running overnight in case there's any further state you'd like me to capture.  Tomorrow morning I need to kill the guest to proceed with other testing.

Comment 2 Aron Griffis 2009-07-13 22:24:42 UTC
I attempted to capture a core file from the process.  You can fetch it from http://free.linux.hp.com/~agriffis/rhel5/bz511151/kvm-tile15-idle1.qemu-kvm-spinning.core.bz2

I say "attempted" because gdb emitted some warnings as it started:
[Thread debugging using libthread_db enabled]
[New Thread 0x2b51082b8f90 (LWP 17004)]

../../gdb/linux-nat.c:977: internal-error: linux_nat_post_attach_wait: Assertion `pid == new_pid && WIFSTOPPED (status)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n
../../gdb/linux-nat.c:977: internal-error: linux_nat_post_attach_wait: Assertion `pid == new_pid && WIFSTOPPED (status)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n
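
For what it's worth, a core of a still-running qemu-kvm process can usually be
grabbed with gcore rather than attaching interactively (a sketch, not necessarily
the commands used above):

    # dump a core of PID 17004 without killing it; gcore appends the PID to the name
    gcore -o kvm-tile15-idle1.qemu-kvm-spinning.core 17004
    bzip2 kvm-tile15-idle1.qemu-kvm-spinning.core.17004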

Comment 3 Aron Griffis 2009-07-14 17:12:26 UTC
Updating the "how reproducible" question.  I've seen this a few times now, about four of them yesterday.

Comment 4 Marcelo Tosatti 2009-07-14 17:51:31 UTC
Aron,

Can you capture serial console output for these guests, so we can see what's on the crashed guest's console?

Also, you mentioned the guests are started in sequence; can you provide more details on the exact timing? What is the delay between starting two guests?

Comment 5 Aron Griffis 2009-07-14 22:45:04 UTC
Hi Marcelo,

Regarding the sequence, the timing is determined by when "virsh start" returns.  Right now there's some code to retry a couple times if we hit a race condition in libvirtd, see bug 511241.

If you prefer to read code...

    # iterate over the guest names passed as arguments
    for g; do
        # try "virsh start" up to three times, sleeping 5 seconds between attempts
        if virsh start $g || {
                warn "pausing 5 seconds before trying again"
                sleep 5
                virsh start $g
            } || {
                warn "pausing 5 seconds before trying once more"
                sleep 5
                virsh start $g
            }
        then
            # on success, log the guest's serial console via a detached screen session
            echo "logfile /var/log/libvirt/qemu/$g-console.log" > /root/screenrc-tiler
            screen -c /root/screenrc-tiler -S $g-console -L -d -m virsh console $g
            continue
        fi
        die "failed to start $g"
    done

So you can see that I've added code to capture the serial console output for all the guests.  I've just started the run now.  All 256 guests are running and the associated qemu-kvm processes are normal at around 2% CPU.  Based on past experience, eventually a qemu-kvm process will spike to 100% and remain there while the guest hangs.  Gradually this will happen to more guests.

When this happens, I'll provide the associated console logs.

Comment 6 Aron Griffis 2009-07-15 18:24:22 UTC
First screen of top output, showing 5 hung guests...

+-------------------------------------------------------------------------------+
| top - 14:16:23 up  2:15,  2 users,  load average: 68.23, 74.68, 75.38         |
| Tasks: 1276 total,   1 running, 1275 sleeping,   0 stopped,   0 zombie        |
| Cpu(s):  0.1%us, 23.7%sy,  0.0%ni, 76.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st|
| Mem:  264284396k total, 124553136k used, 139731260k free, 48556476k buffers   |
| Swap:  4194296k total,        0k used,  4194296k free,   266704k cached       |
|                                                                               |
|   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND          |
| 22591 root      15   0  726m 268m 2328 S 99.9  0.1  65:44.37 qemu-kvm         |
|  2061 root      15   0 1238m 279m 2328 S 99.5  0.1 119:40.70 qemu-kvm         |
| 14062 root      15   0 1238m 278m 2328 S 99.5  0.1 117:19.56 qemu-kvm         |
| 11262 root      15   0 1236m  76m 2316 S 99.2  0.0 118:25.95 qemu-kvm         |
| 12065 root      15   0 1238m 280m 2328 S 98.6  0.1  56:00.34 qemu-kvm         |
|  6130 root      15   0 1238m 279m 2328 S  3.9  0.1   1:38.05 qemu-kvm         |
| 11995 root      15   0 1238m 278m 2328 S  3.6  0.1   1:42.48 qemu-kvm         |
| 10223 root      15   0  723m 263m 2328 S  3.3  0.1   1:29.59 qemu-kvm         |
| 20829 root      15   0 1239m 280m 2328 S  3.3  0.1   1:37.21 qemu-kvm         |
|  3689 root      15   0 1237m 278m 2328 S  2.9  0.1   1:32.71 qemu-kvm         |
| 14740 root      15   0  723m 263m 2328 S  2.9  0.1   1:24.51 qemu-kvm         |
|  3788 root      15   0 1238m 279m 2328 S  2.6  0.1   1:38.78 qemu-kvm         |
|  9579 root      15   0 1239m 280m 2328 S  2.6  0.1   1:35.80 qemu-kvm         |
|  1846 root      15   0 1238m 279m 2328 S  2.3  0.1   1:25.69 qemu-kvm         |
|  4195 root      15   0 1238m 280m 2328 S  2.3  0.1   1:30.39 qemu-kvm         |
|  4926 root      15   0 1237m 279m 2328 S  2.3  0.1   1:26.70 qemu-kvm         |
| 11807 root      15   0 1238m 277m 2328 S  2.3  0.1   1:42.26 qemu-kvm         |
+-------------------------------------------------------------------------------+

I'll attach the console output from all 5.
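
Since every guest is started with -pidfile under /var/run/libvirt/qemu/ (see the
command line in the description), a spinning PID from top can be mapped back to its
guest name with something like this (a sketch):

    # prints the pidfile (and therefore the guest name) containing PID 22591
    grep -lw 22591 /var/run/libvirt/qemu/*.pid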

Comment 7 Aron Griffis 2009-07-15 18:31:09 UTC
Created attachment 353868 [details]
kvm-tile1-webserver1-console.log

Comment 8 Aron Griffis 2009-07-15 18:32:13 UTC
Created attachment 353870 [details]
kvm-tile1-aim1.log

Comment 9 Aron Griffis 2009-07-15 18:32:44 UTC
Created attachment 353871 [details]
kvm-tile1-aim1.log

Comment 10 Aron Griffis 2009-07-15 18:33:15 UTC
Created attachment 353872 [details]
kvm-tile1-aim1.log

Comment 11 Aron Griffis 2009-07-15 18:35:32 UTC
Created attachment 353873 [details]
kvm-tile16-postfix1-console.log

Comment 12 Aron Griffis 2009-07-15 19:05:44 UTC
Created attachment 353882 [details]
kvm-tile17-specweb1-console.log

Comment 13 Aron Griffis 2009-07-15 19:07:01 UTC
Created attachment 353884 [details]
kvm-tile20-mysql1-console.log

Comment 14 Aron Griffis 2009-07-15 19:15:12 UTC
Created attachment 353887 [details]
kvm-tile32-idle1-console.log

Comment 15 Marcelo Tosatti 2009-07-15 21:20:25 UTC
Aron, can you attempt to reproduce with an upstream kernel host? (2.6.30)

Gleb, can you take a look at this please? It's an AMD host.

Unable to handle kernel NULL pointer dereference at 0000000000000046 RIP:
 [<0000000000000046>]
PGD 3f5d9067 PUD 3f5b8067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /block/ram0/dev
CPU 0
Modules linked in: dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata sd_mod scsi_mod virtio_blk virtio_pci virtio_ring virtio ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 318, comm: nash-hotplug Not tainted 2.6.18-156.el5 #1
RIP: 0010:[<0000000000000046>]  [<0000000000000046>]
RSP: 0000:ffffffff8043dfa8  EFLAGS: 00010096
RAX: ffff81003f6bffd8 RBX: 0000000000000046 RCX: ffffffff8043df58
RDX: ffff810081142000 RSI: 00000000005188b0 RDI: ffffffff80492f80
RBP: 0000000000000000 R08: ffffffff80012a0c R09: ffffffff8043df98
R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00000000005188b0 R15: 0000000000518870
FS:  0000000007bc7930(0063) GS:ffffffff803c1000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000046 CR3: 000000003f5d8000 CR4: 00000000000006e0
Process nash-hotplug (pid: 318, threadinfo ffff81003f6be000, task ffff81003f5a10c0)
Stack:  00000000005188b0 ffffffff8005f2fc ffffffff8043df98 <EOI>  0000000000000000
 0000000000000046 00000000005188b0 ffffffff8005f2fc ffffffff8043df98 <EOI>
 0000000000000000 0000000000000046 00000000005188b0 ffffffff8005f2fc
Call Trace:
 <IRQ>  [<ffffffff8005f2fc>] call_softirq+0x1c/0x28
 <EOI>  [<ffffffff8005f2fc>] call_softirq+0x1c/0x28


Code:  Bad RIP value.
RIP  [<0000000000000046>]
 RSP <ffffffff8043dfa8>
CR2: 0000000000000046
 <0>Kernel panic - not syncing: Fatal exception

0xffffffff8005f2e0 <call_softirq+0>:    push   %rbp
0xffffffff8005f2e1 <call_softirq+1>:    mov    %rsp,%rbp
0xffffffff8005f2e4 <call_softirq+4>:    incl   %gs:0x28
0xffffffff8005f2ec <call_softirq+12>:   cmove  %gs:0x30,%rsp
0xffffffff8005f2f6 <call_softirq+22>:   push   %rbp
0xffffffff8005f2f7 <call_softirq+23>:   callq  0xffffffff80012983 <__do_softirq>
0xffffffff8005f2fc <call_softirq+28>:   leaveq
0xffffffff8005f2fd <call_softirq+29>:   decl   %gs:0x28
0xffffffff8005f305 <call_softirq+37>:   retq

I _guess_ for some reason RBX is crap on return from __do_softirq, so
leaveq restores a bogus RSP and retq jumps to 0000000000000046.

Note __do_softirq enables interrupts (but disables them before
returning).

Comment 16 Gleb Natapov 2009-07-16 07:54:07 UTC
Anything unusual in a host dmesg when this happens?

Comment 17 Aron Griffis 2009-07-16 11:16:35 UTC
(In reply to comment #16)
> Anything unusual in a host dmesg when this happens?  

No

Comment 18 Gleb Natapov 2009-07-16 12:31:02 UTC
Can you run these two commands in the qemu monitor of the problematic VM:

info cpus
x/20i $pc-10
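
With -monitor pty, as in the command line in the description, the monitor can be
reached on the first pty that qemu reports at startup. A minimal sketch, assuming
the "char device redirected" lines from the libvirt log:

    # the first redirected pty is the monitor (-monitor pty precedes -serial pty)
    screen /dev/pts/229
    # then at the (qemu) prompt:
    #   info cpus
    #   x/20i $pc-10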

Comment 19 Aron Griffis 2009-07-16 13:52:58 UTC
(qemu) info cpus
* CPU #0: pc=0xffffffff8000d077 thread_id=19375
(qemu) x/20i $pc-10
0xffffffff8000d06d:  add    %al,%bl
0xffffffff8000d06f:  rdtsc  
0xffffffff8000d071:  mov    %eax,%ecx
0xffffffff8000d073:  repz nop 
0xffffffff8000d075:  rdtsc  
0xffffffff8000d077:  sub    %ecx,%eax
0xffffffff8000d079:  cmp    %rdi,%rax
0xffffffff8000d07c:  jb     0xffffffff8000d073
0xffffffff8000d07e:  retq   
0xffffffff8000d07f:  push   %r12
0xffffffff8000d081:  cmpl   $0x0,4071336(%rip)        # 0xffffffff803ef030
0xffffffff8000d088:  mov    %rsi,%r12
0xffffffff8000d08b:  push   %rbp
0xffffffff8000d08c:  mov    %rdi,%rbp
0xffffffff8000d08f:  push   %rbx
0xffffffff8000d090:  je     0xffffffff8000d0e5
0xffffffff8000d092:  lea    0x8(%rdi),%rdi
0xffffffff8000d096:  callq  0xffffffff80065a55
0xffffffff8000d09b:  mov    0x28(%rbp),%rbx
0xffffffff8000d09f:  mov    0x10(%rbx),%rax

By the way, across the roughly 250 guests running on this machine I see about 1-2 panics per hour.  So it's not really "the problematic VM"; in fact, the results I'm feeding you are typically from separate runs, since I need to continue with my testing despite the problem.

Comment 20 Aron Griffis 2009-07-16 14:00:31 UTC
another one, with a slightly different $pc:

(qemu) info cpus
* CPU #0: pc=0xffffffff8000d079 thread_id=30713
(qemu) x/20i $pc-10
0xffffffff8000d069:  jmpq   0xffffffff800c9512
0xffffffff8000d06e:  retq   
0xffffffff8000d06f:  rdtsc  
0xffffffff8000d071:  mov    %eax,%ecx
0xffffffff8000d073:  repz nop 
0xffffffff8000d075:  rdtsc  
0xffffffff8000d077:  sub    %ecx,%eax
0xffffffff8000d079:  cmp    %rdi,%rax
0xffffffff8000d07c:  jb     0xffffffff8000d073
0xffffffff8000d07e:  retq   
0xffffffff8000d07f:  push   %r12
0xffffffff8000d081:  cmpl   $0x0,4071336(%rip)        # 0xffffffff803ef030
0xffffffff8000d088:  mov    %rsi,%r12
0xffffffff8000d08b:  push   %rbp
0xffffffff8000d08c:  mov    %rdi,%rbp
0xffffffff8000d08f:  push   %rbx
0xffffffff8000d090:  je     0xffffffff8000d0e5
0xffffffff8000d092:  lea    0x8(%rdi),%rdi
0xffffffff8000d096:  callq  0xffffffff80065a55
0xffffffff8000d09b:  mov    0x28(%rbp),%rbx

Comment 21 Gleb Natapov 2009-07-16 14:19:22 UTC
Both of them are at the same function, and it appears to be __delay(). Were those VMs stuck with the oops message you posted before when you retrieved this output? Did they get stuck during boot (it looks like __delay() is called only during boot)?

Comment 22 Aron Griffis 2009-07-16 14:39:48 UTC
(In reply to comment #21)
> Both of them are at the same function, and it appears to be __delay(). Were
> those VMs stuck with the oops message you posted before when you retrieved
> this output?

Yes

> Did they get stuck during boot (it looks like __delay() is called only
> during boot)?

They panicked some time after boot finished, after the console had already shown a login prompt.

Note that I've switched to RHEL 5.4 snapshot 2 and comments 19 and 20 are using kernel 2.6.18-157.el5. I don't know if the kernel has changed sufficiently to affect your analysis.

Comment 23 Gleb Natapov 2009-07-16 17:29:01 UTC
Can you try with the IDE interface, just to rule virtio out?

Comment 24 Aron Griffis 2009-07-16 17:48:22 UTC
I changed the configuration and I'm starting 248 guests now; I'll let you know.  That's 31 tiles of 8 guests; I'm presently using the remaining tile to continue my work.

cd /etc/libvirt/qemu
# switch the virtio disk targets (vdX) to IDE (hdX) and drop the virtio NIC model line
sed -i "/bus='virtio'/{s/vd/hd/;s/virtio/ide/};/model type='virtio'/d" kvm-tile{2..32}-*

Comment 25 Dor Laor 2009-07-16 20:41:02 UTC
Just to make sure, is the guest RHEL 5.4 or RHEL 5.3?

Comment 26 Aron Griffis 2009-07-16 21:02:16 UTC
Host and guests are RHEL 5.4 snapshot 2 presently.  They were all snapshot 1 when I first filed the report.

Comment 27 Aron Griffis 2009-07-17 02:36:36 UTC
Created attachment 354079 [details]
kvm-tile17-webclient1-console.log (NOT virtio)

This happened this evening to a non-virtio guest.  Console log attached, and here's the guest XML:

$ virsh dumpxml kvm-tile17-webclient1
<domain type='kvm' id='136'>
  <name>kvm-tile17-webclient1</name>
  <uuid>00624e31-a378-7ef7-65c9-90aade99a5da</uuid>
  <memory>1048576</memory>
  <currentMemory>1048576</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='block' device='disk'>
      <source dev='/dev/msa1/kvm-tile17-webclient1-root'/>
      <target dev='hda' bus='ide'/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/dev/msa1/kvm-tile17-webclient1-usr'/>
      <target dev='hdb' bus='ide'/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/dev/msa1/kvm-tile17-webclient1-swap'/>
      <target dev='hdc' bus='ide'/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/dev/msa1/kvm-tile17-webclient1-data'/>
      <target dev='hdd' bus='ide'/>
    </disk>
    <interface type='bridge'>
      <mac address='00:01:01:11:06:71'/>
      <source bridge='br0'/>
      <target dev='vnet135'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/407'/>
      <target port='0'/>
    </serial>
    <console type='pty' tty='/dev/pts/407'>
      <source path='/dev/pts/407'/>
      <target port='0'/>
    </console>
  </devices>
</domain>

Comment 28 Marcelo Tosatti 2009-07-17 13:18:57 UTC
Aron,

There have been a number of fixes to AMD's interrupt injection code upstream, some of which have not been backported to the RHEL codebase. I can't say for sure, but perhaps one of them has a bearing on the issue in question.

Since Gleb is not working today, perhaps it would be helpful if you can try to reproduce the issue with a 2.6.30 kernel installed on the host.

Comment 29 Gleb Natapov 2009-07-19 08:15:15 UTC
Can you please run this on qemu monitor after failure:
x/6x 0xffffffff803bfe60

Comment 30 Gleb Natapov 2009-07-20 12:34:01 UTC
Can you please try to run with kvm-88?

Comment 31 Aron Griffis 2009-07-20 14:09:05 UTC
(In reply to comment #29)
> Can you please run this on qemu monitor after failure:
> x/6x 0xffffffff803bfe60  

I will do this the next time I see a failure.

(In reply to comment #28)
> perhaps it would be helpful if you can try to
> reproduce the issue with a 2.6.30 kernel installed on the host.  

(In reply to comment #30)
> Can you please try to run with kvm-88?

Regarding both of these requests, unfortunately I don't have the time to test newer bits because my time and this machine are consumed by a project related to RHEL 5.4.  However if you would like to backport patches to the RHEL 5.4 base and provide me with drop-in rpms to test, I'm willing to do that.

I'm not sure about this, but I think it should be possible to reproduce this problem on a smaller AMD box.  I think the reason I'm hitting it is the increased chance from running lots of guests.  If you can construct a setup inside RH with a couple hundred idle guests, you might hit it too.

Comment 32 Gleb Natapov 2009-07-20 14:20:30 UTC
(In reply to comment #31)
> (In reply to comment #29)
> > Can you please run this on qemu monitor after failure:
> > x/6x 0xffffffff803bfe60  
> 
> I will do this the next time I see a failure.
> 
> (In reply to comment #28)
> > perhaps it would be helpful if you can try to
> > reproduce the issue with a 2.6.30 kernel installed on the host.  
> 
> (In reply to comment #30)
> > Can you please try to run with kvm-88?
> 
> Regarding both of these requests, unfortunately I don't have the time to test
> newer bits because my time and this machine are consumed by a project related
> to RHEL 5.4.  However if you would like to backport patches to the RHEL 5.4
> base and provide me with drop-in rpms to test, I'm willing to do that.
Would a tar.gz with the modules and the qemu-kvm binary be good enough?
 
> 
> I'm not sure about this, but I think it should be possible to reproduce this
> problem on a smaller AMD box.  I think the reason I'm hitting it is the
> increased chance from running lots of guests.  If you can construct a setup
> inside RH with a couple hundred idle guests, you might hit it too.  
I am going to do that. I don't have such a huge machine here, though, and I may not have enough memory for a hundred guests. BTW do you run KSM?

Comment 33 Aron Griffis 2009-07-20 14:37:32 UTC
(In reply to comment #32)
> Would a tar.gz with the modules and the qemu-kvm binary be good enough?

Sure, but I would appreciate it if you've tested them on RHEL 5.4 before I try.

>  BTW do you run KSM?  

Not intentionally. Is that available on RHEL 5.4?
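
(A minimal way to check whether KSM is in play on the host, for reference:)

    # no output means the ksm module is not loaded
    lsmod | grep -w ksm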

Comment 34 Marcelo Tosatti 2009-07-20 17:25:20 UTC
Aron,

We will attempt to reproduce the problem internally. If you get some available time on the machine, please see if you can reproduce with the kvm_amd.ko npt=0 module parameter (you should see "Nested Paging Disabled" in dmesg).
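
A minimal way to do that on the host (a sketch; it assumes all guests are shut down
so the module can be unloaded):

    # reload kvm_amd with nested paging disabled
    rmmod kvm_amd
    modprobe kvm_amd npt=0
    dmesg | grep -i 'nested paging'   # expect the "Nested Paging Disabled" message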

Comment 35 Gleb Natapov 2009-07-21 11:49:25 UTC
Aron, I've sent you kvm modules backported to 2.6.18-157.el5 from the latest kvm git.
Can you give them a try, please?

Comment 36 Aron Griffis 2009-07-23 03:01:07 UTC
Marcelo, Gleb,

I was able to reproduce the problem on a second machine.  This is a ProLiant BL465c G5 with 8 Barcelona cores and 32G RAM.  The host is RHEL 5.4 snapshot 2 and I'm running 128 guests, also RHEL 5.4 snapshot 2.  The guests are idle other than minor daemon activity.

I've only seen the problem on this machine (aka barcelona) once so I'm leaving it again overnight with the base configuration to get a better handle on the probability of seeing the panic.

On the DL785 G5 (aka octagon) with 32 Barcelona cores and 256G RAM, I typically run 256 guests and I usually see between 2 and 5 guests panic overnight.  So on octagon I loaded kvm_amd with NPT disabled for tonight's run.

Depending on how things go tonight and what you advise next, I'll plan to try the backported kvm modules tomorrow.

Thanks,
Aron

Comment 37 Gleb Natapov 2009-07-23 05:10:42 UTC
Aron,

I was able to reproduce the problem on my (much smaller) machine. It happens rarely, though. I am looking into it. Meanwhile, please try the backported modules; it may help narrow down the problem.

Comment 38 Aron Griffis 2009-07-24 00:21:08 UTC
(In reply to comment #36)
> I was able to reproduce the problem on a second machine.  This is a ProLiant
> BL465c G5 with 8 Barcelona cores and 32G RAM.  The host is RHEL 5.4 snapshot 2
> and I'm running 128 guests, also RHEL 5.4 snapshot 2.  The guests are idle
> other than minor daemon activity.
> 
> I've only seen the problem on this machine (aka barcelona) once so I'm leaving
> it again overnight with the base configuration to get a better handle on the
> probability of seeing the panic.

Unfortunately it only happened once in a 24-hour period, so it's hard to
consider this a useful data point.  Still, I'll run the new kvm modules on
barcelona over the weekend to see what happens.

> On the DL785 G5 (aka octagon) with 32 Barcelona cores and 256G RAM, I typically
> run 256 guests and I usually see between 2 and 5 guests panic overnight.  So on
> octagon I loaded kvm_amd with NPT disabled for tonight's run.

I didn't see any problems running with npt=0 for about 24 hours.

I'm switching octagon now to running with the new kvm modules, same as on
barcelona.

Why is the Priority on this bug marked Low?  Shouldn't this be a RHEL 5.4 kitstopper?

Comment 39 Aron Griffis 2009-07-24 00:25:18 UTC
Gleb, note that the ksm module no longer loads now that I've updated to the kvm modules you sent me

Comment 40 Gleb Natapov 2009-07-24 03:56:53 UTC
(In reply to comment #39)
> Gleb, note that the ksm module no longer loads now that I've updated to the kvm
> modules you sent me  

This is OK. You weren't using KSM anyway.

Comment 41 Aron Griffis 2009-07-27 13:57:48 UTC
I ran both machines with the new kvm modules over the weekend. No problems.

Comment 44 Aron Griffis 2009-07-30 19:07:43 UTC
Another update, I've successfully run 256 VMs on a DL785 for a few days with the updated modules with no problems.

Comment 45 Gleb Natapov 2009-07-30 19:19:21 UTC
Aron, thanks for the update. I am working on this bug, but things are going slowly since it is only very rarely reproducible for me.

Comment 46 Gleb Natapov 2009-08-03 12:35:59 UTC
Aron,

Can you try the kmod-kvm-83-104.el5.x86_64.rpm I've sent you by email? It has a TLB-related bug fix for AMD.

Comment 47 Aron Griffis 2009-08-03 19:15:28 UTC
Hi Gleb, I'm now running kmod-kvm-83-104.el5 on RHEL 5.4 snapshot 5 with a total of 416 idle VMs on barcelona and octagon.  I will be adding a couple more machines to the mix with another couple hundred VMs, so we should know in about 24 hours if the modules do the trick.  Thanks!

Comment 51 Gleb Natapov 2009-08-05 07:20:18 UTC
Aron 

Any news (good or bad) on this?

Comment 53 Aron Griffis 2009-08-05 13:16:56 UTC
Gleb, I ran kmod-kvm-83-104.el5 on RHEL 5.4 snapshot 5 with over 600 guests on four machines for 36 hours with no problems.  I think this patch solves the bug!

Comment 54 Aron Griffis 2009-08-05 13:18:27 UTC
(clearing NEEDINFO state)

Comment 55 Dor Laor 2009-08-05 13:23:04 UTC

*** This bug has been marked as a duplicate of bug 513394 ***