Bug 597144

Summary: VMs reboot automatically when running multiple VMs under load
Product: Red Hat Enterprise Linux 5
Reporter: Golita Yue <gyue>
Component: kvm
Assignee: Andrea Arcangeli <aarcange>
Status: CLOSED NOTABUG
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: low
Version: 5.6
CC: aarcange, gcosta, jwest, lihuang, llim, michen, ndai, virt-maint, zamsden
Target Milestone: rc
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-10-04 14:14:57 UTC
Bug Blocks: 580949
Attachments:
- debug info (flags: none)
- dump file (flags: none)

Description Golita Yue 2010-05-28 08:52:44 UTC
Created attachment 417522
debug info

Description of problem:
When I run 6 VMs on the same host and put CPU load on them, three of them
reboot automatically once the host CPU reaches high utilization (>90%).
After the VMs reboot, an "unexpected shutdown" message is displayed.
 
Version-Release number of selected component (if applicable):
kvm-83-164.el5_5.9
kernel: 2.6.18-194.3.1.el5
rhev-hypervisor-5.5-2.2.0.16.1
sm69

How reproducible:
1/1

Steps to Reproduce:
1. Install win7_x86 from RHEV-M.
2. Make a template of win7_x86.
3. Create 7 VMs based on the win7_x86 template.
4. Load the host with this script, which starts one busy loop pinned to each host CPU:
 for (( I=0; I<$(grep -c '^processor' /proc/cpuinfo); I++ )); do echo $I; taskset -c $I /bin/bash -c 'for ((;;)); do X=1; done &'; done
5. Select 6 VMs by holding Shift.
   (RHEV-M warned me that it could not run the 7th VM; you may be able to run more VMs.)
6. Press the Run button.
7. Wait until the VMs answer ping, then run CPU_burn-in for 30 minutes.

Actual results:
An unexpected shutdown occurred and three VMs rebooted automatically.

Expected results:
All VMs finish the CPU_burn-in test.

Additional info:
For the debug info and the dump file, please refer to the attachments.

cmd:
/usr/libexec/qemu-kvm -no-hpet -usb -rtc-td-hack -startdate 2010-05-28T01:22:38 -name win7_nfs_s3 -smp 4,cores=1 -k en-us -m 1024 -boot cd -net nic,vlan=1,macaddr=00:1a:4a:42:41:1c,model=rtl8139 -net tap,vlan=1,ifname=rtl8139_13_1,script=no -drive file=/rhev/data-center/b8a6bc1d-7935-4129-9b7a-483906949cc3/23c959c9-ea7d-4468-b308-f3e1cb04b345/images/3e90f6ee-62bd-40cb-9920-aae369ded9ab/20c245b0-0cdc-40c8-91cb-29f5edf5c8b7,media=disk,if=ide,cache=off,index=0,serial=cb-9920-aae369ded9ab,boot=off,format=qcow2,werror=stop -pidfile /var/vdsm/1ccd51ef-3e04-49fc-8132-912ef93f9090.pid -vnc 0:13,password -cpu qemu64,+sse2,+cx16,+ssse3 -M rhel5.5.0 -notify all -balloon none -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=5.5-2.2-0.16.1,serial=009A33EA-2514-DF11-874D-9EA3C4859730_6c:f0:49:27:33:32,uuid=1ccd51ef-3e04-49fc-8132-912ef93f9090 -vmchannel di:0200,unix:/var/vdsm/1ccd51ef-3e04-49fc-8132-912ef93f9090.guest.socket,server -monitor unix:/var/vdsm/1ccd51ef-3e04-49fc-8132-912ef93f9090.monitor.socket,server

Top Result:
top - 05:36:32 up 22:47,  1 user,  load average: 16.64, 14.99, 13.83
Tasks: 147 total,   7 running, 140 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.4%us, 96.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   7758560k total,  5079540k used,  2679020k free,    62980k buffers
Swap:  8073208k total,    81116k used,  7992092k free,  1283688k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2504 vdsm      15   0 1251m 1.0g 577m S 96.1 13.8  61:39.26 qemu-kvm
11020 vdsm      15   0 1257m 1.0g 730m S 93.2 13.6  18:13.03 qemu-kvm
 2665 vdsm      15   0 1251m 1.0g 652m S 52.5 13.8  64:39.53 qemu-kvm
 2136 root      15   0     0    0    0 R 44.0  0.0  16:36.27 kksmd
 2344 vdsm      15   0 1251m 1.0g 438m S 36.7 13.8  58:48.14 qemu-kvm
 2444 vdsm      15   0 1251m 1.0g 579m R 30.8 13.8  63:02.52 qemu-kvm
 2978 root      25   0  8668  532  380 R 17.7  0.0  27:59.13 bash
 2972 root      25   0  8668  536  380 R 13.5  0.0  28:24.64 bash
 2747 vdsm      15   0 1255m 1.0g 733m S 10.8 13.8  61:43.85 qemu-kvm
 2984 root      25   0  8668  532  380 R  3.0  0.0  27:14.06 bash
 7085 vdsm      10  -5  520m  15m 3036 S  1.0  0.2   4:46.23 vdsm

Host information:
cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
stepping        : 10
cpu MHz         : 2659.988
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips        : 5319.97
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
stepping        : 10
cpu MHz         : 2659.988
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4
apicid          : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips        : 5319.94
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
stepping        : 10
cpu MHz         : 2659.988
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4
apicid          : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips        : 5320.01
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
stepping        : 10
cpu MHz         : 2659.988
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips        : 5319.99
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

Comment 1 Golita Yue 2010-05-28 08:54:30 UTC
Created attachment 417523
dump file

Comment 3 Glauber Costa 2010-05-28 19:45:15 UTC
Is this behaviour exclusive to Windows guests?

Does it happen in an all-Linux scenario? A mixed scenario?

Thanks

Comment 4 Golita Yue 2010-06-03 05:47:46 UTC
(In reply to comment #3)
> Is this behaviour exclusive to Windows guests?
> 
> Does it happen in an all-Linux scenario? A mixed scenario?
> 
> Thanks

I started 6 Linux VMs and ran them for about 2 hours; no reboot happened.

Comment 5 Zachary Amsden 2010-06-22 21:07:19 UTC
This sounds like a memory corruption or other catastrophic failure, not a kvmclock bug.

Comment 6 Glauber Costa 2010-06-23 13:26:01 UTC
Indeed. Since it happens in a Windows-only environment, it is highly unlikely that kvmclock plays a role here.

Comment 7 Andrea Arcangeli 2010-07-05 16:02:09 UTC
Do you get any swapping on the host? Can you try running swapoff -a on the host?
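Comment 7 asks whether the host is swapping; the top output in the description already shows 81116k of swap in use. A minimal sketch for checking this, assuming a Linux host: it reads the SwapTotal and SwapFree fields of /proc/meminfo (both in kB) and suggests swapoff -a when any swap is in use. The helper names swap_used_kb and check_swap are hypothetical. Swap in use is not the same as active swapping; the si/so columns of vmstat show the latter.

```bash
#!/usr/bin/env bash
# Sketch: check host swap usage (comment 7). Helper names are hypothetical.
# /proc/meminfo reports SwapTotal and SwapFree in kB.

# Print swap in use (kB) computed from a meminfo-format file.
swap_used_kb() {
  awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 } END { print t - f }' "$1"
}

# Report swap usage and suggest disabling swap for a retest.
check_swap() {
  local meminfo=${1:-/proc/meminfo} used
  if [[ ! -r $meminfo ]]; then
    echo "cannot read $meminfo" >&2
    return 1
  fi
  used=$(swap_used_kb "$meminfo")
  echo "swap used: ${used} kB"
  if (( used > 0 )); then
    # swapoff -a disables swapping on all known swap devices and files (needs root)
    echo "host has swap in use; to retest without swap, run as root: swapoff -a"
  fi
}

check_swap "$@"
```

With the values from the top output above (8073208k total, 7992092k free) it reports 81116 kB in use.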

Comment 15 RHEL Program Management 2011-01-11 20:53:59 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 16 RHEL Program Management 2011-01-11 22:51:47 UTC
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 19 Andrea Arcangeli 2011-10-04 14:14:57 UTC
Running CPU_burn-in for 30 minutes in 6 VMs with 4 CPUs means some of them will not get as much CPU time as they would on real hardware. This may delay interrupt delivery. If Windows 7 reboots when an APIC interrupt or NMI arrives late, that does not look like a KVM bug; Windows 7 would need a tweak to stop rebooting. A similar scenario would occur with the NMI watchdog enabled in a Linux guest.
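As a rough worked example of the overcommit argument, here is the arithmetic for this reproduction. It assumes every guest was started with -smp 4, like the one command line in the description (only that one was attached), that CPU_burn-in keeps every vCPU runnable, and that the 4 busy loops from step 4 are still running. The variable and function names are illustrative only.

```bash
#!/usr/bin/env bash
# Back-of-envelope vCPU overcommit for this bug's reproduction.
# ASSUMPTION: all 6 guests use -smp 4 like the attached command line.
host_cpus=4        # Core 2 Quad Q9400, processors 0-3 in /proc/cpuinfo
vms=6
vcpus_per_vm=4     # -smp 4,cores=1
busy_loops=4       # one pinned bash loop per host CPU (step 4)

overcommit() {
  local total_vcpus=$(( vms * vcpus_per_vm ))
  local runnable=$(( total_vcpus + busy_loops ))
  echo "total vCPUs: ${total_vcpus}"
  echo "vCPU:pCPU ratio: $(( total_vcpus / host_cpus )):1"
  # With every thread runnable and fair scheduling, each gets roughly
  # host_cpus/runnable of one CPU; shown in per-mille to stay in integers.
  echo "approx CPU share per thread: $(( 1000 * host_cpus / runnable ))/1000"
}

overcommit
```

This gives 24 vCPUs on 4 physical CPUs (6:1) and about 142/1000, roughly one seventh of a CPU per runnable thread, so a vCPU spends most of its time waiting for a physical CPU, which is when interrupts and timer ticks reach the guest late.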

There wasn't enough information to debug this, so I think we can close it, also considering that it doesn't look like an obvious KVM bug: we can't give the guests more CPU than the hardware provides, and some preemption and delays will happen when CPUs are overcommitted.

I'm closing this as NOTABUG for now, since it isn't certain that this is a KVM bug. kvmclock still has to try to report the real time even when there are preemption delays, which can potentially trigger things like the NMI watchdog; the guest should be able to cope with that to remain stable.
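One way to observe the kind of delay described above from inside a guest is a watchdog-style probe that measures how late a timed sleep wakes up. This is an illustrative sketch, not a tool used in this bug: latency_probe is a hypothetical name, it measures user-space scheduling delay rather than interrupt delivery itself, and it assumes GNU date (for %N) and a sleep that accepts fractional seconds.

```bash
#!/usr/bin/env bash
# Hypothetical watchdog-style probe: sleep for a fixed interval and report
# how much later than requested the process actually woke up. Large values
# mean the task (or the vCPU it runs on) was not scheduled on time.
# Requires GNU date (%N = nanoseconds).

# usage: latency_probe ITERATIONS INTERVAL_MS THRESHOLD_MS
latency_probe() {
  local iters=$1 interval_ms=$2 threshold_ms=$3
  local secs i start end late_ms max_late=0
  # Convert the interval to seconds once, outside the timed region.
  secs=$(awk -v ms="$interval_ms" 'BEGIN { printf "%.3f", ms / 1000 }')
  for (( i = 0; i < iters; i++ )); do
    start=$(date +%s%N)
    sleep "$secs"
    end=$(date +%s%N)
    # Lateness = measured elapsed time minus the requested interval
    # (includes a few ms of fork/exec overhead for sleep and date).
    late_ms=$(( (end - start) / 1000000 - interval_ms ))
    if (( late_ms > max_late )); then
      max_late=$late_ms
    fi
    if (( late_ms > threshold_ms )); then
      echo "iteration $i: woke up ${late_ms} ms late"
    fi
  done
  echo "max lateness: ${max_late} ms"
}

# Run only when all three arguments are given, e.g.: ./probe.sh 600 100 50
if (( $# == 3 )); then
  latency_probe "$@"
fi
```

For example, latency_probe 600 100 50 samples for about a minute and prints every wake-up that is more than 50 ms late; running it in a Linux guest on an idle host and again under the load from the reproduction steps would show how much the guest is being starved.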