Bug 1024257 - [SVVP]Host crashed while multiple (4) guests running SVVP test on it
Summary: [SVVP]Host crashed while multiple (4) guests running SVVP test on it
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Virtualization Maintenance
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-10-29 09:04 UTC by Min Deng
Modified: 2013-10-29 13:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-10-29 12:49:59 UTC
Target Upstream Version:



Description Min Deng 2013-10-29 09:04:29 UTC
Description of problem:
The host crashed while four guests were running SVVP tests on it.

Version-Release number of selected component (if applicable):

Tree-snapshot4
kernel-2.6.32-425.el6.x86_64
qemu-kvm-qemu-kvm-rhev-0.12.1.2-2.415.el6.x86_64

How reproducible:
6 times

Steps to Reproduce:
1. Start four guests on the host:
a. /usr/libexec/qemu-kvm --nodefaults --nodefconfig -m 64G -smp 16 -cpu Nehalem -M rhel6.5.0 -usb -device usb-tablet,id=tablet0 -drive file=win2012-intel-Ml1.raw,if=none,id=drive-virtio0-0-0,format=raw,werror=stop,rerror=stop,cache=none,serial=number -device virtio-blk-pci,drive=drive-virtio0-0-0,id=virti0-0-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:43:24:34:46:61 -uuid 4b610c1e-c267-4eac-929d-4a450e36443c -monitor stdio -vnc :1 -vga cirrus -name intel-ML1 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -cdrom 1_en_windows_server_2012_x64_dvd_915478.iso -boot menu=on -device usb-ehci,id=ehci0 -drive file=usb-storage-1.raw,if=none,id=drive-usb-2-0,media=disk,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -device usb-storage,bus=ehci0.0,drive=drive-usb-2-0,id=usb-2-0,removable=on
done
b. /usr/libexec/qemu-kvm --nodefaults --nodefconfig -m 64G -smp 16 -cpu Nehalem -M rhel6.5.0 -usb -device usb-tablet,id=tablet0 -drive file=win2012-intel-Ml2.raw,if=none,id=drive-virtio0-0-0,format=raw,werror=stop,rerror=stop,cache=none,serial=number -device virtio-blk-pci,drive=drive-virtio0-0-0,id=virti0-0-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:51:34:14:3f:61 -uuid 8d270f3e-5e01-42e1-b830-8c51c4fd0348 -monitor stdio -vnc :2 -vga cirrus -name intel-ML2 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -cdrom 2_en_windows_server_2012_x64_dvd_915478.iso -boot menu=on -device usb-ehci,id=ehci0 -drive file=usb-storage-2.raw,if=none,id=drive-usb-2-0,media=disk,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -device usb-storage,bus=ehci0.0,drive=drive-usb-2-0,id=usb-2-0,removable=on
c. /usr/libexec/qemu-kvm --nodefaults --nodefconfig -m 64G -smp 16 -cpu Nehalem -M rhel6.5.0 -usb -device usb-tablet,id=tablet0 -drive file=win2012-intel-Ml3.raw,if=none,id=drive-virtio0-0-0,format=raw,werror=stop,rerror=stop,cache=none,serial=number -device virtio-blk-pci,drive=drive-virtio0-0-0,id=virti0-0-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:29:14:54:36:41 -uuid 69e495d4-de62-4035-a840-3e25e25562a6 -monitor stdio -vnc :3 -vga cirrus -name Intel-ML3 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -cdrom 3_en_windows_server_2012_x64_dvd_915478.iso -boot menu=on -device usb-ehci,id=ehci0 -drive file=usb-storage-3.raw,if=none,id=drive-usb-2-0,media=disk,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -device usb-storage,bus=ehci0.0,drive=drive-usb-2-0,id=usb-2-0,removable=on
d. /usr/libexec/qemu-kvm --nodefaults --nodefconfig -m 64G -smp 16 -cpu Nehalem -M rhel6.5.0 -usb -device usb-tablet,id=tablet0 -drive file=win2012-intel-Ml4.raw,if=none,id=drive-virtio0-0-0,format=raw,werror=stop,rerror=stop,cache=none,serial=number -device ide-drive,drive=drive-virtio0-0-0,id=virti0-0-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:28:32:11:37:41 -uuid 7b5b122b-7010-4ac7-a347-1476f51cea9c -monitor stdio -vnc :4 -vga cirrus -name Intel-ML4 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -cdrom 4_en_windows_server_2012_x64_dvd_915478.iso -boot menu=on -device usb-ehci,id=ehci0 -drive file=usb-storage-4.raw,if=none,id=drive-usb-2-0,media=disk,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -device usb-storage,bus=ehci0.0,drive=drive-usb-2-0,id=usb-2-0,removable=on
2. Run SVVP tests on all four guests at the same time, for example:
   - Disk stress
   - Sleep with IO


Actual results:
The host crashed

Expected results:
The test can complete successfully.

Additional info:
Guest image format: RAW
Size: 120G
All guest images are stored on SCSI storage.

Comment 4 Min Deng 2013-10-29 09:32:13 UTC
Assigning this to the kernel component for now; feel free to change it if that is wrong. Thanks.

Comment 5 Gleb Natapov 2013-10-29 12:49:59 UTC
Memory is faulty:

<4>{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 229
<4>{1}[Hardware Error]: APEI generic hardware error status
<4>{1}[Hardware Error]: severity: 2, corrected
<4>{1}[Hardware Error]: section: 0, severity: 2, corrected
<4>{1}[Hardware Error]: flags: 0x01
<4>{1}[Hardware Error]: primary
<4>{1}[Hardware Error]: section_type: memory error
<4>{1}[Hardware Error]: error_status: 0x0000000000000004
<4>{1}[Hardware Error]: physical_address: 0x0000000001a84380
<4>{1}[Hardware Error]: node: 1
<4>{1}[Hardware Error]: card: 2
<4>{1}[Hardware Error]: module: 1
<4>{1}[Hardware Error]: bank: 0
<4>{1}[Hardware Error]: row: 384
<4>{1}[Hardware Error]: column: 164
<4>{1}[Hardware Error]: error_type: 2, single-bit ECC

Comment 6 Gerd Hoffmann 2013-10-29 13:07:23 UTC
Looks like the machine has hardware problems.

The log in the vmcore (comment #2 tarball) has plenty of these:

<4>{68}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 229
<4>{68}[Hardware Error]: APEI generic hardware error status
<4>{68}[Hardware Error]: severity: 2, corrected
<4>{68}[Hardware Error]: section: 0, severity: 2, corrected
<4>{68}[Hardware Error]: flags: 0x01
<4>{68}[Hardware Error]: primary
<4>{68}[Hardware Error]: section_type: memory error
<4>{68}[Hardware Error]: error_status: 0x0000000000000004
<4>{68}[Hardware Error]: physical_address: 0x0000004053529d00
<4>{68}[Hardware Error]: node: 2
<4>{68}[Hardware Error]: card: 4
<4>{68}[Hardware Error]: module: 2
<4>{68}[Hardware Error]: bank: 3
<4>{68}[Hardware Error]: device: 4
<4>{68}[Hardware Error]: row: 21375
<4>{68}[Hardware Error]: column: 328
<4>{68}[Hardware Error]: error_type: 2, single-bit ECC
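With many corrected-error records like these in a log, it can help to tally them per memory location to see whether one DIMM dominates. The following is a minimal Python sketch, not a tool used in this bug: the function name `count_errors_per_dimm` and the regular expression are illustrative, and it assumes each record is identified by the `{N}` sequence number and carries `node`, `card` and `module` fields as in the excerpts quoted in this report. The sample lines are abridged from those excerpts.

```python
import re
from collections import Counter

# Matches "key: value" lines such as "<4>{1}[Hardware Error]: node: 1".
# Lines without a single-word key (for example "primary") are skipped.
FIELD_RE = re.compile(r"\{(\d+)\}\[Hardware Error\]:\s*(\w+):\s*(.+)")

def count_errors_per_dimm(log_text):
    """Group APEI fields by record sequence number, then count records
    per (node, card, module) tuple."""
    records = {}
    for line in log_text.splitlines():
        match = FIELD_RE.search(line)
        if match:
            seq, key, value = match.groups()
            records.setdefault(seq, {})[key] = value.strip()

    counts = Counter()
    for fields in records.values():
        # Only count records that actually carry memory location fields.
        if "node" in fields:
            location = (fields.get("node"), fields.get("card"), fields.get("module"))
            counts[location] += 1
    return counts

# Abridged sample built from the two records quoted in this bug.
SAMPLE = """\
<4>{1}[Hardware Error]: section_type: memory error
<4>{1}[Hardware Error]: node: 1
<4>{1}[Hardware Error]: card: 2
<4>{1}[Hardware Error]: module: 1
<4>{68}[Hardware Error]: section_type: memory error
<4>{68}[Hardware Error]: node: 2
<4>{68}[Hardware Error]: card: 4
<4>{68}[Hardware Error]: module: 2
"""

for location, count in count_errors_per_dimm(SAMPLE).items():
    print(location, count)
```

On a real log the same function could be fed the full kernel message buffer extracted from the vmcore; a location with a count far above the others would be the first DIMM to check.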

Finally this one (which is where the machine panics):

<0>[Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 5: fa00000000400405
<0>[Hardware Error]: TSC 40f9833b5ea MISC 4200 
<0>[Hardware Error]: PROCESSOR 0:206e6 TIME 1382931845 SOCKET 2 APIC 41
<0>[Hardware Error]: CPU 2: Machine Check Exception: 5 Bank 5: fa00000000400405
<0>[Hardware Error]: RIP !INEXACT! 10:<ffffffff812e0f91> {intel_idle+0xb1/0x170}
<0>[Hardware Error]: TSC 40f9833afd6 MISC 4200 
<0>[Hardware Error]: PROCESSOR 0:206e6 TIME 1382931845 SOCKET 2 APIC 40
<0>[Hardware Error]: Machine check: Processor context corrupt
<0>Kernel panic - not syncing: Fatal Machine check
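The status value `fa00000000400405` reported for Bank 5 can be decoded by hand to confirm the "Processor context corrupt" message. Below is a minimal Python sketch that assumes the architectural IA32_MCi_STATUS bit layout described in the Intel Software Developer's Manual (VAL at bit 63 down to PCC at bit 57, MCA error code in bits 15:0, model-specific code in bits 31:16). The function name `decode_mci_status` is illustrative, and the sketch does not try to interpret the model-specific fields.

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed layout, per Intel SDM).
STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register holds additional information
    58: "ADDRV",  # ADDR register holds the error address
    57: "PCC",    # processor context corrupt
}

def decode_mci_status(status):
    """Split a 64-bit machine-check status value into flags and error codes."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }

decoded = decode_mci_status(0xFA00000000400405)
print(decoded["flags"])
print(hex(decoded["mca_error_code"]), hex(decoded["model_specific_code"]))
```

For this value the UC and PCC flags are set, which is consistent with the kernel treating the event as fatal, and ADDRV is clear, so the record carries no error address.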

