Description of problem:
The host crashed while multiple guests were running the SVVP test on it.

Version-Release number of selected component (if applicable):
Tree-snapshot4
kernel-2.6.32-425.el6.x86_64
qemu-kvm-qemu-kvm-rhev-0.12.1.2-2.415.el6.x86_64

How reproducible:
6 times

Steps to Reproduce:
1. Boot four guests with the following command lines:

a. /usr/libexec/qemu-kvm --nodefaults --nodefconfig -m 64G -smp 16 -cpu Nehalem -M rhel6.5.0 -usb -device usb-tablet,id=tablet0 -drive file=win2012-intel-Ml1.raw,if=none,id=drive-virtio0-0-0,format=raw,werror=stop,rerror=stop,cache=none,serial=number -device virtio-blk-pci,drive=drive-virtio0-0-0,id=virti0-0-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:43:24:34:46:61 -uuid 4b610c1e-c267-4eac-929d-4a450e36443c -monitor stdio -vnc :1 -vga cirrus -name intel-ML1 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -cdrom 1_en_windows_server_2012_x64_dvd_915478.iso -boot menu=on -device usb-ehci,id=ehci0 -drive file=usb-storage-1.raw,if=none,id=drive-usb-2-0,media=disk,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -device usb-storage,bus=ehci0.0,drive=drive-usb-2-0,id=usb-2-0,removable=on done

b. /usr/libexec/qemu-kvm --nodefaults --nodefconfig -m 64G -smp 16 -cpu Nehalem -M rhel6.5.0 -usb -device usb-tablet,id=tablet0 -drive file=win2012-intel-Ml2.raw,if=none,id=drive-virtio0-0-0,format=raw,werror=stop,rerror=stop,cache=none,serial=number -device virtio-blk-pci,drive=drive-virtio0-0-0,id=virti0-0-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:51:34:14:3f:61 -uuid 8d270f3e-5e01-42e1-b830-8c51c4fd0348 -monitor stdio -vnc :2 -vga cirrus -name intel-ML2 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -cdrom 2_en_windows_server_2012_x64_dvd_915478.iso -boot menu=on -device usb-ehci,id=ehci0 -drive file=usb-storage-2.raw,if=none,id=drive-usb-2-0,media=disk,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -device usb-storage,bus=ehci0.0,drive=drive-usb-2-0,id=usb-2-0,removable=on

c. /usr/libexec/qemu-kvm --nodefaults --nodefconfig -m 64G -smp 16 -cpu Nehalem -M rhel6.5.0 -usb -device usb-tablet,id=tablet0 -drive file=win2012-intel-Ml3.raw,if=none,id=drive-virtio0-0-0,format=raw,werror=stop,rerror=stop,cache=none,serial=number -device virtio-blk-pci,drive=drive-virtio0-0-0,id=virti0-0-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:29:14:54:36:41 -uuid 69e495d4-de62-4035-a840-3e25e25562a6 -monitor stdio -vnc :3 -vga cirrus -name Intel-ML3 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -cdrom 3_en_windows_server_2012_x64_dvd_915478.iso -boot menu=on -device usb-ehci,id=ehci0 -drive file=usb-storage-3.raw,if=none,id=drive-usb-2-0,media=disk,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -device usb-storage,bus=ehci0.0,drive=drive-usb-2-0,id=usb-2-0,removable=on

d. /usr/libexec/qemu-kvm --nodefaults --nodefconfig -m 64G -smp 16 -cpu Nehalem -M rhel6.5.0 -usb -device usb-tablet,id=tablet0 -drive file=win2012-intel-Ml4.raw,if=none,id=drive-virtio0-0-0,format=raw,werror=stop,rerror=stop,cache=none,serial=number -device ide-drive,drive=drive-virtio0-0-0,id=virti0-0-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:28:32:11:37:41 -uuid 7b5b122b-7010-4ac7-a347-1476f51cea9c -monitor stdio -vnc :4 -vga cirrus -name Intel-ML4 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -cdrom 4_en_windows_server_2012_x64_dvd_915478.iso -boot menu=on -device usb-ehci,id=ehci0 -drive file=usb-storage-4.raw,if=none,id=drive-usb-2-0,media=disk,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -device usb-storage,bus=ehci0.0,drive=drive-usb-2-0,id=usb-2-0,removable=on

2. Run SVVP testing on all four guests at the same time, such as:
Disk stress
Sleep with io

Actual results:
The host crashed.

Expected results:
The test completes successfully.

Additional info:
Guest
Format: RAW
Size: 120G
All the guest images are stored on SCSI storage.
Assigning this to the kernel component for now; feel free to change it if that is wrong. Thanks.
Memory is faulty:

<4>{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 229
<4>{1}[Hardware Error]: APEI generic hardware error status
<4>{1}[Hardware Error]: severity: 2, corrected
<4>{1}[Hardware Error]: section: 0, severity: 2, corrected
<4>{1}[Hardware Error]: flags: 0x01
<4>{1}[Hardware Error]: primary
<4>{1}[Hardware Error]: section_type: memory error
<4>{1}[Hardware Error]: error_status: 0x0000000000000004
<4>{1}[Hardware Error]: physical_address: 0x0000000001a84380
<4>{1}[Hardware Error]: node: 1
<4>{1}[Hardware Error]: card: 2
<4>{1}[Hardware Error]: module: 1
<4>{1}[Hardware Error]: bank: 0
<4>{1}[Hardware Error]: row: 384
<4>{1}[Hardware Error]: column: 164
<4>{1}[Hardware Error]: error_type: 2, single-bit ECC
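To see which DIMM locations the corrected errors point at, the node/card/module fields can be tallied across the log. The following is only a rough sketch: it assumes the `{N}` prefix groups the lines of one error record, as it appears to in the excerpts in this report, and the function names `parse_records` and `tally_by_dimm` are made up for illustration.

```python
import re
from collections import Counter

# One "{seq}[Hardware Error]: key: value" field.
# Works on both flattened and one-field-per-line log text.
FIELD = re.compile(
    r"\{(\d+)\}\[Hardware Error\]:\s+"
    r"(physical_address|node|card|module|bank|device|row|column):\s+(\S+)"
)

def parse_records(log_text):
    """Group location fields by the {seq} prefix (assumed to mark one record)."""
    records = {}
    for seq, key, value in FIELD.findall(log_text):
        records.setdefault(seq, {})[key] = value
    return records

def tally_by_dimm(log_text):
    """Count error records per (node, card, module) triple."""
    return Counter(
        (r.get("node"), r.get("card"), r.get("module"))
        for r in parse_records(log_text).values()
    )

# Lines taken from the record quoted in this report.
SAMPLE = (
    "<4>{1}[Hardware Error]: physical_address: 0x0000000001a84380\n"
    "<4>{1}[Hardware Error]: node: 1\n"
    "<4>{1}[Hardware Error]: card: 2\n"
    "<4>{1}[Hardware Error]: module: 1\n"
)

print(tally_by_dimm(SAMPLE))
```

Running this over the full log from the vmcore would show whether the errors cluster on one module or are spread across several.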
Looks like the machine has hardware problems. The log in the vmcore (comment #2 tarball) has plenty of these:

<4>{68}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 229
<4>{68}[Hardware Error]: APEI generic hardware error status
<4>{68}[Hardware Error]: severity: 2, corrected
<4>{68}[Hardware Error]: section: 0, severity: 2, corrected
<4>{68}[Hardware Error]: flags: 0x01
<4>{68}[Hardware Error]: primary
<4>{68}[Hardware Error]: section_type: memory error
<4>{68}[Hardware Error]: error_status: 0x0000000000000004
<4>{68}[Hardware Error]: physical_address: 0x0000004053529d00
<4>{68}[Hardware Error]: node: 2
<4>{68}[Hardware Error]: card: 4
<4>{68}[Hardware Error]: module: 2
<4>{68}[Hardware Error]: bank: 3
<4>{68}[Hardware Error]: device: 4
<4>{68}[Hardware Error]: row: 21375
<4>{68}[Hardware Error]: column: 328
<4>{68}[Hardware Error]: error_type: 2, single-bit ECC

Finally this one (which is where the machine panics):

<0>[Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 5: fa00000000400405
<0>[Hardware Error]: TSC 40f9833b5ea MISC 4200
<0>[Hardware Error]: PROCESSOR 0:206e6 TIME 1382931845 SOCKET 2 APIC 41
<0>[Hardware Error]: CPU 2: Machine Check Exception: 5 Bank 5: fa00000000400405
<0>[Hardware Error]: RIP !INEXACT! 10:<ffffffff812e0f91> {intel_idle+0xb1/0x170}
<0>[Hardware Error]: TSC 40f9833afd6 MISC 4200
<0>[Hardware Error]: PROCESSOR 0:206e6 TIME 1382931845 SOCKET 2 APIC 40
<0>[Hardware Error]: Machine check: Processor context corrupt
<0>Kernel panic - not syncing: Fatal Machine check
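For reference, the Bank 5 status value can be decoded by hand. The sketch below assumes the architectural IA32_MCi_STATUS layout from the Intel SDM (VAL, OVER, UC, EN, MISCV, ADDRV, PCC in bits 63 down to 57, the MCA error code in bits 15:0, the model-specific code in bits 31:16); `decode_mci_status` is just an illustrative helper, not something taken from the kernel, and it does not interpret the model-specific bits.

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed layout, per the Intel SDM).
FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC holds valid data
    58: "ADDRV",  # MCi_ADDR holds a valid address
    57: "PCC",    # processor context corrupt
}

def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into flag names and error codes."""
    set_flags = [
        name
        for bit, name in sorted(FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": set_flags,
        "mca_error_code": status & 0xFFFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }

decoded = decode_mci_status(0xFA00000000400405)
print(decoded["flags"])                # ['VAL', 'OVER', 'UC', 'EN', 'MISCV', 'PCC']
print(hex(decoded["mca_error_code"]))  # 0x405
```

PCC being set agrees with the "Processor context corrupt" line in the log, and UC marks the error as uncorrected, unlike the corrected single-bit ECC records before it.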