
Bug 1024257

Summary: [SVVP]Host crashed while multiple (4) guests running SVVP test on it
Product: Red Hat Enterprise Linux 6
Reporter: Min Deng <mdeng>
Component: qemu-kvm
Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED NOTABUG
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.5
CC: acathrow, areis, bcao, bsarathy, gleb, juzhang, michen, mkenneth, qzhang, rhod, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-10-29 12:49:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Min Deng 2013-10-29 09:04:29 UTC
Description of problem:
The host crashed while multiple (4) guests were running SVVP tests on it.

Version-Release number of selected component (if applicable):

Tree-snapshot4
kernel-2.6.32-425.el6.x86_64
qemu-kvm-qemu-kvm-rhev-0.12.1.2-2.415.el6.x86_64

How reproducible:
6 times

Steps to Reproduce:
1. Boot four guests with the following command lines:
a. /usr/libexec/qemu-kvm --nodefaults --nodefconfig -m 64G -smp 16 -cpu Nehalem -M rhel6.5.0 -usb -device usb-tablet,id=tablet0 -drive file=win2012-intel-Ml1.raw,if=none,id=drive-virtio0-0-0,format=raw,werror=stop,rerror=stop,cache=none,serial=number -device virtio-blk-pci,drive=drive-virtio0-0-0,id=virti0-0-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:43:24:34:46:61 -uuid 4b610c1e-c267-4eac-929d-4a450e36443c -monitor stdio -vnc :1 -vga cirrus -name intel-ML1 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -cdrom 1_en_windows_server_2012_x64_dvd_915478.iso -boot menu=on -device usb-ehci,id=ehci0 -drive file=usb-storage-1.raw,if=none,id=drive-usb-2-0,media=disk,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -device usb-storage,bus=ehci0.0,drive=drive-usb-2-0,id=usb-2-0,removable=on
done
b. /usr/libexec/qemu-kvm --nodefaults --nodefconfig -m 64G -smp 16 -cpu Nehalem -M rhel6.5.0 -usb -device usb-tablet,id=tablet0 -drive file=win2012-intel-Ml2.raw,if=none,id=drive-virtio0-0-0,format=raw,werror=stop,rerror=stop,cache=none,serial=number -device virtio-blk-pci,drive=drive-virtio0-0-0,id=virti0-0-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:51:34:14:3f:61 -uuid 8d270f3e-5e01-42e1-b830-8c51c4fd0348 -monitor stdio -vnc :2 -vga cirrus -name intel-ML2 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -cdrom 2_en_windows_server_2012_x64_dvd_915478.iso -boot menu=on -device usb-ehci,id=ehci0 -drive file=usb-storage-2.raw,if=none,id=drive-usb-2-0,media=disk,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -device usb-storage,bus=ehci0.0,drive=drive-usb-2-0,id=usb-2-0,removable=on
c. /usr/libexec/qemu-kvm --nodefaults --nodefconfig -m 64G -smp 16 -cpu Nehalem -M rhel6.5.0 -usb -device usb-tablet,id=tablet0 -drive file=win2012-intel-Ml3.raw,if=none,id=drive-virtio0-0-0,format=raw,werror=stop,rerror=stop,cache=none,serial=number -device virtio-blk-pci,drive=drive-virtio0-0-0,id=virti0-0-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:29:14:54:36:41 -uuid 69e495d4-de62-4035-a840-3e25e25562a6 -monitor stdio -vnc :3 -vga cirrus -name Intel-ML3 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -cdrom 3_en_windows_server_2012_x64_dvd_915478.iso -boot menu=on -device usb-ehci,id=ehci0 -drive file=usb-storage-3.raw,if=none,id=drive-usb-2-0,media=disk,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -device usb-storage,bus=ehci0.0,drive=drive-usb-2-0,id=usb-2-0,removable=on
d. /usr/libexec/qemu-kvm --nodefaults --nodefconfig -m 64G -smp 16 -cpu Nehalem -M rhel6.5.0 -usb -device usb-tablet,id=tablet0 -drive file=win2012-intel-Ml4.raw,if=none,id=drive-virtio0-0-0,format=raw,werror=stop,rerror=stop,cache=none,serial=number -device ide-drive,drive=drive-virtio0-0-0,id=virti0-0-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:28:32:11:37:41 -uuid 7b5b122b-7010-4ac7-a347-1476f51cea9c -monitor stdio -vnc :4 -vga cirrus -name Intel-ML4 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -cdrom 4_en_windows_server_2012_x64_dvd_915478.iso -boot menu=on -device usb-ehci,id=ehci0 -drive file=usb-storage-4.raw,if=none,id=drive-usb-2-0,media=disk,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -device usb-storage,bus=ehci0.0,drive=drive-usb-2-0,id=usb-2-0,removable=on
2. Run SVVP tests on all four guests at the same time, for example:
   Disk stress
   Sleep with IO


Actual results:
The host crashed

Expected results:
The tests complete successfully.

Additional info:
Guest image format: RAW
Size: 120G
All guest images are stored on SCSI storage.

Comment 4 Min Deng 2013-10-29 09:32:13 UTC
Assigning it to the kernel component for now; feel free to change it if that is wrong. Thanks.

Comment 5 Gleb Natapov 2013-10-29 12:49:59 UTC
Memory is faulty:

<4>{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 229
<4>{1}[Hardware Error]: APEI generic hardware error status
<4>{1}[Hardware Error]: severity: 2, corrected
<4>{1}[Hardware Error]: section: 0, severity: 2, corrected
<4>{1}[Hardware Error]: flags: 0x01
<4>{1}[Hardware Error]: primary
<4>{1}[Hardware Error]: section_type: memory error
<4>{1}[Hardware Error]: error_status: 0x0000000000000004
<4>{1}[Hardware Error]: physical_address: 0x0000000001a84380
<4>{1}[Hardware Error]: node: 1
<4>{1}[Hardware Error]: card: 2
<4>{1}[Hardware Error]: module: 1
<4>{1}[Hardware Error]: bank: 0
<4>{1}[Hardware Error]: row: 384
<4>{1}[Hardware Error]: column: 164
<4>{1}[Hardware Error]: error_type: 2, single-bit ECC
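
A log with many such records is easier to judge once they are grouped by DIMM location. The following is a minimal sketch, assuming the kernel log is available as text in exactly the format quoted above. The function names `parse_apei_records` and `count_by_dimm` are illustrative and not part of any existing tool; the sample input is the record from this comment.

```python
import re
from collections import Counter

# Matches the "<4>{1}[Hardware Error]: " prefix on each log line and
# captures the record sequence number and the rest of the line.
PREFIX = re.compile(r"^<\d+>\{(\d+)\}\[Hardware Error\]:\s*(.*)$")


def parse_apei_records(log_text):
    """Return one dict per APEI record found in log_text.

    A new record starts at each "Hardware error from APEI" line.
    Lines without a "key: value" shape (such as "primary") are skipped.
    """
    records = []
    current = None
    for line in log_text.splitlines():
        match = PREFIX.match(line.strip())
        if not match:
            continue
        seq, body = match.groups()
        if body.startswith("Hardware error from APEI"):
            current = {"seq": int(seq)}
            records.append(current)
            continue
        if current is None or ":" not in body:
            continue
        key, _, value = body.partition(":")
        current[key.strip()] = value.strip()
    return records


def count_by_dimm(records):
    """Count memory error records per (node, card, module) location."""
    counts = Counter()
    for rec in records:
        if rec.get("section_type") != "memory error":
            continue
        counts[(rec.get("node"), rec.get("card"), rec.get("module"))] += 1
    return counts


SAMPLE = """\
<4>{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 229
<4>{1}[Hardware Error]: APEI generic hardware error status
<4>{1}[Hardware Error]: severity: 2, corrected
<4>{1}[Hardware Error]: section: 0, severity: 2, corrected
<4>{1}[Hardware Error]: flags: 0x01
<4>{1}[Hardware Error]: primary
<4>{1}[Hardware Error]: section_type: memory error
<4>{1}[Hardware Error]: error_status: 0x0000000000000004
<4>{1}[Hardware Error]: physical_address: 0x0000000001a84380
<4>{1}[Hardware Error]: node: 1
<4>{1}[Hardware Error]: card: 2
<4>{1}[Hardware Error]: module: 1
<4>{1}[Hardware Error]: bank: 0
<4>{1}[Hardware Error]: row: 384
<4>{1}[Hardware Error]: column: 164
<4>{1}[Hardware Error]: error_type: 2, single-bit ECC
"""

if __name__ == "__main__":
    for location, n in count_by_dimm(parse_apei_records(SAMPLE)).items():
        print("node/card/module", location, "->", n, "memory error record(s)")
```

Run against the full log from the vmcore, the same grouping would show whether the corrected errors concentrate on one module or are spread across several.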

Comment 6 Gerd Hoffmann 2013-10-29 13:07:23 UTC
Looks like the machine has hardware problems.

The log in the vmcore (comment #2 tarball) has plenty of these:

<4>{68}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 229
<4>{68}[Hardware Error]: APEI generic hardware error status
<4>{68}[Hardware Error]: severity: 2, corrected
<4>{68}[Hardware Error]: section: 0, severity: 2, corrected
<4>{68}[Hardware Error]: flags: 0x01
<4>{68}[Hardware Error]: primary
<4>{68}[Hardware Error]: section_type: memory error
<4>{68}[Hardware Error]: error_status: 0x0000000000000004
<4>{68}[Hardware Error]: physical_address: 0x0000004053529d00
<4>{68}[Hardware Error]: node: 2
<4>{68}[Hardware Error]: card: 4
<4>{68}[Hardware Error]: module: 2
<4>{68}[Hardware Error]: bank: 3
<4>{68}[Hardware Error]: device: 4
<4>{68}[Hardware Error]: row: 21375
<4>{68}[Hardware Error]: column: 328
<4>{68}[Hardware Error]: error_type: 2, single-bit ECC

Finally this one (which is where the machine panics):

<0>[Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 5: fa00000000400405
<0>[Hardware Error]: TSC 40f9833b5ea MISC 4200 
<0>[Hardware Error]: PROCESSOR 0:206e6 TIME 1382931845 SOCKET 2 APIC 41
<0>[Hardware Error]: CPU 2: Machine Check Exception: 5 Bank 5: fa00000000400405
<0>[Hardware Error]: RIP !INEXACT! 10:<ffffffff812e0f91> {intel_idle+0xb1/0x170}
<0>[Hardware Error]: TSC 40f9833afd6 MISC 4200 
<0>[Hardware Error]: PROCESSOR 0:206e6 TIME 1382931845 SOCKET 2 APIC 40
<0>[Hardware Error]: Machine check: Processor context corrupt
<0>Kernel panic - not syncing: Fatal Machine check
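
The status value in the Machine Check Exception lines can be decoded by hand, which shows why the kernel treats this error as fatal. The following is a minimal sketch that assumes only the architectural layout of the IA32_MCi_STATUS register (flag bits 63 down to 57, model-specific error code in bits 31:16, MCA error code in bits 15:0). The function names are illustrative, and the inputs are the status and TIME values from the log above.

```python
from datetime import datetime, timezone

# Architectural flag bits in the upper part of IA32_MCi_STATUS.
STATUS_FLAGS = {
    63: "VAL",    # register holds valid error information
    62: "OVER",   # another error arrived before this one was cleared
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting for this error was enabled
    59: "MISCV",  # the MISC register holds valid data
    58: "ADDRV",  # the ADDR register holds valid data
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into set flags and error codes."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


def mce_time_utc(epoch_seconds):
    """Convert the TIME field (seconds since the Unix epoch) to UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


if __name__ == "__main__":
    decoded = decode_mci_status(0xFA00000000400405)
    print("flags:", ", ".join(decoded["flags"]))
    print("MCA error code:", hex(decoded["mca_error_code"]))
    print("model-specific code:", hex(decoded["model_specific_code"]))
    print("time:", mce_time_utc(1382931845).isoformat())
```

For `fa00000000400405` the UC and PCC bits are both set, which matches the "Processor context corrupt" message, and ADDRV is clear, which is consistent with the log printing a MISC value but no ADDR value. The TIME field places the panic at 2013-10-28 03:44:05 UTC.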