Bug 769515

Summary: host hang after 2 kvm_VM whole night netperffing
Product: Red Hat Enterprise Linux 6 Reporter: Xiaoqing Wei <xwei>
Component: kernel Assignee: jason wang <jasowang>
Status: CLOSED NOTABUG QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.3 CC: juzhang, michen, shuang, tburke
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2012-02-13 02:32:04 UTC Type: ---
Attachments:
Description Flags
dmesg none

Description Xiaoqing Wei 2011-12-21 06:49:38 UTC
Description of problem:

The host hangs after two KVM guests run a netperf test overnight.
Version-Release number of selected component (if applicable):
kernel-2.6.32-220.el6.x86_64
qemu-kvm-0.12.1.2-2.212.el6.x86_64
How reproducible:
Hit it twice.

Steps to Reproduce:
1. Boot two Linux guests: guest_A as the netserver and guest_B as the netperf client.
2. Keep running netperf from guest_B to guest_A; here I am using '-t TCP_STREAM'.
3. Run for a long time. I ran it for a whole night, and the host hung.
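
The steps above could be scripted on guest_B roughly as follows. This is only a sketch: the server address, the per-run duration and the iteration count are placeholder assumptions, not values from this report, and guest_A is assumed to be running `netserver` already. The `-H`, `-t` and `-l` options are standard netperf options.

```shell
#!/bin/sh
# Placeholder values (assumptions, not taken from the report).
SERVER_IP="${SERVER_IP:-192.0.2.10}"   # address of guest_A, where netserver runs
DURATION="${DURATION:-60}"             # seconds per netperf run
ITERATIONS="${ITERATIONS:-720}"        # 720 runs of 60 s is roughly 12 hours
DRY_RUN="${DRY_RUN-1}"                 # non-empty: only print the command

# One TCP_STREAM run from guest_B to guest_A.
one_run() {
    ${DRY_RUN:+echo} netperf -H "$SERVER_IP" -t TCP_STREAM -l "$DURATION"
}

# Run netperf back to back; stop at the first failure so a lost
# connection to guest_A is noticed instead of being silently retried.
run_stress() {
    i=1
    while [ "$i" -le "$ITERATIONS" ]; do
        echo "run $i/$ITERATIONS started at $(date)"
        one_run || return 1
        i=$((i + 1))
    done
}

if [ -n "$DRY_RUN" ]; then
    one_run        # show the command that would be executed
else
    run_stress     # start the real load with: DRY_RUN= sh this_script.sh
fi
```

By default the script only prints the netperf command line; setting `DRY_RUN` to an empty value starts the actual load.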
  
Actual results:
1) The host hangs: cannot ssh to it, and the keyboard and screen do not respond.
2) It is still able to reply to ICMP ping.
3) The kdump service is up and running but did not capture a vmcore.
Expected results:
Both the host and the guests work well.

Additional info:
Full command line:
 qemu-kvm -monitor stdio -chardev socket,id=serial_id_20111221-141330-hqg4,path=/tmp/serial-20111221-141330-hqg4,server,nowait -device isa-serial,chardev=serial_id_20111221-141330-hqg4 -drive file=RHEL-Server-6.1-64-virtio.qcow2,index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,snapshot=on,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idHoAlJ5,mac=9a:eb:f5:d1:2a:9a,id=ndev00idHoAlJ5,bus=pci.0,addr=0x3 -netdev tap,id=idHoAlJ5,vhost=on,fd=20 -m 2048 -smp 2,cores=1,threads=1,sockets=2 -cpu cpu64-rhel6,+sse2,+x2apic -spice port=8001,disable-ticketing -vga qxl -rtc base=utc,clock=host,driftfix=slew -boot order=cdn,once=c,menu=off -no-kvm-pit-reinjection -M rhel6.2.0 -usb -device usb-tablet -enable-kvm

# brctl show
bridge name	bridge id		STP enabled	interfaces
switch		8000.00221927543b	no		eth0
							t0-141330-hqg4
							t0-183357-PUXj

Host info:
   cpu :
processor	: 1
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 67
model name	: Dual-Core AMD Opteron(tm) Processor 1216
stepping	: 3
cpu MHz		: 1000.000
cache size	: 1024 KB

   8G RAM

Comment 2 Dor Laor 2011-12-21 11:58:04 UTC
Can you please set up a serial console to get the message (or netconsole) from the host?

Better disable STP regardless of the issue.
Can you also test w/ Intel cpu?
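
For the console capture requested above, a netconsole setup on the host could be sketched like this. All addresses, ports, the MAC address and the device name are placeholders (assumptions), and the sketch does not cover whether netconsole can be bound to an interface that is enslaved to the bridge. The parameter layout follows the kernel's netconsole documentation: `netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]`.

```shell
#!/bin/sh
# Placeholder values (assumptions, not taken from the report).
SRC_PORT=6665; SRC_IP=192.0.2.20; DEV=eth0                   # hanging host
TGT_PORT=6666; TGT_IP=192.0.2.30; TGT_MAC=00:11:22:33:44:55  # log receiver
DRY_RUN="${DRY_RUN-1}"                                       # non-empty: only print

# Build the module parameter in the documented netconsole format.
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$SRC_PORT" "$SRC_IP" "$DEV" "$TGT_PORT" "$TGT_IP" "$TGT_MAC"
}

# Load the module on the host under test (requires root when not a dry run).
load_netconsole() {
    ${DRY_RUN:+echo} modprobe netconsole "$(netconsole_param)"
}

load_netconsole

# On the receiver, a UDP listener on TGT_PORT collects the messages,
# for example with netcat (the exact flags differ between netcat variants).
```

For a serial console instead, the usual approach is to add `console=ttyS0,115200 console=tty0` to the host kernel command line and capture the serial port from a second machine.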

Comment 3 RHEL Program Management 2011-12-21 12:00:01 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 4 Xiaoqing Wei 2011-12-22 02:19:51 UTC
(In reply to comment #2)
> Can you please set up a serial console to get the message (or netconsole) from
> the host?
OK, will do.
> 
> Better disable STP regardless of the issue.
STP is disabled.
bridge name       bridge id           STP enabled   interfaces
switch             8000.00221927543b   no           eth0

> Can you also test w/ Intel cpu?
OK, will do.

Regards,
Xiaoqing Wei.

Comment 5 Xiaoqing Wei 2011-12-25 10:12:52 UTC
(In reply to comment #2)
> Can you please set up a serial console to get the message (or netconsole) from
> the host?
> 

Tested on the same host. It didn't hang permanently this time; it hung for only a few minutes and then recovered. During the test I collected some dmesg output and will attach it as a text file.

> Better disable STP regardless of the issue.
> Can you also test w/ Intel cpu?

Tried an Intel host: after 24 hours of running it didn't hang (but I am not sure whether this bug is AMD-only, because it isn't 100% reproducible on the original host).

Xiaoqing Wei.

Comment 6 Xiaoqing Wei 2011-12-25 10:14:03 UTC
Created attachment 549471
dmesg