Bug 867398 - Source host gets a call trace and becomes unresponsive during guest migration
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Assigned To: Juan Quintela
QA Contact: Virtualization Bugs

Reported: 2012-10-17 08:49 EDT by Qunfang Zhang
Modified: 2012-11-20 10:29 EST
CC List: 13 users

Doc Type: Bug Fix
Last Closed: 2012-11-20 10:29:53 EST
Type: Bug


Attachments
Call trace log of host during guest migration (72.04 KB, text/plain)
2012-10-17 08:50 EDT, Qunfang Zhang
Host always call trace. (86.21 KB, text/plain)
2012-10-22 04:50 EDT, Qunfang Zhang

Description Qunfang Zhang 2012-10-17 08:49:36 EDT
Description of problem:
It does not always happen, but I hit it about 5 times today: 4 times while migrating a guest to another host and once during local migration. Each time the host got a call trace and stopped responding.

Version-Release number of selected component (if applicable):
kernel-2.6.32-331.el6.x86_64
qemu-kvm-0.12.1.2-2.327.el6.x86_64
seabios-0.6.1.2-25.el6.x86_64
spice-server-0.12.0-1.el6.x86_64

How reproducible:
Sometimes (maybe 1/20)

Steps to Reproduce:
1. Boot a guest.
The command line I used:

/usr/libexec/qemu-kvm -M rhel6.4.0 -cpu Conroe -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name rhel6.4-64 -uuid feebc8fd-f8b0-4e75-abc3-e63fcdb67170 -smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/rhel5.9-64-virtio.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,drive=disk0,id=disk0,scsi=off,bus=pci.0,addr=0x3,bootindex=1 -drive file=/mnt/boot.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on,fd=6 6<>/dev/tap11 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:10:1A:4A:25:28,bus=pci.0,addr=0x4  -monitor stdio -qmp tcp:0:6666,server,nowait -boot c -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -drive if=none,id=drive-fdc0-0-0,readonly=on,format=raw -global isa-fdc.driveA=drive-fdc0-0-0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5  -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -spice seamless-migration=on,port=5930,password=no -global qxl-vga.vram_size=33554432 -k en-us -vga qxl

2. On the destination host, boot the guest with the same command line in listening mode (i.e. with -incoming tcp:0:5800 appended).

3. Start the migration:
(qemu) __com.redhat_spice_migrate_info $dst_host_ip 5930
(qemu) migrate -d tcp:$dst_host_ip:5800
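The same migration can presumably also be driven through the QMP socket that the command line above opens (`-qmp tcp:0:6666,server,nowait`), which is handy for scripting many attempts of a 1-in-20 reproducer. The sketch below is hypothetical and not part of the original report: it sends `qmp_capabilities`, `migrate` with a `tcp:` URI, and `query-migrate`, skipping asynchronous event messages. It assumes QMP `migrate` returns once the migration has started (the equivalent of HMP `migrate -d`, as in later QEMU releases), and it does not cover the RHEL-specific `__com.redhat_spice_migrate_info` step, which would still be issued from the HMP monitor as shown above.

```python
import json
import socket


def qmp_migration_messages(dst_host: str, port: int = 5800) -> list[str]:
    """Hypothetical helper: QMP messages roughly equivalent to 'migrate -d tcp:HOST:PORT'."""
    commands = [
        {"execute": "qmp_capabilities"},  # leave capabilities-negotiation mode
        {"execute": "migrate", "arguments": {"uri": f"tcp:{dst_host}:{port}"}},
        {"execute": "query-migrate"},  # ask for the current migration status
    ]
    return [json.dumps(c) for c in commands]


def send_qmp(messages: list[str], host: str = "127.0.0.1", qmp_port: int = 6666) -> list[dict]:
    """Send each message over a QMP TCP socket and collect the command replies."""
    replies = []
    # The timeout keeps the script from blocking forever if the host hangs.
    with socket.create_connection((host, qmp_port), timeout=10) as sock, \
            sock.makefile("rw", encoding="utf-8", newline="") as stream:
        json.loads(stream.readline())  # discard the {"QMP": {...}} greeting
        for msg in messages:
            stream.write(msg + "\n")
            stream.flush()
            while True:
                reply = json.loads(stream.readline())
                if "event" not in reply:  # skip asynchronous event notifications
                    replies.append(reply)
                    break
    return replies
```

For example, `send_qmp(qmp_migration_messages("192.0.2.10"))` run on the source host would start the migration to a destination at that (placeholder) address, provided the destination is already listening as in step 2.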

  
Actual results:
Sometimes the host gets a call trace and becomes unresponsive. The migration does not finish, and the qemu monitor does not respond either.

Expected results:
Both host and guest keep working, and the migration succeeds.

Additional info:
The host dmesg output will be uploaded. Note that after I ran the 'dmesg' command on the host, the host stopped responding entirely.
Comment 1 Qunfang Zhang 2012-10-17 08:50:18 EDT
Created attachment 628773 [details]
Call trace log of host during guest migration
Comment 2 Qunfang Zhang 2012-10-17 08:53:09 EDT
Host info:
The two hosts are not identical, but I also hit this bug once when doing local host migration on host A.

Host A (src host)

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz
stepping	: 10
cpu MHz		: 2826.264
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 3
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dts tpr_shadow vnmi flexpriority
bogomips	: 5652.52
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:


Host B:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
stepping	: 7
cpu MHz		: 1600.000
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 6
initial apicid	: 6
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6185.74
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
Comment 4 Sibiao Luo 2012-10-18 05:46:28 EDT
Hi,

  I also hit this issue: the host hangs after migrating from src to dest.

host info:
# uname -r && rpm -q qemu-kvm
2.6.32-331.el6.x86_64
qemu-kvm-0.12.1.2-2.327.el6.x86_64
guest info:
# uname -r
2.6.32-331.el6.x86_64

CLI:
# /usr/libexec/qemu-kvm -M rhel6.4.0 -cpu SandyBridge -enable-kvm -m 4096 -smp 4,sockets=2,cores=2,threads=1 -usb -device usb-tablet,id=input0 -name sluo_migration -uuid 990ea161-6b67-47b2-b803-19fb01d30d30 -rtc base=localtime,clock=host,driftfix=slew -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/mnt/RHEL-Server-6.3-64-sluo.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,fd=6 6<>/dev/tap6 -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=08:2E:5F:0A:0D:B1,bus=pci.0,addr=0x5 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5931,disable-ticketing,seamless-migration=off -vga qxl -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x6 -device hda-duplex -device usb-ehci,id=ehci,addr=0x7 -chardev spicevmc,name=usbredir,id=usbredirchardev1 -device usb-redir,chardev=usbredirchardev1,id=usbredirdev1,bus=ehci.0,debug=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -nodefaults -serial unix:/tmp/ttyS0,server,nowait -qmp tcp:0:4444,server,nowait -boot menu=on -monitor stdio

CPU info:
processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
stepping	: 7
cpu MHz		: 1600.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 3
cpu cores	: 4
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6784.30
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

Best Regards.
sluo
Comment 5 Qunfang Zhang 2012-10-22 04:49:00 EDT
Hit another call trace while the guest was running, even without migrating it.
The host is no longer accessible. Please check the attachment.
Comment 6 Qunfang Zhang 2012-10-22 04:50:29 EDT
Created attachment 631333 [details]
Host always call trace.

Not sure whether this is the same issue as in the bug description; please help confirm.
Comment 7 Orit Wasserman 2012-10-22 05:58:12 EDT
It looks like a different call trace.
I noticed this error in the logs:
[Firmware Warn]: Your BIOS is broken; DMAR reported at address fedc1000 returns all ones!

Maybe try to upgrade/re-install the firmware?
Comment 8 Orit Wasserman 2012-10-22 06:00:25 EDT
Another option is to disable iommu.
Comment 9 Qunfang Zhang 2012-10-22 06:14:00 EDT
(In reply to comment #7)
> it looks like a different call trace,
> I noticed this error in the logs:
> [Firmware Warn]: Your BIOS is broken; DMAR reported at address fedc1000
> returns all ones!
> 
> Maybe try to upgrade/re-install the firmware?

Hi, Orit
Do you mean updating/re-installing the host firmware or the kernel-firmware package?
[root@t1 ~]# uname -r
2.6.32-331.el6.x86_64
[root@t1 ~]# rpm -q kernel-firmware
kernel-firmware-2.6.32-331.el6.noarch

(In reply to comment #8)
> Another option is to disable iommu.

I don't set iommu=on on either of my hosts.
# cat /proc/cmdline 
ro root=/dev/mapper/vg_t1-lv_root rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_LVM_LV=vg_t1/lv_swap rd_LVM_LV=vg_t1/lv_root rd_NO_MD crashkernel=128M SYSFONT=latarcyrheb-sun16 rd_NO_DM clocksource_failover=1 console=tty0 console=ttyS0,115200nb
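Comment 9 checks `/proc/cmdline` for an IOMMU setting. A small hypothetical helper like the one below pulls IOMMU-related parameters out of such a line; on the command line above it finds none. Whether Intel VT-d remapping is active by default depends on how the kernel was built, so the absence of `iommu=on` may not settle the question on its own; `intel_iommu=off` is the usual explicit way to disable it on Intel hosts. The set of keys checked here is an assumption, not an exhaustive list.

```python
def iommu_params(cmdline: str) -> dict[str, str]:
    """Return IOMMU-related kernel parameters found in a /proc/cmdline string.

    Hypothetical helper; the key names checked are an assumption, not an
    exhaustive list of IOMMU options.
    """
    keys = {"iommu", "intel_iommu", "amd_iommu"}
    found = {}
    for token in cmdline.split():
        # "name=value" -> ("name", "=", "value"); bare flags give an empty value
        name, _, value = token.partition("=")
        if name in keys:
            found[name] = value
    return found
```

On a live host it would be called as `iommu_params(open("/proc/cmdline").read())`; an empty dict means no IOMMU parameter was passed and the kernel's built-in default applies.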
Comment 10 Orit Wasserman 2012-10-22 06:47:06 EDT
I meant the host firmware (Dell), but it can't hurt to update the kernel firmware too.
Comment 13 Qunfang Zhang 2012-11-01 05:20:27 EDT
Update:
I re-installed both hosts with the latest RHEL6.4 tree (20121019.0) and updated kernel and qemu-kvm to the following versions. I have not hit this issue so far:
kernel-2.6.32-337.el6.x86_64
qemu-kvm-0.12.1.2-2.331.el6.x86_64
Comment 14 Juan Quintela 2012-11-20 10:29:53 EST
As it is no longer reproducible, closing it. If QE can reproduce it again, just re-open it.
