867398 – Source host gets call trace and unresponsive during guest migrating

Summary: Source host gets call trace and unresponsive during guest migrating

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	qemu-kvm
Sub Component:
Version:	6.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Juan Quintela
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-10-17 12:49 UTC by Qunfang Zhang
Modified:	2012-11-20 15:29 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-11-20 15:29:53 UTC
Target Upstream Version:
Embargoed:

Attachments
Call trace log of host during guest migration (72.04 KB, text/plain) 2012-10-17 12:50 UTC, Qunfang Zhang
Host always call trace. (86.21 KB, text/plain) 2012-10-22 08:50 UTC, Qunfang Zhang

Description Qunfang Zhang 2012-10-17 12:49:36 UTC

Description of problem:
It does not always happen, but I hit it about 5 times today. While migrating a guest to another host (hit about 4 times) or doing a local migration (hit once), the host gets a call trace and becomes unresponsive.

Version-Release number of selected component (if applicable):
kernel-2.6.32-331.el6.x86_64
qemu-kvm-0.12.1.2-2.327.el6.x86_64
seabios-0.6.1.2-25.el6.x86_64
spice-server-0.12.0-1.el6.x86_64

How reproducible:
Sometimes (maybe 1/20)
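
A rough estimate of how many clean migration runs are needed before "not reproducible" is a safe conclusion. This is only a sketch: it assumes each attempt is independent and takes the reporter's approximate 1/20 hit rate at face value. The function name runs_needed is hypothetical.

```python
import math

def runs_needed(hit_rate, confidence):
    """Smallest n such that 1 - (1 - hit_rate)**n >= confidence.

    Assumes independent attempts, which is an idealization.
    """
    return math.ceil(math.log(1 - confidence) / math.log(1 - hit_rate))

if __name__ == "__main__":
    # Reporter's rough estimate: about 1 failure in 20 migrations.
    print(runs_needed(1 / 20, 0.95))
```

Under these assumptions, a handful of successful migrations says little; several dozen are needed before a missing call trace is meaningful.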

Steps to Reproduce:
1. Boot a guest.
The command line I used:

/usr/libexec/qemu-kvm -M rhel6.4.0 -cpu Conroe -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name rhel6.4-64 -uuid feebc8fd-f8b0-4e75-abc3-e63fcdb67170 -smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/rhel5.9-64-virtio.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,drive=disk0,id=disk0,scsi=off,bus=pci.0,addr=0x3,bootindex=1 -drive file=/mnt/boot.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on,fd=6 6<>/dev/tap11 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:10:1A:4A:25:28,bus=pci.0,addr=0x4  -monitor stdio -qmp tcp:0:6666,server,nowait -boot c -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -drive if=none,id=drive-fdc0-0-0,readonly=on,format=raw -global isa-fdc.driveA=drive-fdc0-0-0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5  -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -spice seamless-migration=on,port=5930,password=no -global qxl-vga.vram_size=33554432 -k en-us -vga qxl

2. Boot the guest with the same command line in listening mode on the destination.

3. Start the migration.
(qemu) __com.redhat_spice_migrate_info $dst_host_ip 5930
(qemu) migrate -d tcp:$dst_host_ip:5800
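
The steps above can be sketched as a small helper that derives the destination command line and the monitor commands. The report does not show the exact listening option used; this sketch assumes qemu-kvm's -incoming tcp:0:PORT, with the port matching step 3. The function names and the example IP address are hypothetical.

```python
import shlex

def destination_args(source_args, port=5800):
    """Return the source arguments plus '-incoming tcp:0:<port>' (assumed option)."""
    return list(source_args) + ["-incoming", f"tcp:0:{port}"]

def monitor_commands(dst_host_ip, spice_port=5930, migrate_port=5800):
    """Monitor commands from step 3, with the destination address filled in."""
    return [
        f"__com.redhat_spice_migrate_info {dst_host_ip} {spice_port}",
        f"migrate -d tcp:{dst_host_ip}:{migrate_port}",
    ]

if __name__ == "__main__":
    # Shortened stand-in for the full command line in step 1.
    src = shlex.split("/usr/libexec/qemu-kvm -M rhel6.4.0 -m 2048 -monitor stdio")
    print(" ".join(shlex.quote(a) for a in destination_args(src)))
    for line in monitor_commands("192.0.2.10"):
        print("(qemu) " + line)
```

For a local migration on a single host, the destination instance would also need different listening ports (SPICE, QMP, sockets) than the source; that is not handled here.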

  
Actual results:
Sometimes the host gets a call trace and becomes unresponsive. Migration does not finish, and the qemu monitor also stops responding.

Expected results:
Both host and guest work well, and migration succeeds.

Additional info:
Host dmesg will be uploaded. In fact, after I executed the 'dmesg' command on the host, the host stopped responding.

Comment 1 Qunfang Zhang 2012-10-17 12:50:18 UTC

Created attachment 628773
Call trace log of host during guest migration

Comment 2 Qunfang Zhang 2012-10-17 12:53:09 UTC

Host info:
The two hosts are not identical, but I also hit this bug once when doing local host migration on host A.

Host A (src host)

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz
stepping	: 10
cpu MHz		: 2826.264
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 3
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dts tpr_shadow vnmi flexpriority
bogomips	: 5652.52
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:


Host B:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
stepping	: 7
cpu MHz		: 1600.000
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 6
initial apicid	: 6
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6185.74
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual

Comment 4 Sibiao Luo 2012-10-18 09:46:28 UTC

Hi,

  I also hit this issue: the host hangs after migrating from src to dest.

host info:
# uname -r && rpm -q qemu-kvm
2.6.32-331.el6.x86_64
qemu-kvm-0.12.1.2-2.327.el6.x86_64
guest info:
# uname -r
2.6.32-331.el6.x86_64

CLI:
# /usr/libexec/qemu-kvm -M rhel6.4.0 -cpu SandyBridge -enable-kvm -m 4096 -smp 4,sockets=2,cores=2,threads=1 -usb -device usb-tablet,id=input0 -name sluo_migration -uuid 990ea161-6b67-47b2-b803-19fb01d30d30 -rtc base=localtime,clock=host,driftfix=slew -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/mnt/RHEL-Server-6.3-64-sluo.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,fd=6 6<>/dev/tap6 -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=08:2E:5F:0A:0D:B1,bus=pci.0,addr=0x5 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5931,disable-ticketing,seamless-migration=off -vga qxl -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x6 -device hda-duplex -device usb-ehci,id=ehci,addr=0x7 -chardev spicevmc,name=usbredir,id=usbredirchardev1 -device usb-redir,chardev=usbredirchardev1,id=usbredirdev1,bus=ehci.0,debug=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -nodefaults -serial unix:/tmp/ttyS0,server,nowait -qmp tcp:0:4444,server,nowait -boot menu=on -monitor stdio

CPU info:
processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
stepping	: 7
cpu MHz		: 1600.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 3
cpu cores	: 4
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6784.30
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

Best Regards.
sluo

Comment 5 Qunfang Zhang 2012-10-22 08:49:00 UTC

Hit another call trace while the guest is running, even without migrating it.
The host is no longer accessible. Please check the attachment.

Comment 6 Qunfang Zhang 2012-10-22 08:50:29 UTC

Created attachment 631333
Host always call trace.

Not sure whether this is the same issue as in the bug description; please help confirm.

Comment 7 Orit Wasserman 2012-10-22 09:58:12 UTC

it looks like a different call trace,
I noticed this error in the logs:
[Firmware Warn]: Your BIOS is broken; DMAR reported at address fedc1000 returns all ones!

Maybe try to upgrade/re-install the firmware?
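
A minimal sketch of how such warnings could be pulled out of a saved host log. It assumes only that the kernel marks them with the "[Firmware Warn]" prefix seen above; the function name firmware_warnings is hypothetical.

```python
def firmware_warnings(log_text):
    """Return every log line that carries the '[Firmware Warn]' marker."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if "[Firmware Warn]" in line
    ]

if __name__ == "__main__":
    sample = (
        "some unrelated kernel message\n"
        "[Firmware Warn]: Your BIOS is broken; DMAR reported at address "
        "fedc1000 returns all ones!\n"
    )
    for warning in firmware_warnings(sample):
        print(warning)
```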

Comment 8 Orit Wasserman 2012-10-22 10:00:25 UTC

Another option is to disable iommu.

Comment 9 Qunfang Zhang 2012-10-22 10:14:00 UTC

(In reply to comment #7)
> it looks like a different call trace,
> I noticed this error in the logs:
> [Firmware Warn]: Your BIOS is broken; DMAR reported at address fedc1000
> returns all ones!
> 
> Maybe try to upgrade/re-install the firmware?

Hi, Orit
Do you mean to update/re-install the host or the kernel-firmware package?
[root@t1 ~]# uname -r
2.6.32-331.el6.x86_64
[root@t1 ~]# rpm -q kernel-firmware
kernel-firmware-2.6.32-331.el6.noarch

(In reply to comment #8)
> Another option is to disable iommu.

I don't set iommu=on on either of my hosts.
# cat /proc/cmdline 
ro root=/dev/mapper/vg_t1-lv_root rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_LVM_LV=vg_t1/lv_swap rd_LVM_LV=vg_t1/lv_root rd_NO_MD crashkernel=128M SYSFONT=latarcyrheb-sun16 rd_NO_DM clocksource_failover=1 console=tty0 console=ttyS0,115200nb
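
The same check can be written as a short script that lists any IOMMU-related boot parameters. This is a sketch: the set of parameter names (iommu, intel_iommu, amd_iommu) is an assumption about what is worth looking for, and the function name iommu_params is hypothetical. It inspects boot parameters only and says nothing about the kernel's built-in default.

```python
def iommu_params(cmdline):
    """Return the boot parameters whose name is IOMMU-related, in order."""
    names = ("iommu", "intel_iommu", "amd_iommu")  # assumed list of interest
    found = []
    for token in cmdline.split():
        name = token.split("=", 1)[0]
        if name in names:
            found.append(token)
    return found

if __name__ == "__main__":
    # Command line quoted in this comment; no IOMMU parameter is present.
    cmdline = (
        "ro root=/dev/mapper/vg_t1-lv_root rd_NO_LUKS  KEYBOARDTYPE=pc "
        "KEYTABLE=us LANG=en_US.UTF-8 rd_LVM_LV=vg_t1/lv_swap "
        "rd_LVM_LV=vg_t1/lv_root rd_NO_MD crashkernel=128M "
        "SYSFONT=latarcyrheb-sun16 rd_NO_DM clocksource_failover=1 "
        "console=tty0 console=ttyS0,115200nb"
    )
    print(iommu_params(cmdline))
```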

Comment 10 Orit Wasserman 2012-10-22 10:47:06 UTC

I meant the host firmware (Dell), but it can't hurt to update the kernel firmware.

Comment 13 Qunfang Zhang 2012-11-01 09:20:27 UTC

Update:
I re-installed my two hosts with the latest RHEL6.4 tree 20121019.0 and updated kernel and qemu-kvm to the following versions; I have not hit this issue so far.
kernel-2.6.32-337.el6.x86_64
qemu-kvm-0.12.1.2-2.331.el6.x86_64

Comment 14 Juan Quintela 2012-11-20 15:29:53 UTC

As it is no longer reproducible, closing it. If QE can reproduce it again, just re-open it.
