Description of problem:
Unable to ping the virtual machine after KVM live migration.

Version-Release number of selected component (if applicable):
RHEL 5.4

How reproducible:
Often

Steps to Reproduce:
1. Install KVM on host "labrador" and host "sspc-2"
2. Configure eth0 as a bridge and remove the default network on both hosts
3. Install a virtual machine "RHEL54" on labrador (step 1)
4. On "labrador", migrate "RHEL54" to "sspc-2" (step 2)
5. After the migration, the virtual machine "RHEL54" cannot be pinged, and no operations can be performed from its console (step 3)

Actual results:
Cannot ping the virtual machine after KVM live migration.

Expected results:
The virtual machine should be pingable after KVM live migration.

Additional info:
If the virtual machine is rebooted on "sspc-2" and then migrated back to "labrador", the virtual machine "RHEL54" can be pinged again.
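For reference, the bridged setup in step 2 would typically look like the following on RHEL 5. This is a minimal sketch, assuming the standard ifcfg file convention; the IP address shown is labrador's br0 address from this report.

```
# /etc/sysconfig/network-scripts/ifcfg-br0 -- the bridge carries the IP
DEVICE=br0
TYPE=Bridge
BOOTPROTO=static
IPADDR=9.125.52.14
NETMASK=255.255.255.0
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0 -- the NIC is enslaved to the bridge
DEVICE=eth0
BRIDGE=br0
ONBOOT=yes
```

Removing libvirt's default NAT network can then be done with "virsh net-destroy default" followed by "virsh net-undefine default".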
Created attachment 362979 [details] steps
Frank, which network card are you emulating? I've been testing virtio and e1000, and I can repeatedly reproduce the case where the console becomes unresponsive when the network dies only with the latter. If you don't know how your VM is configured, you might consider adding the output of virsh dumpxml as an attachment.
Created attachment 367560 [details] dumpxml
Hey Charles,

Here is the network card info from the two hosts.
------------------------------------------------
[root@labrador ~]# lspci|grep Ethernet    (the hardware is an IBM System x3655)
39:02.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02)
39:03.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02)

[root@sspc-2 ~]# lspci|grep Ethernet
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
06:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)

I've attached the XML file. Thanks
Frank, I was actually wondering about the *emulated* network card within the guest, not the physical hardware in the host. The dumpxml you provided doesn't specify a model, and I don't have a stock RHEL5.4 system handy at the moment to determine the default experimentally. More to the point -- if running lspci *within the guest* shows that you're using something other than virtio (which will show up as a Red Hat / Qumranet device), could you try switching emulated devices and seeing if that changes the behavior you're seeing? Edit your domain XML with "virsh edit", add the element <model type='virtio'/> inside your <interface> element, and restart the guest. If the issue is reproducible with this change made, does it also retain the behavior of the guest's console being nonresponsive? Thanks!
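The suggested edit would look roughly like this inside the domain XML. This is a sketch; the bridge name br0 is taken from the setup described in this report, and the rest of the <interface> contents will differ per guest.

```xml
<interface type='bridge'>
  <source bridge='br0'/>
  <!-- added line: switch the emulated NIC to the paravirtual virtio model -->
  <model type='virtio'/>
</interface>
```

With no <model> element, qemu falls back to its default emulated NIC (which the guest's lspci output below shows as an RTL-8139 in this case).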
Charles, sorry, I misunderstood your comment before. Here is the info from the guest RHEL54:
----------------------------------
[root@tbui ~]# lspci|grep Ethernet
00:03.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 20)

I've added "<model type='virtio'/>" to the domain XML as you suggested, rebooted the guest, and then migrated it, but it still behaves the same as before: unable to ping, and the console is unresponsive.
Another issue: after migrating RHEL54 from labrador to sspc-2, the console does not respond while I'm typing. When I then migrate it back to labrador, everything returns to normal, and I can also see the keys I typed before in the console.
Sounds like a separate issue from what I'm seeing, then. Apologies for intruding on your ticket.
Some questions:
1. Spanning tree - STP should be set to off on the host bridges; otherwise you'll lose networking during the learning time.
2. What's the kvm rpm version? The latest update is kvm-83-105.el5_4.9. What's the host kernel version?
3. Are you using VNC to get the guest's console? Can you use vncviewer directly?
4. What are the versions of the hosts? Can you send 'cat /proc/cpuinfo' for them?
5. Did virt-manager report a successful migration?
6. Can we get the output of strace `pgrep qemu` on the destination?
Created attachment 370298 [details] 1.png
Created attachment 370299 [details] 2.png
Thanks for the screenshots. What about the other questions?
Created attachment 370302 [details] strace
Hi Dor, FYI:

1. Sorry, but could you show me how to check the STP status on the host?

[root@labrador ~]# ifconfig br0
br0       Link encap:Ethernet  HWaddr 00:14:5E:5B:5D:B6
          inet addr:9.125.52.14  Bcast:9.125.52.255  Mask:255.255.255.0
          inet6 addr: 2002:97b:c7ab:2009:214:5eff:fe5b:5db6/64 Scope:Global
          inet6 addr: 2002:97b:c7ab:2008:214:5eff:fe5b:5db6/64 Scope:Global
          inet6 addr: fe80::214:5eff:fe5b:5db6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:45390555 errors:0 dropped:0 overruns:0 frame:0
          TX packets:19124960 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:10987839996 (10.2 GiB)  TX bytes:8631291540 (8.0 GiB)

2. The kvm rpm version is 83-105.el5:

[root@labrador ~]# rpm -qa|grep kvm
kvm-tools-83-105.el5
kmod-kvm-83-105.el5
etherboot-zroms-kvm-5.4.4-10.el5
kvm-qemu-img-83-105.el5
kvm-83-105.el5
[root@labrador ~]# uname -a
Linux labrador 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

3. Yes, I configured the virtual machine (hostname RHEL54, as before) to run vncserver:

[root@tbui ~]# netstat -an|grep 5901
tcp        0      0 0.0.0.0:5901      0.0.0.0:*       LISTEN
[root@tbui ~]# /etc/init.d/vncserver status
Xvnc (pid 1847) is running...

4. Do you mean the OS type?
If yes, then both hosts labrador and sspc-2 are installed with RHEL5.4, as is the virtual machine.

From host labrador
------------------
[root@labrador ~]# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : Quad-Core AMD Opteron(tm) Processor 2352
stepping        : 3
cpu MHz         : 1100.000
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
bogomips        : 4200.35
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : Quad-Core AMD Opteron(tm) Processor 2352
stepping        : 3
cpu MHz         : 1100.000
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4
apicid          : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
bogomips        : 4199.57
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]

processor       : 2
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : Quad-Core AMD Opteron(tm) Processor 2352
stepping        : 3
cpu MHz         : 1100.000
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4
apicid          : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
bogomips        : 4199.53
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]

processor       : 3
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : Quad-Core AMD Opteron(tm) Processor 2352
stepping        : 3
cpu MHz         : 1100.000
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
bogomips        : 4199.58
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]

From host sspc-2
----------------
[root@sspc-2 ~]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU E5420 @ 2.50GHz
stepping        : 6
cpu MHz         : 3698.056
cache size      : 6144 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 7396.11
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU E5420 @ 2.50GHz
stepping        : 6
cpu MHz         : 3698.056
cache size      : 6144 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4
apicid          : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 4987.48
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU E5420 @ 2.50GHz
stepping        : 6
cpu MHz         : 3698.056
cache size      : 6144 KB
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4
apicid          : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 4987.50
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU E5420 @ 2.50GHz
stepping        : 6
cpu MHz         : 3698.056
cache size      : 6144 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 4987.48
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

5. After the migration there is no message reporting a successful migration. Check the attachments (1.png & 2.png).

6.
After migrating RHEL54 to sspc-2:

[root@sspc-2 ~]# virsh list
 Id Name                 State
----------------------------------
 10 RHEL54               running

[root@sspc-2 ~]# strace pgrep qemu >& strace

The output file has been added as an attachment since it's a little bit long.
* Cross-vendor migration (AMD <-> Intel) is not supported.
* So please use the same hardware on src/dst.
* There is no point in further research on this bug.
* For completeness, I do answer the other questions:
* 'brctl show' provides the STP info.
* Since your vncserver runs inside the guest, loss of networking makes the entire guest appear unresponsive. You can use VNC to qemu on the host instead.

Please retest; otherwise it is not a bug.
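The two suggestions above can be sketched as host-side commands. The display number below is an assumption (qemu's first VNC display is typically :0); check the actual one with virsh vncdisplay.

```
# Check the "STP enabled" column for each bridge
brctl show

# Turn STP off on the guest bridge
brctl stp br0 off

# Find and connect to qemu's own VNC server on the host,
# bypassing the vncserver running inside the guest
virsh vncdisplay RHEL54
vncviewer localhost:0
```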
> Steps to Reproduce:
> 1. Install KVM on host "labrador" and host "sspc-2"
> 2. Configure eth0 as a bridge and remove the default network on both hosts

What do you mean here? My networking here is:

[root@deus ~]# ifconfig br0
br0       Link encap:Ethernet  HWaddr 00:16:76:D6:FC:76
          inet addr:192.168.10.223  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::216:76ff:fed6:fc76/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2867 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2875 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:347651 (339.5 KiB)  TX bytes:3033569 (2.8 MiB)

[root@deus ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:16:76:D6:FC:76
          inet6 addr: fe80::216:76ff:fed6:fc76/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2932 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4076 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:393553 (384.3 KiB)  TX bytes:3211907 (3.0 MiB)
          Memory:90400000-90420000

[root@deus ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.001676d6fc76       no              eth0
virbr0          8000.000000000000       yes

(I use br0 for guest networking, virbr0 is j)

> 3. Install a virtual machine "RHEL54" on labrador
> 4. On "labrador" migrate "RHEL54" to "sspc-2"
> 5. After the migration it's unable to ping the virtual machine "RHEL54", also can't do any operations from the console

From virt-manager, the console also stopped for me. I also tried to do the migration using:

virsh migrate --live <...>

over an ssh channel, and it also fails. I copied the qemu command line, changed the networking options (from fd=<number> to script=/etc/kvm-ifup) and added -monitor stdio, and it also failed. Further investigation showed that typing "cont" on the monitor fixed the issue. Contacting the libvirt people to see if they have any idea.
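The manual reproduction described above can be sketched as follows. The elided qemu options and the incoming port are placeholders, and resuming with "cont" is the observed workaround here, not a fix.

```
# Destination host: replay the copied qemu command line by hand, with the tap
# device set up by a script instead of the fd libvirt normally passes in, and
# a monitor on stdio (the incoming port 4444 is illustrative)
qemu-kvm ... -net tap,script=/etc/kvm-ifup -monitor stdio -incoming tcp:0:4444

# After the migration completed, the guest remained paused in this test;
# resuming it at the monitor prompt restored networking
(qemu) cont
```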
Sorry, I didn't describe it clearly. I mean I configured eth0 into bridge mode and removed the default network created by KVM. Here is the info; basically no big difference from yours.

[root@labrador ~]# ifconfig
br0       Link encap:Ethernet  HWaddr 00:14:5E:5B:5D:B6
          inet addr:9.125.52.14  Bcast:9.125.52.255  Mask:255.255.255.0
          inet6 addr: 2002:97b:c2c1:806:214:5eff:fe5b:5db6/64 Scope:Global
          inet6 addr: 2002:97b:c7ab:2009:214:5eff:fe5b:5db6/64 Scope:Global
          inet6 addr: 2002:97b:c7ab:2008:214:5eff:fe5b:5db6/64 Scope:Global
          inet6 addr: fe80::214:5eff:fe5b:5db6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:15902296 errors:0 dropped:0 overruns:0 frame:0
          TX packets:766774 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2595335455 (2.4 GiB)  TX bytes:683299393 (651.6 MiB)

eth0      Link encap:Ethernet  HWaddr 00:14:5E:5B:5D:B6
          inet6 addr: fe80::214:5eff:fe5b:5db6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:24167463 errors:0 dropped:10936037 overruns:0 frame:0
          TX packets:1167196 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4452886723 (4.1 GiB)  TX bytes:714691518 (681.5 MiB)
          Interrupt:90 Memory:e2000000-e2012800

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1432 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1432 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2525826 (2.4 MiB)  TX bytes:2525826 (2.4 MiB)
The bug is already fixed in libvirt-0.6.3-24.el5, which has been built in dist-5E-qu-candidate with the fix and is now in QE. This is a duplicate of bug 519204.

*** This bug has been marked as a duplicate of bug 519204 ***