Bug 526206 - Unable to ping after KVM live migration
Summary: Unable to ping after KVM live migration
Keywords:
Status: CLOSED DUPLICATE of bug 519204
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.4
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Assignee: Juan Quintela
QA Contact: Lawrence Lim
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-09-29 09:33 UTC by frank
Modified: 2014-03-26 01:02 UTC (History)
7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-12-17 11:52:50 UTC
Target Upstream Version:
Embargoed:


Attachments
steps (103.92 KB, image/png)
2009-09-29 09:35 UTC, frank
no flags Details
dumpxml (1.21 KB, application/xml)
2009-11-05 02:58 UTC, frank
no flags Details
1.png (299.03 KB, image/png)
2009-11-19 08:45 UTC, frank
no flags Details
2.png (371.07 KB, image/png)
2009-11-19 08:47 UTC, frank
no flags Details
strace (58.08 KB, text/plain)
2009-11-19 09:32 UTC, frank
no flags Details

Description frank 2009-09-29 09:33:07 UTC
Description of problem:
The virtual machine cannot be pinged after KVM live migration.

Version-Release number of selected component (if applicable):
RHEL 5.4

How reproducible:
Often

Steps to Reproduce:
1. Install KVM on host "labrador" and host "sspc-2"
2. Configure eth0 as a bridge and remove the default network on both hosts
3. Install a virtual machine "RHEL54" on "labrador" (step 1)
4. On "labrador", migrate "RHEL54" to "sspc-2" (step 2)
5. After the migration, the virtual machine "RHEL54" cannot be pinged, and no operations can be performed from the console (step 3)
  
Actual results:
The virtual machine cannot be pinged after KVM live migration

Expected results:
The virtual machine should remain pingable after KVM live migration

Additional info:
After rebooting the virtual machine on "sspc-2" and migrating it back to "labrador", the virtual machine "RHEL54" can be pinged again.

Comment 1 frank 2009-09-29 09:35:21 UTC
Created attachment 362979 [details]
steps

Comment 2 Charles Duffy 2009-11-02 13:35:42 UTC
Frank,

Which network card are you emulating?

I've been testing virtio and e1000, and have repeatedly reproduced the case where the console becomes unresponsive when the network dies only on the latter.

If you don't know how your VM is configured, you might consider adding the output of virsh dumpxml as an attachment.
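For the record, one way to capture that output is sketched below (the domain name "RHEL54" is taken from this report and is an assumption about your setup):

```shell
# Dump the libvirt domain XML for the guest and save it for attaching
virsh dumpxml RHEL54 > RHEL54.xml

# If a NIC model is explicitly set, it appears inside <interface>, e.g.:
#   <interface type='bridge'> ... <model type='virtio'/> ... </interface>
grep -A3 '<interface' RHEL54.xml
```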

Comment 3 frank 2009-11-05 02:58:42 UTC
Created attachment 367560 [details]
dumpxml

Comment 4 frank 2009-11-05 03:01:06 UTC
Hey Charles,
    Here is the network card info from the two hosts.
------------------------------------------------
[root@labrador ~]# lspci|grep Ethernet   // The hardware is IBM X System 3655
39:02.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02)
39:03.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02)

[root@sspc-2 ~]# lspci|grep Ethernet  
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
06:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)

I've attached the XML file. Thanks.

Comment 5 Charles Duffy 2009-11-06 09:17:15 UTC
Frank,

I was actually wondering about the *emulated* network card within the guest, not the physical hardware in the host. The dumpxml you provided doesn't specify a model, and I don't have a stock RHEL5.4 system handy at the moment to determine the default experimentally.

More to the point -- if running lspci *within the guest* shows that you're using something other than virtio (which will show up as a Red Hat / Qumranet device), could you try switching emulated devices and seeing if that changes the behavior you're seeing? Edit your domain XML with "virsh edit", add the element <model type='virtio'/> inside your <interface> element, and restart the guest.

If the issue is reproducible with this change made, does it also retain the behavior of the guest's console being nonresponsive?

Thanks!
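The edit described above can be sketched as follows (domain name "RHEL54" from this report; the surrounding <interface> contents are illustrative, not from the attached dumpxml):

```shell
# Open the domain XML in $EDITOR -- assumes the guest is named "RHEL54"
virsh edit RHEL54

# Inside the existing <interface> element, add the model line, e.g.:
#   <interface type='bridge'>
#     <source bridge='br0'/>
#     <model type='virtio'/>     <-- the element to add
#   </interface>

# Restart the guest so the new NIC model takes effect
virsh shutdown RHEL54
virsh start RHEL54
```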

Comment 6 frank 2009-11-10 06:36:07 UTC
Charles, sorry, I misunderstood your comment before. Here is the info from the guest "RHEL54":
----------------------------------
[root@tbui ~]# lspci|grep Ethernet
00:03.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 20)

I added "<model type='virtio'/>" to the domain XML as you suggested, rebooted the guest, and then migrated it, but the behavior is the same as before: the guest cannot be pinged and the console is unresponsive.

Comment 7 frank 2009-11-10 06:39:28 UTC
One more detail: after migrating RHEL54 from labrador to sspc-2, the console does not respond while I'm typing. When I migrate it back to labrador, everything returns to normal, and I can also see in the console the keys I typed earlier.

Comment 8 Charles Duffy 2009-11-10 06:44:00 UTC
Sounds like a separate issue from what I'm seeing, then. Apologies for intruding on your ticket.

Comment 9 Dor Laor 2009-11-18 14:08:27 UTC
Some questions:
1. Spanning tree - STP should be set to off on the bridges of the host.
   Otherwise you'll lose networking during the learning time.
2. What's the kvm rpm version? The latest update is kvm-83-105.el5_4.9
   What's the host kernel version?
3. Are you using vnc for getting the guest's console?
   Can you use vncviewer directly?
4. What are the versions of the hosts? Can you send 'cat /proc/cpuinfo' of them?
5. Did virt-manager report a successful migration?
6. Can we get the output of strace `pgrep qemu` on the destination?
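Questions 1 and 6 can be checked along these lines (a sketch; the bridge name br0 is assumed from the report):

```shell
# 1. Spanning tree: the "STP enabled" column should read "no"
brctl show
brctl stp br0 off    # turn STP off on br0 if it is on

# 6. Attach strace to the running qemu process on the destination;
#    note that -p is needed -- a literal `strace pgrep qemu` would
#    trace pgrep itself, not qemu
strace -f -p "$(pgrep -n qemu)" -o /tmp/qemu.strace
```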

Comment 10 frank 2009-11-19 08:45:40 UTC
Created attachment 370298 [details]
1.png

Comment 11 frank 2009-11-19 08:47:29 UTC
Created attachment 370299 [details]
2.png

Comment 12 Dor Laor 2009-11-19 09:19:29 UTC
Thanks for the screenshots. What about the other questions?

Comment 13 frank 2009-11-19 09:32:47 UTC
Created attachment 370302 [details]
strace

Comment 14 frank 2009-11-19 09:35:55 UTC
Hi Dor, FYI 
1. Sorry, but could you show me how to check the STP status on the host?
[root@labrador ~]# ifconfig br0
br0       Link encap:Ethernet  HWaddr 00:14:5E:5B:5D:B6  
          inet addr:9.125.52.14  Bcast:9.125.52.255  Mask:255.255.255.0
          inet6 addr: 2002:97b:c7ab:2009:214:5eff:fe5b:5db6/64 Scope:Global
          inet6 addr: 2002:97b:c7ab:2008:214:5eff:fe5b:5db6/64 Scope:Global
          inet6 addr: fe80::214:5eff:fe5b:5db6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:45390555 errors:0 dropped:0 overruns:0 frame:0
          TX packets:19124960 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:10987839996 (10.2 GiB)  TX bytes:8631291540 (8.0 GiB)

2. The kvm rpm version is 83-105.el5
[root@labrador ~]# rpm -qa|grep kvm
kvm-tools-83-105.el5
kmod-kvm-83-105.el5
etherboot-zroms-kvm-5.4.4-10.el5
kvm-qemu-img-83-105.el5
kvm-83-105.el5
[root@labrador ~]# uname -a
Linux labrador 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

3. Yes, I configured the virtual machine to run vncserver (its hostname was "RHEL54" before)
[root@tbui ~]# netstat -an|grep 5901
tcp        0      0 0.0.0.0:5901                0.0.0.0:*                   LISTEN      
[root@tbui ~]# /etc/init.d/vncserver status
Xvnc (pid 1847) is running...

4. Do you mean the OS type? If so, both hosts labrador and sspc-2 run RHEL 5.4, and so does the virtual machine.
From host labrador
------------------
[root@labrador ~]# cat /proc/cpuinfo 
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: Quad-Core AMD Opteron(tm) Processor 2352
stepping	: 3
cpu MHz		: 1100.000
cache size	: 512 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
bogomips	: 4200.35
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]

processor	: 1
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: Quad-Core AMD Opteron(tm) Processor 2352
stepping	: 3
cpu MHz		: 1100.000
cache size	: 512 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 4
apicid		: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
bogomips	: 4199.57
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]

processor	: 2
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: Quad-Core AMD Opteron(tm) Processor 2352
stepping	: 3
cpu MHz		: 1100.000
cache size	: 512 KB
physical id	: 0
siblings	: 4
core id		: 2
cpu cores	: 4
apicid		: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
bogomips	: 4199.53
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]

processor	: 3
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: Quad-Core AMD Opteron(tm) Processor 2352
stepping	: 3
cpu MHz		: 1100.000
cache size	: 512 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
bogomips	: 4199.58
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]

From host sspc-2
----------------
[root@sspc-2 ~]# cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz
stepping	: 6
cpu MHz		: 3698.056
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips	: 7396.11
clflush size	: 64
cache_alignment	: 64
address sizes	: 38 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz
stepping	: 6
cpu MHz		: 3698.056
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 4
apicid		: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips	: 4987.48
clflush size	: 64
cache_alignment	: 64
address sizes	: 38 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz
stepping	: 6
cpu MHz		: 3698.056
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 2
cpu cores	: 4
apicid		: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips	: 4987.50
clflush size	: 64
cache_alignment	: 64
address sizes	: 38 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz
stepping	: 6
cpu MHz		: 3698.056
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips	: 4987.48
clflush size	: 64
cache_alignment	: 64
address sizes	: 38 bits physical, 48 bits virtual
power management:

5. After migration there is no message reporting that the migration succeeded.
See the attachments (1.png & 2.png).

6. After migrating RHEL54 to sspc-2:
[root@sspc-2 ~]# virsh list
 Id Name                 State
----------------------------------
 10 RHEL54               running

[root@sspc-2 ~]# strace pgrep qemu >& strace

The output file has been added as an attachment since it's a little long.

Comment 15 Dor Laor 2009-11-19 12:43:49 UTC
* Cross-vendor migration (AMD <-> Intel) is not supported
  * So please use the same hardware on src/dst
  * There is no point in further research for this bug.
  * For completeness, I answer the other questions:
* brctl show provides the STP info
* Since your vncserver runs inside the guest, loss of networking makes the entire guest appear unresponsive. You can connect with VNC to qemu itself from the host.

Please retest; otherwise this is not a bug.
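The host-side VNC suggestion can be done roughly as follows (domain name from this report; the display number will vary):

```shell
# On the host currently running the guest, find qemu's own VNC display
virsh vncdisplay RHEL54      # prints e.g. :0

# Connect to qemu's display on the host -- this keeps working even if
# networking inside the guest is down, unlike the in-guest vncserver
vncviewer localhost:0
```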

Comment 17 Juan Quintela 2009-12-17 00:05:22 UTC
> steps to Reproduce:
> 1. Install KVM on host "labrador" and host "sspc-2"
> 2. Configure eth0 as a bridge and remove the default network on both hosts

What do you mean here?  My networking here is:

[root@deus ~]# ifconfig br0
br0       Link encap:Ethernet  HWaddr 00:16:76:D6:FC:76  
          inet addr:192.168.10.223  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::216:76ff:fed6:fc76/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2867 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2875 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:347651 (339.5 KiB)  TX bytes:3033569 (2.8 MiB)
[root@deus ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:16:76:D6:FC:76  
          inet6 addr: fe80::216:76ff:fed6:fc76/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2932 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4076 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:393553 (384.3 KiB)  TX bytes:3211907 (3.0 MiB)
          Memory:90400000-90420000 

[root@deus ~]# brctl show
bridge name	bridge id		STP enabled	interfaces
br0		8000.001676d6fc76	no		eth0
virbr0		8000.000000000000	yes		

(I use br0 for guest networking, virbr0 is j)

> 3. Install a virtual machine "RHEL54" on labrador (step 1)
> 4. On "labrador" migrate "RHEL54" to "sspc-2" (step 2)
> 5. After the migration it's unable to ping the virtual machine "RHEL54" , also
> can't do any operations from the console (step 3)

From virt-manager, I saw the console stop as well. I also tried to do the migration using:

virsh migrate --live <...> over the ssh channel, and it also fails.

I copied the qemu command line, changed the networking options (from fd=<number> to script=/etc/kvm-ifup) and added -monitor stdio. It also failed.
Further investigation showed that typing "cont" on the monitor fixed the issue.

Contacting libvirt people to see if they have any idea.
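A rough sketch of the workaround described above (the monitor interaction is as reported in this comment; `virsh resume` is an assumed libvirt-level equivalent of the monitor's "cont"):

```shell
# qemu started with a monitor on stdio and a tap script instead of a
# pre-opened fd, as described above:
#   qemu-kvm ... -net tap,script=/etc/kvm-ifup ... -monitor stdio

# After migration the guest on the destination is left paused; typing
# "cont" at the monitor prompt resumes it:
#   (qemu) cont

# Assumed libvirt-level equivalent, run on the destination host:
virsh resume RHEL54
```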

Comment 18 frank 2009-12-17 03:18:18 UTC
Sorry, I didn't describe it clearly. I mean that I configured eth0 in bridge mode and removed the default network created by KVM. Here is the info; basically there's no big difference from yours.
[root@labrador ~]# ifconfig
br0       Link encap:Ethernet  HWaddr 00:14:5E:5B:5D:B6
          inet addr:9.125.52.14  Bcast:9.125.52.255  Mask:255.255.255.0
          inet6 addr: 2002:97b:c2c1:806:214:5eff:fe5b:5db6/64 Scope:Global
          inet6 addr: 2002:97b:c7ab:2009:214:5eff:fe5b:5db6/64 Scope:Global
          inet6 addr: 2002:97b:c7ab:2008:214:5eff:fe5b:5db6/64 Scope:Global
          inet6 addr: fe80::214:5eff:fe5b:5db6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:15902296 errors:0 dropped:0 overruns:0 frame:0
          TX packets:766774 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2595335455 (2.4 GiB)  TX bytes:683299393 (651.6 MiB)

eth0      Link encap:Ethernet  HWaddr 00:14:5E:5B:5D:B6
          inet6 addr: fe80::214:5eff:fe5b:5db6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:24167463 errors:0 dropped:10936037 overruns:0 frame:0
          TX packets:1167196 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4452886723 (4.1 GiB)  TX bytes:714691518 (681.5 MiB)
          Interrupt:90 Memory:e2000000-e2012800

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1432 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1432 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2525826 (2.4 MiB)  TX bytes:2525826 (2.4 MiB)

Comment 19 Juan Quintela 2009-12-17 11:52:50 UTC
    The bug is already fixed: libvirt-0.6.3-24.el5 has been built in dist-5E-qu-candidate with the fixes and is in QE. This is a duplicate of bug 519204.

*** This bug has been marked as a duplicate of bug 519204 ***

