650097 – network in host become unavailable for few secs when guest quit

Summary: network in host become unavailable for few secs when guest quit

Keywords:
Status:	CLOSED DUPLICATE of bug 609463
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.6
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Neil Horman
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:	626678
Blocks:	Rhel5KvmTier2

Reported:	2010-11-05 08:05 UTC by Shirley Zhou
Modified:	2015-03-05 00:52 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:	626678
Environment:
Last Closed:	2010-11-12 12:08:29 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
tcpdump (10.92 KB, application/octet-stream) 2010-11-09 05:10 UTC, Shirley Zhou
tcpdump_public (19.70 KB, application/octet-stream) 2010-11-11 04:23 UTC, Shirley Zhou

Description Shirley Zhou 2010-11-05 08:05:42 UTC

This issue also exists on a RHEL 5.6 host.
kernel-2.6.18-229.el5.

+++ This bug was initially created as a clone of Bug #626678 +++

Description of problem:
Transfer a big file from the host to the guest, then quit the guest. The network on the host then becomes unavailable for a few seconds, as follows:

ping from external box to host:
[root@dhcp-91-145 ~]# ping 10.66.91.53
PING 10.66.91.53 (10.66.91.53) 56(84) bytes of data.
64 bytes from 10.66.91.53: icmp_seq=1 ttl=64 time=1.26 ms
64 bytes from 10.66.91.53: icmp_seq=2 ttl=64 time=0.215 ms
64 bytes from 10.66.91.53: icmp_seq=3 ttl=64 time=0.188 ms
64 bytes from 10.66.91.53: icmp_seq=4 ttl=64 time=0.193 ms
64 bytes from 10.66.91.53: icmp_seq=5 ttl=64 time=0.198 ms
64 bytes from 10.66.91.53: icmp_seq=6 ttl=64 time=0.196 ms
64 bytes from 10.66.91.53: icmp_seq=7 ttl=64 time=0.204 ms
64 bytes from 10.66.91.53: icmp_seq=50 ttl=64 time=1.28 ms
64 bytes from 10.66.91.53: icmp_seq=51 ttl=64 time=0.166 ms
64 bytes from 10.66.91.53: icmp_seq=52 ttl=64 time=0.164 ms
64 bytes from 10.66.91.53: icmp_seq=53 ttl=64 time=0.167 ms
^C
--- 10.66.91.53 ping statistics ---
53 packets transmitted, 11 received, 79% packet loss, time 52692ms
rtt min/avg/max/mdev = 0.164/0.385/1.282/0.420 ms

ping from host to external box:
[root@dhcp-91-53 Desktop]# ping 10.66.91.127
PING 10.66.91.127 (10.66.91.127) 56(84) bytes of data.
64 bytes from 10.66.91.127: icmp_seq=1 ttl=64 time=0.182 ms
64 bytes from 10.66.91.127: icmp_seq=2 ttl=64 time=0.176 ms
64 bytes from 10.66.91.127: icmp_seq=3 ttl=64 time=0.177 ms
64 bytes from 10.66.91.127: icmp_seq=9 ttl=64 time=1.20 ms
64 bytes from 10.66.91.127: icmp_seq=10 ttl=64 time=0.145 ms
64 bytes from 10.66.91.127: icmp_seq=11 ttl=64 time=0.147 ms
64 bytes from 10.66.91.127: icmp_seq=12 ttl=64 time=0.157 ms
64 bytes from 10.66.91.127: icmp_seq=13 ttl=64 time=0.142 ms
64 bytes from 10.66.91.127: icmp_seq=14 ttl=64 time=0.165 ms
^C
--- 10.66.91.127 ping statistics ---
14 packets transmitted, 9 received, 35% packet loss, time 13851ms
rtt min/avg/max/mdev = 0.142/0.277/1.209/0.330 ms
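
The length of the outage can be read off the gaps in the icmp_seq numbers. A minimal sketch, assuming ping's default one-second interval (so each missing sequence number is roughly one second of outage); the helper name find_seq_gaps is illustrative and not part of any tool used in this report:

```python
import re

# Matches the sequence number in lines such as
# "64 bytes from 10.66.91.127: icmp_seq=9 ttl=64 time=1.20 ms"
SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def find_seq_gaps(ping_output):
    """Return (first_missing, last_missing) for each run of missing icmp_seq values."""
    seqs = [int(m.group(1)) for m in SEQ_RE.finditer(ping_output)]
    gaps = []
    for prev, cur in zip(seqs, seqs[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


# Excerpt of the host-to-external ping output above.
sample = """\
64 bytes from 10.66.91.127: icmp_seq=3 ttl=64 time=0.177 ms
64 bytes from 10.66.91.127: icmp_seq=9 ttl=64 time=1.20 ms
64 bytes from 10.66.91.127: icmp_seq=10 ttl=64 time=0.145 ms
"""

for first, last in find_seq_gaps(sample):
    print(f"lost icmp_seq {first}-{last} ({last - first + 1} packets)")
# prints: lost icmp_seq 4-8 (5 packets)
```

The five missing replies agree with the "14 packets transmitted, 9 received" summary line.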

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.112.el6.x86_64
kernel-2.6.32-66.el6.x86_64

How reproducible:
Easy to reproduce when transferring a big file from guest to host, or from host to guest.

Steps to Reproduce:
1. Start a RHEL 6 guest:
/usr/libexec/qemu-kvm -M rhel6.0.0 -enable-kvm -m 2G -smp 2 -name src -uuid 4667028b-49fa-aba8-a5a4-868a8212342f -nodefconfig -nodefaults -monitor stdio -boot c -drive file=/home/rhel6.img,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet1 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:12:a1:54,bus=pci.0,addr=0x3 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc :12 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -qmp tcp:0:4446,server,nowait --no-kvm-pit-reinjection -rtc base=utc,clock=host,driftfix=slew
2. scp a big file (4G) from the host to the above guest.
3. After the scp has finished, quit the guest through the monitor.

  
Actual results:
After step 3, the network on the host becomes unavailable for a few seconds: the host cannot ping the external box, and the external box cannot ping the host.

Expected results:
The network on the host should keep working without interruption.

Additional info:

--- Additional comment from szhou on 2010-08-24 03:26:29 EDT ---

Additional info:

This issue happens when the VM is shut down from within the guest.

--- Additional comment from mwagner on 2010-08-27 10:03:40 EDT ---

looks like you are running this on a "public" network. Please rerun on a private network where you can control all other activity on the network.

I would also disagree with the assumption that the network becomes unavailable.  In my lab I see a quick spike in latency but no packet loss.
This is on a private network that we can control.

[root@perf10 ~]# ping 192.168.1.22
PING 192.168.1.22 (192.168.1.22) 56(84) bytes of data.
64 bytes from 192.168.1.22: icmp_seq=1 ttl=64 time=0.861 ms
64 bytes from 192.168.1.22: icmp_seq=2 ttl=64 time=0.053 ms
64 bytes from 192.168.1.22: icmp_seq=3 ttl=64 time=0.057 ms
64 bytes from 192.168.1.22: icmp_seq=4 ttl=64 time=0.053 ms
64 bytes from 192.168.1.22: icmp_seq=5 ttl=64 time=0.047 ms
64 bytes from 192.168.1.22: icmp_seq=6 ttl=64 time=0.049 ms
64 bytes from 192.168.1.22: icmp_seq=7 ttl=64 time=0.048 ms
64 bytes from 192.168.1.22: icmp_seq=8 ttl=64 time=0.050 ms
64 bytes from 192.168.1.22: icmp_seq=9 ttl=64 time=0.047 ms
64 bytes from 192.168.1.22: icmp_seq=10 ttl=64 time=0.046 ms
64 bytes from 192.168.1.22: icmp_seq=11 ttl=64 time=0.047 ms
64 bytes from 192.168.1.22: icmp_seq=12 ttl=64 time=0.073 ms
64 bytes from 192.168.1.22: icmp_seq=13 ttl=64 time=0.062 ms
64 bytes from 192.168.1.22: icmp_seq=14 ttl=64 time=0.047 ms
64 bytes from 192.168.1.22: icmp_seq=15 ttl=64 time=0.049 ms
64 bytes from 192.168.1.22: icmp_seq=16 ttl=64 time=0.043 ms
64 bytes from 192.168.1.22: icmp_seq=17 ttl=64 time=0.054 ms
64 bytes from 192.168.1.22: icmp_seq=18 ttl=64 time=0.038 ms
64 bytes from 192.168.1.22: icmp_seq=19 ttl=64 time=0.069 ms
64 bytes from 192.168.1.22: icmp_seq=20 ttl=64 time=0.067 ms
64 bytes from 192.168.1.22: icmp_seq=21 ttl=64 time=0.047 ms
64 bytes from 192.168.1.22: icmp_seq=22 ttl=64 time=0.047 ms
64 bytes from 192.168.1.22: icmp_seq=23 ttl=64 time=0.045 ms
64 bytes from 192.168.1.22: icmp_seq=24 ttl=64 time=0.047 ms
64 bytes from 192.168.1.22: icmp_seq=25 ttl=64 time=0.047 ms
64 bytes from 192.168.1.22: icmp_seq=26 ttl=64 time=0.048 ms
64 bytes from 192.168.1.22: icmp_seq=27 ttl=64 time=0.047 ms
64 bytes from 192.168.1.22: icmp_seq=28 ttl=64 time=0.047 ms
64 bytes from 192.168.1.22: icmp_seq=29 ttl=64 time=0.049 ms
64 bytes from 192.168.1.22: icmp_seq=30 ttl=64 time=0.046 ms
64 bytes from 192.168.1.22: icmp_seq=31 ttl=64 time=0.044 ms
64 bytes from 192.168.1.22: icmp_seq=32 ttl=64 time=0.046 ms
64 bytes from 192.168.1.22: icmp_seq=33 ttl=64 time=0.045 ms
64 bytes from 192.168.1.22: icmp_seq=34 ttl=64 time=0.046 ms
64 bytes from 192.168.1.22: icmp_seq=35 ttl=64 time=0.050 ms
64 bytes from 192.168.1.22: icmp_seq=36 ttl=64 time=0.047 ms
64 bytes from 192.168.1.22: icmp_seq=37 ttl=64 time=0.049 ms
64 bytes from 192.168.1.22: icmp_seq=38 ttl=64 time=0.046 ms
64 bytes from 192.168.1.22: icmp_seq=39 ttl=64 time=0.045 ms
64 bytes from 192.168.1.22: icmp_seq=40 ttl=64 time=0.047 ms
64 bytes from 192.168.1.22: icmp_seq=41 ttl=64 time=0.049 ms
64 bytes from 192.168.1.22: icmp_seq=42 ttl=64 time=0.048 ms
64 bytes from 192.168.1.22: icmp_seq=43 ttl=64 time=0.046 ms

--- Additional comment from szhou on 2010-08-29 22:14:37 EDT ---

(In reply to comment #2)
I reproduced this bug on the private network bridge virbr0 as follows:

1. Run two RHEL 6 VMs, vm1 and vm2, on the same host on the private network
2. scp a big file (4G) from the host to vm1
3. Ping vm2 from the host
4. Quit vm1

result of step3:
[root@dhcp-91-145 ~]# ping 192.168.122.59
PING 192.168.122.59 (192.168.122.59) 56(84) bytes of data.
64 bytes from 192.168.122.59: icmp_seq=1 ttl=64 time=0.532 ms
64 bytes from 192.168.122.59: icmp_seq=2 ttl=64 time=0.433 ms
64 bytes from 192.168.122.59: icmp_seq=3 ttl=64 time=0.434 ms
64 bytes from 192.168.122.59: icmp_seq=4 ttl=64 time=0.601 ms
64 bytes from 192.168.122.59: icmp_seq=5 ttl=64 time=0.451 ms
64 bytes from 192.168.122.59: icmp_seq=6 ttl=64 time=0.429 ms
64 bytes from 192.168.122.59: icmp_seq=7 ttl=64 time=0.428 ms
64 bytes from 192.168.122.59: icmp_seq=8 ttl=64 time=0.526 ms
64 bytes from 192.168.122.59: icmp_seq=33 ttl=64 time=0.399 ms
64 bytes from 192.168.122.59: icmp_seq=34 ttl=64 time=0.399 ms
64 bytes from 192.168.122.59: icmp_seq=35 ttl=64 time=0.448 ms
64 bytes from 192.168.122.59: icmp_seq=36 ttl=64 time=0.452 ms
64 bytes from 192.168.122.59: icmp_seq=37 ttl=64 time=0.296 ms
64 bytes from 192.168.122.59: icmp_seq=38 ttl=64 time=0.526 ms
64 bytes from 192.168.122.59: icmp_seq=39 ttl=64 time=0.348 ms

From the above result, we can see that about 25 packets were lost when vm1 quit.

--- Additional comment from nhorman on 2010-10-14 11:07:21 EDT ---

Can you please attach the host's message log after you encounter this problem?  It would be interesting to see if a port change on the tap interface leads to some bridging issue.

My first guess is that you are running STP on virbr0, and the downing of veth0 on the bridge is seen as a topology change, leading to ports going back into learning mode or some such, during which time traffic is not forwarded.

--- Additional comment from szhou on 2010-10-14 22:21:38 EDT ---

(In reply to comment #4)
> can you please attach the hosts message log after you encounter this problem. 
> It would be interesting to see if a port change on the tap interface leads to
> some bridging issue.
> 
> My first guess is that you are running stp on virbr0, and the downing of veth0
> on the bridge is seen as a topology change, leading to ports going back into
> learning mode or some such, during which time traffic is not forwarded.

Reproduced this bug using the scenario in comment 3.
[root@dhcp-91-53 home]# ping 192.168.122.99
PING 192.168.122.99 (192.168.122.99) 56(84) bytes of data.
64 bytes from 192.168.122.99: icmp_seq=1 ttl=64 time=1.25 ms
64 bytes from 192.168.122.99: icmp_seq=2 ttl=64 time=0.175 ms
64 bytes from 192.168.122.99: icmp_seq=3 ttl=64 time=0.158 ms
64 bytes from 192.168.122.99: icmp_seq=10 ttl=64 time=0.369 ms
64 bytes from 192.168.122.99: icmp_seq=11 ttl=64 time=0.174 ms
64 bytes from 192.168.122.99: icmp_seq=12 ttl=64 time=0.171 ms
64 bytes from 192.168.122.99: icmp_seq=13 ttl=64 time=0.188 ms
64 bytes from 192.168.122.99: icmp_seq=14 ttl=64 time=0.163 ms
64 bytes from 192.168.122.99: icmp_seq=15 ttl=64 time=0.165 ms
64 bytes from 192.168.122.99: icmp_seq=16 ttl=64 time=0.174 ms
64 bytes from 192.168.122.99: icmp_seq=17 ttl=64 time=0.192 ms
64 bytes from 192.168.122.99: icmp_seq=18 ttl=64 time=0.169 ms
64 bytes from 192.168.122.99: icmp_seq=19 ttl=64 time=0.140 ms
64 bytes from 192.168.122.99: icmp_seq=20 ttl=64 time=0.129 ms
64 bytes from 192.168.122.99: icmp_seq=21 ttl=64 time=0.180 ms
^C
--- 192.168.122.99 ping statistics ---
21 packets transmitted, 15 received, 28% packet loss, time 20108ms
rtt min/avg/max/mdev = 0.129/0.253/1.250/0.271 ms

dmesg info:
virbr0: port 1(tap0) entering disabled state
device tap0 left promiscuous mode
virbr0: port 1(tap0) entering disabled state
device tap0 entered promiscuous mode
virbr0: topology change detected, propagating
virbr0: port 1(tap0) entering forwarding state
kvm: 26516: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0x0
kvm: 26516: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x130079
kvm: 26516: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xffd76974
kvm: 26516: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x530079
kvm: 26516: cpu1 unimplemented perfctr wrmsr: 0xc1 data 0x0
kvm: 26516: cpu1 unimplemented perfctr wrmsr: 0x186 data 0x130079
kvm: 26516: cpu1 unimplemented perfctr wrmsr: 0xc1 data 0xffd76974
kvm: 26516: cpu1 unimplemented perfctr wrmsr: 0x186 data 0x530079
tap0: no IPv6 routers present
device tap1 entered promiscuous mode
virbr0: topology change detected, propagating
virbr0: port 2(tap1) entering forwarding state
kvm: 26674: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0x0
kvm: 26674: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x130079
kvm: 26674: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xffd76974
kvm: 26674: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x530079
kvm: 26674: cpu1 unimplemented perfctr wrmsr: 0xc1 data 0x0
kvm: 26674: cpu1 unimplemented perfctr wrmsr: 0x186 data 0x130079
kvm: 26674: cpu1 unimplemented perfctr wrmsr: 0xc1 data 0xffd76974
kvm: 26674: cpu1 unimplemented perfctr wrmsr: 0x186 data 0x530079
tap1: no IPv6 routers present
virbr0: port 1(tap0) entering disabled state
device tap0 left promiscuous mode
virbr0: port 1(tap0) entering disabled state
device tap0 entered promiscuous mode
virbr0: topology change detected, propagating
virbr0: port 1(tap0) entering forwarding state
kvm: 27093: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0x0
kvm: 27093: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x130079
kvm: 27093: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xffd76974
kvm: 27093: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x530079
kvm: 27093: cpu1 unimplemented perfctr wrmsr: 0xc1 data 0x0
kvm: 27093: cpu1 unimplemented perfctr wrmsr: 0x186 data 0x130079
kvm: 27093: cpu1 unimplemented perfctr wrmsr: 0xc1 data 0xffd76974
kvm: 27093: cpu1 unimplemented perfctr wrmsr: 0x186 data 0x530079
tap0: no IPv6 routers present
virbr0: port 1(tap0) entering disabled state
device tap0 left promiscuous mode
virbr0: port 1(tap0) entering disabled state

--- Additional comment from nhorman on 2010-10-15 19:32:27 EDT ---

Well, it definitely seems as though your bridge topology is changing a lot during shutdown.  Can you use brctl to disable STP on the bridge in question?  That should solve the problem.

--- Additional comment from szhou on 2010-10-17 22:29:08 EDT ---

(In reply to comment #6)
> well it definitely seems as though your bridge topology is changing a lot during
> shutdown.  Can you use brctl to disable stp on the bridge in question?  That
> should solve the problem.

Hi, Neil

After disabling STP on the virbr0 bridge:
# brctl show
bridge name     bridge id               STP enabled     interfaces
breth0          8000.b8ac6f3b0c30       no              eth0
virbr0          8000.000000000000       no

Then I repeated the operations from comment 4; the issue can be reproduced as follows:
# ping 192.168.122.99
PING 192.168.122.99 (192.168.122.99) 56(84) bytes of data.
64 bytes from 192.168.122.99: icmp_seq=1 ttl=64 time=0.178 ms
64 bytes from 192.168.122.99: icmp_seq=2 ttl=64 time=0.157 ms
64 bytes from 192.168.122.99: icmp_seq=3 ttl=64 time=0.148 ms
64 bytes from 192.168.122.99: icmp_seq=4 ttl=64 time=0.137 ms
64 bytes from 192.168.122.99: icmp_seq=5 ttl=64 time=0.139 ms
64 bytes from 192.168.122.99: icmp_seq=6 ttl=64 time=0.115 ms
^C
--- 192.168.122.99 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5054ms
rtt min/avg/max/mdev = 0.115/0.145/0.178/0.023 ms

The dmesg output is as follows:
[root@dhcp-91-53 home]# dmesg
virbr0: port 1(tap0) entering disabled state
device tap0 left promiscuous mode
virbr0: port 1(tap0) entering disabled state
device tap0 entered promiscuous mode
virbr0: port 1(tap0) entering forwarding state
__ratelimit: 6 callbacks suppressed
kvm: 32702: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0x0
kvm: 32702: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x130079
kvm: 32702: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xffd76974
kvm: 32702: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x530079
kvm: 32702: cpu1 unimplemented perfctr wrmsr: 0xc1 data 0x0
kvm: 32702: cpu1 unimplemented perfctr wrmsr: 0x186 data 0x130079
kvm: 32702: cpu1 unimplemented perfctr wrmsr: 0xc1 data 0xffd76974
kvm: 32702: cpu1 unimplemented perfctr wrmsr: 0x186 data 0x530079
tap0: no IPv6 routers present
virbr0: port 1(tap0) entering disabled state
device tap0 left promiscuous mode
virbr0: port 1(tap0) entering disabled state

Any suggestion? Thanks.

--- Additional comment from szhou on 2010-11-04 03:14:22 EDT ---

Reproduced this issue on kernel-2.6.32-71.8.1.el6.x86_64.

After setting stp=off for virbr0:
# brctl show
bridge name	bridge id		STP enabled	interfaces
breth0		8000.0023ae8dca1c	no		eth0
virbr0		8000.b2ab3616f407	no		tap1

Then I repeated the operations from comment 4; the bug reproduces as follows:
# ping 192.168.122.176
PING 192.168.122.176 (192.168.122.176) 56(84) bytes of data.
64 bytes from 192.168.122.176: icmp_seq=1 ttl=64 time=0.223 ms
64 bytes from 192.168.122.176: icmp_seq=2 ttl=64 time=0.187 ms
64 bytes from 192.168.122.176: icmp_seq=3 ttl=64 time=0.214 ms
64 bytes from 192.168.122.176: icmp_seq=4 ttl=64 time=0.177 ms
64 bytes from 192.168.122.176: icmp_seq=5 ttl=64 time=0.194 ms
64 bytes from 192.168.122.176: icmp_seq=6 ttl=64 time=0.125 ms
64 bytes from 192.168.122.176: icmp_seq=39 ttl=64 time=0.420 ms
64 bytes from 192.168.122.176: icmp_seq=40 ttl=64 time=0.112 ms
64 bytes from 192.168.122.176: icmp_seq=41 ttl=64 time=0.170 ms
^C
--- 192.168.122.176 ping statistics ---
41 packets transmitted, 9 received, 78% packet loss, time 40746ms
rtt min/avg/max/mdev = 0.112/0.202/0.420/0.085 ms

--- Additional comment from szhou on 2010-11-05 00:50:57 EDT ---

It is easy to reproduce this bug on a host with an e1000e NIC, such as the Optiplex 780 or Optiplex 760.

00:19.0 Ethernet controller: Intel Corporation 82567LM-3 Gigabit Network Connection (rev 02)
	Subsystem: Dell Device 0276
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 106
	Region 0: Memory at fdfe0000 (32-bit, non-prefetchable) [size=128K]
	Region 1: Memory at fdfd9000 (32-bit, non-prefetchable) [size=4K]
	Region 2: I/O ports at ece0 [size=32]
	Capabilities: [c8] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee03000  Data: 406a
	Capabilities: [e0] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: e1000e
	Kernel modules: e1000e


This bug also happens on a RHEL 5.6 host.
kernel-2.6.18-229.el5

Comment 1 Shirley Zhou 2010-11-09 05:09:41 UTC

As Herbert suggested, I took a packet dump on the host.

To avoid interference from the public network, I ran the guests on the private bridge virbr0.
# brctl show
bridge name     bridge id               STP enabled     interfaces
breth0          8000.b8ac6f3b09be       no              eth0
virbr0          8000.626e8c676ff8       no              tap1
                                                        tap0
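
When the STP setting and port membership need to be checked repeatedly between test runs, the table can be parsed rather than read by eye. A minimal sketch, assuming the column layout shown above (name, bridge id, STP flag, then interfaces, with extra interfaces on indented continuation rows) and that no bridge is literally named "bridge"; parse_brctl_show is a hypothetical helper:

```python
def parse_brctl_show(text):
    """Parse `brctl show` output into {bridge: {"id", "stp", "interfaces"}}."""
    bridges = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "bridge":
            continue  # blank line or the header row
        if line[0].isspace():
            # Indented continuation row: more interfaces of the previous bridge.
            if current is not None:
                bridges[current]["interfaces"].extend(fields)
            continue
        name, bridge_id, stp = fields[0], fields[1], fields[2]
        bridges[name] = {
            "id": bridge_id,
            "stp": stp == "yes",
            "interfaces": fields[3:],  # empty when the bridge has no ports
        }
        current = name
    return bridges


# The table shown above.
sample = (
    "bridge name     bridge id               STP enabled     interfaces\n"
    "breth0          8000.b8ac6f3b09be       no              eth0\n"
    "virbr0          8000.626e8c676ff8       no              tap1\n"
    "                                                        tap0\n"
)

for name, info in parse_brctl_show(sample).items():
    print(name, "stp=on" if info["stp"] else "stp=off", info["interfaces"])
```
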
Reproduced this issue as described in https://bugzilla.redhat.com/show_bug.cgi?id=626678#c3.

# ping 192.168.122.43
PING 192.168.122.43 (192.168.122.43) 56(84) bytes of data.
64 bytes from 192.168.122.43: icmp_seq=1 ttl=64 time=0.176 ms
64 bytes from 192.168.122.43: icmp_seq=2 ttl=64 time=0.138 ms
64 bytes from 192.168.122.43: icmp_seq=3 ttl=64 time=0.137 ms
64 bytes from 192.168.122.43: icmp_seq=4 ttl=64 time=0.147 ms
64 bytes from 192.168.122.43: icmp_seq=5 ttl=64 time=0.160 ms
64 bytes from 192.168.122.43: icmp_seq=6 ttl=64 time=0.120 ms
64 bytes from 192.168.122.43: icmp_seq=43 ttl=64 time=1.66 ms
64 bytes from 192.168.122.43: icmp_seq=44 ttl=64 time=0.115 ms
64 bytes from 192.168.122.43: icmp_seq=45 ttl=64 time=0.108 ms
64 bytes from 192.168.122.43: icmp_seq=46 ttl=64 time=0.120 ms
64 bytes from 192.168.122.43: icmp_seq=47 ttl=64 time=0.121 ms
64 bytes from 192.168.122.43: icmp_seq=48 ttl=64 time=0.119 ms
64 bytes from 192.168.122.43: icmp_seq=49 ttl=64 time=0.140 ms
64 bytes from 192.168.122.43: icmp_seq=50 ttl=64 time=0.162 ms
64 bytes from 192.168.122.43: icmp_seq=51 ttl=64 time=0.156 ms

--- 192.168.122.43 ping statistics ---
51 packets transmitted, 15 received, 70% packet loss, time 50008ms

Captured packets using tcpdump -i virbr0 -w tcpdump.dump; please see the attachment for reference.
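
A capture written with tcpdump -w can also be inspected offline for address changes. A minimal sketch, assuming the file is in the classic pcap format (not pcapng, microsecond timestamps) with Ethernet link type and untagged IPv4 frames; source_macs_by_ip is a hypothetical helper, not part of tcpdump:

```python
import struct
from collections import defaultdict


def source_macs_by_ip(pcap_bytes):
    """Map each IPv4 source address to the distinct source MACs it used, in order seen."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic pcap file")
    # The link type is the last 32-bit field of the 24-byte global header.
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled")

    seen = defaultdict(list)
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Per-packet header: ts_sec, ts_usec, captured length, original length.
        _sec, _usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16]
        )
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # Need Ethernet (14 bytes) + IPv4 header (20 bytes), EtherType 0x0800.
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        src_mac = ":".join(f"{b:02x}" for b in frame[6:12])
        src_ip = ".".join(str(b) for b in frame[26:30])
        if src_mac not in seen[src_ip]:
            seen[src_ip].append(src_mac)
    return dict(seen)
```

Calling it as source_macs_by_ip(open("tcpdump.dump", "rb").read()) would list, for every IPv4 source in the capture, each MAC address it sent from; more than one entry for the same address indicates that its MAC changed during the capture.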

Comment 2 Shirley Zhou 2010-11-09 05:10:41 UTC

Created attachment 458974 [details]
tcpdump

Comment 3 Herbert Xu 2010-11-10 09:30:30 UTC

Can you show me the output of ifconfig before you shut down the guest and reproduced the problem? Thanks!

Comment 5 Herbert Xu 2010-11-10 11:16:40 UTC

OK, the problem with this is pretty clear: your virbr0 MAC address changed after shutting down the guest (because virbr0 doesn't have a stable MAC address of its own).  So if you assign a stable MAC address to virbr0, either by hand or through an interface that's always there, then it should work correctly.

However, this doesn't explain why you saw an outage from the external host.  Can you please repeat the test from the external host and take a packet dump on the external Ethernet interface?

Thanks!
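
A simplified model of why the bridge address can change when a guest exits. It assumes that a bridge without an explicitly configured address takes the numerically lowest MAC among its current ports, which is an assumption about the bridge driver and is not stated in this report; the MAC addresses below are made up for illustration:

```python
def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer for numeric comparison."""
    return int(mac.replace(":", ""), 16)


def bridge_mac(port_macs):
    """Return the lowest port MAC, or None if the bridge has no ports.

    Models the assumed rule: the bridge inherits the lowest MAC of its ports.
    """
    if not port_macs:
        return None
    return min(port_macs.values(), key=mac_to_int)


# Hypothetical tap device addresses (tap devices normally get random MACs).
ports = {"tap0": "3a:11:22:33:44:55", "tap1": "9e:aa:bb:cc:dd:ee"}
before = bridge_mac(ports)

# vm1 quits, so its tap device leaves the bridge.
del ports["tap0"]
after = bridge_mac(ports)

print(before, "->", after)
# prints: 3a:11:22:33:44:55 -> 9e:aa:bb:cc:dd:ee
```

Under this model, peers that cached the old address keep sending to it until their ARP entries are refreshed, which would look like a temporary outage. An always-present port with a low, fixed MAC keeps the minimum from changing.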

Comment 6 Neil Horman 2010-11-10 14:22:11 UTC

*** Bug 626678 has been marked as a duplicate of this bug. ***

Comment 7 Shirley Zhou 2010-11-11 04:23:05 UTC

(In reply to comment #5)
> OK, the problem with this is pretty clear, your virbr0 MAC address changed
> after shutting down the guest (because virbr0 doesn't have a stable MAC
> address of its own).  So if you assign a stable MAC address to virbr0 either
> by hand or through an interface that's always there, then it should work
> correctly.
> 
> However, this doesn't explain why you saw an outage from external.  Can
> you please repeat the test from external and take a packet dump on the external
> Ethernet interface?
> 
> Thanks!

Hi, Herbert 

Thanks a lot for your comment.

I took a packet dump while pinging from the external host to the host running the guest and reproduced this bug; the capture is in the attached dump file.

[root@dhcp-91-53 ~]# ping 10.66.91.145
PING 10.66.91.145 (10.66.91.145) 56(84) bytes of data.
64 bytes from 10.66.91.145: icmp_seq=1 ttl=64 time=0.191 ms
64 bytes from 10.66.91.145: icmp_seq=2 ttl=64 time=0.173 ms
64 bytes from 10.66.91.145: icmp_seq=3 ttl=64 time=0.185 ms
64 bytes from 10.66.91.145: icmp_seq=4 ttl=64 time=0.164 ms
64 bytes from 10.66.91.145: icmp_seq=5 ttl=64 time=0.170 ms
64 bytes from 10.66.91.145: icmp_seq=6 ttl=64 time=0.173 ms
64 bytes from 10.66.91.145: icmp_seq=39 ttl=64 time=1.28 ms
64 bytes from 10.66.91.145: icmp_seq=40 ttl=64 time=0.151 ms
64 bytes from 10.66.91.145: icmp_seq=41 ttl=64 time=0.159 ms
64 bytes from 10.66.91.145: icmp_seq=42 ttl=64 time=0.164 ms
64 bytes from 10.66.91.145: icmp_seq=43 ttl=64 time=0.167 ms
64 bytes from 10.66.91.145: icmp_seq=44 ttl=64 time=0.149 ms
64 bytes from 10.66.91.145: icmp_seq=45 ttl=64 time=0.183 ms
64 bytes from 10.66.91.145: icmp_seq=46 ttl=64 time=0.163 ms
64 bytes from 10.66.91.145: icmp_seq=47 ttl=64 time=0.172 ms
64 bytes from 10.66.91.145: icmp_seq=48 ttl=64 time=0.177 ms
64 bytes from 10.66.91.145: icmp_seq=49 ttl=64 time=0.150 ms
64 bytes from 10.66.91.145: icmp_seq=50 ttl=64 time=0.141 ms
64 bytes from 10.66.91.145: icmp_seq=51 ttl=64 time=0.148 ms
64 bytes from 10.66.91.145: icmp_seq=52 ttl=64 time=0.158 ms

Comment 8 Shirley Zhou 2010-11-11 04:23:50 UTC

Created attachment 459622 [details]
tcpdump_public

Comment 9 Neil Horman 2010-11-11 12:00:55 UTC

I'm looking at this tcpdump, and I see no interruption of service in the ICMP traffic or any other IP traffic between the .53 address and the .145 address.  When you say you reproduced this bug, do you mean that you followed the test steps you previously outlined, or did you note the outage in some other way?

Comment 10 Neil Horman 2010-11-11 12:02:50 UTC

Wait, scratch that, I see the interruption now. It's too early for me here :(

Comment 11 Neil Horman 2010-11-11 12:23:50 UTC

Although it looks from the trace like you didn't assign a fixed MAC address to the virbr0 interface, or that despite setting it, it changed anyway.  The source MAC from 10.66.91.145 changes from frame 31 to frame 150 from 62:5f:c6:19:df:a9 (which appears locally assigned) to b8:ac:6f:3b:0c:30 (which has a Dell OUI, and is the physical MAC of eth0).  It's almost as if the bridge is completely disappearing when the guest exits, and traffic starts getting sent through the physical interface, rather than getting proxied through the bridge.  That certainly seems to be the case.

In comment 4 you indicated what ifconfig output looked like before the guest shut down.  What do ifconfig and ip route show output after the guest shuts down?
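
The "locally assigned" observation can be checked from the address itself: in an Ethernet MAC, the 0x02 bit of the first octet marks a locally administered address. A minimal sketch using the two addresses quoted above; the helper names are illustrative:

```python
def is_locally_administered(mac):
    """True if the locally-administered bit (0x02 of the first octet) is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


def oui(mac):
    """Return the first three octets, which identify the vendor for universal MACs."""
    return ":".join(mac.lower().split(":")[:3])


for mac in ("62:5f:c6:19:df:a9", "b8:ac:6f:3b:0c:30"):
    kind = "locally administered" if is_locally_administered(mac) else "universal"
    print(mac, kind, "OUI", oui(mac))
# prints:
# 62:5f:c6:19:df:a9 locally administered OUI 62:5f:c6
# b8:ac:6f:3b:0c:30 universal OUI b8:ac:6f
```

The second address also matches the breth0 bridge id 8000.b8ac6f3b0c30 from the earlier brctl show output, consistent with it being the address of eth0.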

Comment 14 Neil Horman 2010-11-18 14:37:52 UTC

Triage assignment.  If you feel this bug doesn't belong to you, or that it cannot be handled in a timely fashion, please contact me for re-assignment
