Bug 616352
Summary: | Lost first 9 packets when ping host using more than 2 NICs | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Amos Kong <akong> |
Component: | kernel | Assignee: | Justin M. Forbes <jforbes> |
Status: | CLOSED NOTABUG | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 6.0 | CC: | ailan, gcosta, herbert.xu, lcapitulino, llim, ndai, tburke, virt-maint |
Target Milestone: | rc | Keywords: | RHELNAK |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2010-08-05 14:20:15 UTC | Type: | --- |
Description
Amos Kong
2010-07-20 08:46:12 UTC
Setting the bridge forward delay to 0 does not help; the bug still exists.
host) # brctl setfd switch 0

This issue has been proposed when we are only considering blocker issues in the current Red Hat Enterprise Linux release.
** If you would still like this issue considered for the current release, ask your support representative to file as a blocker on your behalf. Otherwise ask that it be considered for the next Red Hat Enterprise Linux release. **

Do you have spanning tree off for the bridge (you should)? Can you also test with vhost disabled?

(In reply to comment #4)
> Do you have spanning tree off for the bridge (you should)?
Yes, STP was stopped.
> Can you also test with vhost disabled?
The bug can also be reproduced with vhost disabled.
It could not be reproduced with a RHEL 5.5 guest (vhost=on/off). Is this a bug in the RHEL 6 virtio driver?

What do you mean by "previous 10 packets"? Can you show the ping output please?
Also, could you try to reduce the command-line options and the number of NICs to the minimal set that reproduces the problem?

(In reply to comment #6)
> What do you mean by "previous 10 packets"? Can you show the ping output please?
The first 9 packets are lost. BTW, I changed the subject to 'Lost first 9 packets when ping host using more than 2 NICs'.
> Also, could you try to reduce the command-line options and the number of NICs
> to the minimal set that reproduces the problem?
Cannot reproduce with 2 NICs. Can reproduce with more than 2 NICs.
qemu command line:
# qemu-kvm RHEL-Server-6.0-64-virtio.qcow2 \
    -net nic,vlan=0,netdev=idMk6AAs,model=virtio,macaddr='02:A9:7C:6C:9f:8f' \
    -netdev tap,id=idMk6AAs,ifname='virtio_0_8000',script='/home/devel/push/client/tests/kvm/scripts/qemu-ifup-switch',downscript='no',vhost=on \
    -net nic,vlan=1,netdev=idLMFA6P,model=virtio,macaddr='02:A9:7C:6C:f9:ce' \
    -netdev tap,id=idLMFA6P,ifname='virtio_1_8000',script='/home/devel/push/client/tests/kvm/scripts/qemu-ifup-switch',downscript='no',vhost=on \
    -net nic,vlan=2,netdev=idBs5261,model=e1000,macaddr='02:A9:7C:6C:d8:6d' \
    -netdev tap,id=idBs5261,ifname='virtio_2_8000',script='/home/devel/push/client/tests/kvm/scripts/qemu-ifup',downscript='no',vhost=on \
    -m 512 -vnc :0 -cpu qemu64,+sse2 -no-kvm-pit-reinjection

host ip: 10.66.91.173

guest) # ping 10.66.91.173 -I eth2 -c 15
PING 10.66.91.173 (10.66.91.173) from 10.66.91.178 eth2: 56(84) bytes of data.
64 bytes from 10.66.91.173: icmp_seq=10 ttl=64 time=0.612 ms
64 bytes from 10.66.91.173: icmp_seq=11 ttl=64 time=0.127 ms
64 bytes from 10.66.91.173: icmp_seq=12 ttl=64 time=0.113 ms
64 bytes from 10.66.91.173: icmp_seq=13 ttl=64 time=0.109 ms
64 bytes from 10.66.91.173: icmp_seq=14 ttl=64 time=0.088 ms
64 bytes from 10.66.91.173: icmp_seq=15 ttl=64 time=0.116 ms

--- 10.66.91.173 ping statistics ---
15 packets transmitted, 6 received, 60% packet loss, time 15007ms
rtt min/avg/max/mdev = 0.088/0.194/0.612/0.187 ms

guest) # ifconfig
eth0      Link encap:Ethernet  HWaddr 02:A9:7C:6C:9F:8F
          inet addr:10.66.91.162  Bcast:10.66.91.255  Mask:255.255.255.0
          inet6 addr: fe80::a9:7cff:fe6c:9f8f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:849 errors:0 dropped:0 overruns:0 frame:0
          TX packets:31 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:103950 (101.5 KiB)  TX bytes:4844 (4.7 KiB)

eth1      Link encap:Ethernet  HWaddr 02:A9:7C:6C:F9:CE
          inet addr:10.66.91.196  Bcast:10.66.91.255  Mask:255.255.255.0
          inet6 addr: fe80::a9:7cff:fe6c:f9ce/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:846 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:103626 (101.1 KiB)  TX bytes:4136 (4.0 KiB)

eth2      Link encap:Ethernet  HWaddr 02:A9:7C:6C:D8:6D
          inet addr:10.66.91.178  Bcast:10.66.91.255  Mask:255.255.255.0
          inet6 addr: fe80::a9:7cff:fe6c:d86d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:411 errors:0 dropped:0 overruns:0 frame:0
          TX packets:46 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:56914 (55.5 KiB)  TX bytes:6454 (6.3 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Can I see the contents of your qemu-ifup scripts?

(In reply to comment #8)
> Can I see the contents of your qemu-ifup scripts ?
"switch" is a public bridge on the host.

# cat qemu-ifup-switch
#!/bin/sh
switch=switch
/sbin/ifconfig $1 0.0.0.0 up
/usr/sbin/brctl addif ${switch} $1
/usr/sbin/brctl setfd ${switch} 0
/usr/sbin/brctl stp ${switch} off

[root@dhcp-91-173 network-scripts]# cat ifcfg-eth0
DEVICE="eth0"
DEFROUTE="yes"
HWADDR="00:23:AE:A9:7C:6C"
IPV4_FAILURE_FATAL="yes"
IPV6INIT="no"
NAME="System eth0"
NM_CONTROLLED="yes"
ONBOOT="yes"
OPTIONS="layer2"
PEERDNS="yes"
PEERROUTES="yes"
TYPE="Ethernet"
UUID="5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03"
BRIDGE=switch

[root@dhcp-91-173 network-scripts]# cat ifcfg-br0
DEVICE=switch
BOOTPROTO=dhcp
ONBOOT=yes
TYPE=Bridge

Ok, I am still investigating the issue, but I will drop the info I have here, in case anybody is also looking at it.
First of all, it is indeed an ARP-related problem. If I pin the ARP entries, the problem goes away.
If I let 10 packets fire, then manually drop the ARP entry, it stops working again. It is also worth noting that one of the interfaces always seems to work.

tcpdump in the guest says this:
10:11:43.375275 ARP, Request who-has virtlab1.virt.bos.redhat.com tell dhcp75-13.virt.bos.redhat.com, length 28
10:11:43.375496 ARP, Reply virtlab1.virt.bos.redhat.com is-at 00:13:20:f5:fe:6b (oui Unknown), length 28

This is eth1, and it works. On eth0, which does NOT work, I see this:
10:11:43.375456 ARP, Request who-has virtlab1.virt.bos.redhat.com tell dhcp75-13.virt.bos.redhat.com, length 28

But no reply. Guest tcpdump does not even show the ICMP requests. A little later on, I see another ARP request, an actual reply, and then the ICMP packets start showing up.
I'll keep on it; further tips are appreciated.

This is not a bug, just rp_filter doing its job.
You should never connect four interfaces on the same machine (even if it's a virtual machine) to the same Ethernet, run DHCP and expect it to work. Only one interface will work: the one whose subnet route is added last. All the other ones will drop all inbound packets because of rp_filter.
Even if you disabled rp_filter, things won't work consistently, because IPv4 addresses are per-host, so ARP replies will appear on random interfaces.
To test this properly, you either need to pin everything down on both sides with static ARP entries, or use separate Ethernet bridges for each guest interface.
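
For anyone who wants to confirm the rp_filter explanation from inside the guest, here is a minimal sketch. It is not taken from this bug: the helper names `rp_filter_report` and `effective_rp_filter` are hypothetical, and it assumes a Linux guest that exposes the usual `/proc/sys/net/ipv4/conf/<iface>/rp_filter` files. It only reads settings and changes nothing.

```shell
#!/bin/sh
# Hypothetical helpers (not from this bug report) for inspecting reverse-path
# filtering inside the guest. They only read settings; nothing is changed.
# rp_filter values: 0 = no source validation, 1 = strict, 2 = loose.

# Print the per-interface rp_filter setting found under a conf directory.
# The optional argument exists only so the helper can be pointed at test data.
rp_filter_report() {
    conf_dir=${1:-/proc/sys/net/ipv4/conf}
    for f in "$conf_dir"/*/rp_filter; do
        [ -r "$f" ] || continue    # skip an unmatched glob or unreadable file
        iface=$(basename "$(dirname "$f")")
        printf '%s rp_filter=%s\n' "$iface" "$(cat "$f")"
    done
}

# The kernel applies the larger of conf/all and conf/<iface> to an interface,
# so print that maximum for the interface given as the first argument.
effective_rp_filter() {
    iface=$1
    conf_dir=${2:-/proc/sys/net/ipv4/conf}
    all=$(cat "$conf_dir/all/rp_filter")
    own=$(cat "$conf_dir/$iface/rp_filter")
    if [ "$all" -gt "$own" ]; then echo "$all"; else echo "$own"; fi
}

rp_filter_report
```

In the guest from this report, `effective_rp_filter eth2` would show whether source validation is active on the interface used for the ping. Pinning the ARP entry, as suggested above, could be done as root with something like `ip neigh replace 10.66.91.173 lladdr <host-MAC> nud permanent dev eth2`; the MAC is left as a placeholder because the bridge's address does not appear in this report.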