Bug 845287 - packet loss during parallel flood ping virtio NIC for rhel5.9 64 bit guest
Summary: packet loss during parallel flood ping virtio NIC for rhel5.9 64 bit guest
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.9
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Michael S. Tsirkin
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 928283
 
Reported: 2012-08-02 14:53 UTC by Sibiao Luo
Modified: 2013-03-28 09:01 UTC (History)
14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-03-27 13:29:30 UTC
Target Upstream Version:
Embargoed:


Attachments
parallel_flood_ping_results (7.39 KB, text/plain), 2012-08-02 14:53 UTC, Sibiao Luo
two other kernel packages test results. (5.87 KB, text/plain), 2012-08-03 11:29 UTC, Sibiao Luo
test_results_kernel-2.6.18-328&325.el5 (5.91 KB, text/plain), 2012-08-08 02:04 UTC, Sibiao Luo
test_results_for_kernel-2.6.18-332&333.el5 (54.95 KB, text/plain), 2012-08-09 07:49 UTC, Sibiao Luo
virtio_e1000_results (8.15 KB, text/plain), 2012-08-13 02:43 UTC, Sibiao Luo

Description Sibiao Luo 2012-08-02 14:53:17 UTC
Created attachment 601976 [details]
parallel_flood_ping_results

Description of problem:
    Boot a rhel5.9 64bit guest on a rhel5.9 64bit host with three different models of virtual NICs: RTL8139, virtio, and e1000. Adjust the ARP policy so that each interface uses its own hardware address to announce and respond to ARP packets, set the maximum MTU on each NIC, and then flood ping each NIC in parallel from the host. The virtio NIC showed heavy packet loss, while packet loss on the e1000 and RTL8139 NICs was zero.

    BTW, I also tested a rhel6.3 64bit guest on the rhel5.9 64bit host with the same steps; packet loss for the virtio, e1000 and RTL8139 NICs was zero during parallel flood ping, and all packets were transmitted and received successfully.

Version-Release number of selected component (if applicable):
host info:
# uname -r && rpm -q kvm
2.6.18-333.el5
kvm-83-258.el5
guest info:
# uname -r
2.6.18-333.el5

How reproducible:
100%

Steps to Reproduce:
1.boot the virtual machine with three different models of virtual NICs: RTL8139, virtio, and e1000.
eg: # /usr/libexec/qemu-kvm -m 2G -smp 2,cores=2,threads=1,sockets=1 -M rhel5.6.0 -cpu qemu64,+sse2 -name virtio_nic_test -drive file=/home/RHEL-Server-5.9-64-virtio.qcow2,format=qcow2,media=disk,if=virtio,cache=none,werror=stop,boot=on -uuid `uuidgen` -balloon virtio -monitor unix:/tmp/virt-nic-sluo,server,nowait -spice port=5931,disable-ticketing -qxl 1 -usbdevice tablet -soundhw ac97 -no-hpet -rtc-td-hack -no-kvm-pit-reinjection -monitor stdio -boot c -net nic,vlan=0,model=virtio,macaddr=08:2E:5F:0A:0D:0A -net tap,sndbuf=0,vlan=0,script=/etc/qemu-ifup,downscript=no -net nic,vlan=1,model=rtl8139,macaddr=08:2E:5F:0A:0D:1A -net tap,sndbuf=0,vlan=1,script=/etc/qemu-ifup,downscript=no -net nic,vlan=2,model=e1000,macaddr=08:2E:5F:0A:0D:2A -net tap,sndbuf=0,vlan=2,script=/etc/qemu-ifup,downscript=no
2.adjust the ARP policy: use the dedicated interface hardware address to announce and respond to ARP packets.
# echo 2 > /proc/sys/net/ipv4/conf/default/arp_ignore
# echo 2 > /proc/sys/net/ipv4/conf/default/arp_announce
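(The same policy can also be set through sysctl; a sketch equivalent to the two echo commands above, and persistent across reboots if the keys are added to /etc/sysctl.conf:)
# sysctl -w net.ipv4.conf.default.arp_ignore=2
# sysctl -w net.ipv4.conf.default.arp_announce=2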
3.check the NIC type in guest.
# ethtool -i eth0
driver: 8139cp
version: 1.2
firmware-version: 
bus-info: 0000:00:05.0
# ethtool -i eth1
driver: virtio_net
version: 
firmware-version: 
bus-info: virtio0
# ethtool -i eth2
driver: e1000
version: 7.3.21-k4-3-NAPI
firmware-version: N/A
bus-info: 0000:00:06.0
4.set the max MTU on each NIC.
# ifconfig eth0 mtu 1500 ( for rtl8139 )
# ifconfig eth1 mtu 65535 ( for virtio )
# ifconfig eth2 mtu 16110 ( for e1000 )
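(To confirm the MTUs took effect, a quick check like this can be used; a sketch assuming the interface names from step 3:)
# for dev in eth0 eth1 eth2; do ip link show $dev | grep -o 'mtu [0-9]*'; done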
5.get the network IP addresses in the guest via ifconfig.
eth0 addr:10.66.11.83 
eth1 addr:10.66.11.65 
eth2 addr:10.66.11.66 
6.in the host, flood ping each NIC in parallel with the following commands (substituting each interface's IP address):
ping -f <eth0 addr> -s {size from 0 to 1500}
ping -f <eth1 addr> -s {size from 0 to 65507}
ping -f <eth2 addr> -s {size from 0 to 16110}
eg:
RTL8139:
--------
# for s in 0 1 48 64 512 1440 1500; do ping -f -c 50 10.66.11.83 -s $s; done &
e1000:
-------
# for s in 0 1 48 64 512 1440 1500 1505 4096 4192 16110; do ping -f -c 50 10.66.11.66 -s $s; done &
virtio:
-------
# for s in 0 1 48 64 512 1440 1500 1505 4096 4192 32767 65507; do ping -f -c 50 10.66.11.65 -s $s; done &
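(To make sure the three sweeps genuinely overlap, they can also be launched together and waited on; a sketch reusing the addresses, counts and sizes above:)
# for s in 0 1 48 64 512 1440 1500; do ping -f -c 50 10.66.11.83 -s $s; done &
# for s in 0 1 48 64 512 1440 1500 1505 4096 4192 16110; do ping -f -c 50 10.66.11.66 -s $s; done &
# for s in 0 1 48 64 512 1440 1500 1505 4096 4192 32767 65507; do ping -f -c 50 10.66.11.65 -s $s; done &
# wait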
  
Actual results:
after step 6, the virtio NIC shows heavy packet loss,
...
--- 10.66.11.65 ping statistics ---
50 packets transmitted, 49 received, 2% packet loss, time 32ms
rtt min/avg/max/mdev = 0.202/0.238/0.371/0.047 ms, ipg/ewma 0.663/0.241 ms
PING 10.66.11.65 (10.66.11.65) 4192(4220) bytes of data.
. 
--- 10.66.11.65 ping statistics ---
50 packets transmitted, 49 received, 2% packet loss, time 32ms
rtt min/avg/max/mdev = 0.202/0.236/0.389/0.047 ms, ipg/ewma 0.661/0.245 ms
PING 10.66.11.65 (10.66.11.65) 32767(32795) bytes of data.
.............................
--- 10.66.11.65 ping statistics ---
50 packets transmitted, 21 received, 58% packet loss, time 510ms
rtt min/avg/max/mdev = 1.522/13.531/51.712/16.436 ms, pipe 4, ipg/ewma 10.410/23.953 ms
PING 10.66.11.65 (10.66.11.65) 65507(65535) bytes of data.
.................................................
--- 10.66.11.65 ping statistics ---
50 packets transmitted, 1 received, 98% packet loss, time 573ms
rtt min/avg/max/mdev = 23.024/23.024/23.024/0.000 ms, pipe 3, ipg/ewma 11.695/23.024 ms
...

Expected results:
packet loss on the virtio NIC should be zero during parallel flood ping; all packets should be transmitted and received successfully.

Additional info:
The detailed test results are attached (parallel_flood_ping_results.txt).

Comment 1 RHEL Program Management 2012-08-02 15:09:06 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 2 Sibiao Luo 2012-08-03 09:32:41 UTC
(In reply to comment #0)
> Created attachment 601976 [details]
> parallel_flood_ping_results
> 
> Description of problem:
>     Boot a rhel5.9 64bit guest on a rhel5.9 64bit host with three different
> models of virtual NICs: RTL8139, virtio, and e1000. Adjust the ARP policy so
> that each interface uses its own hardware address to announce and respond to
> ARP packets, set the maximum MTU on each NIC, and then flood ping each NIC
> in parallel from the host. The virtio NIC showed heavy packet loss, while
> packet loss on the e1000 and RTL8139 NICs was zero.
> 
>     BTW, I also tested a rhel6.3 64bit guest on the rhel5.9 64bit host with
> the same steps; packet loss for the virtio, e1000 and RTL8139 NICs was zero
> during parallel flood ping, and all packets were transmitted and received
> successfully.
> 
hi all,
  
   I also tested this on a rhel5.8 64bit guest; it did not hit this issue. Packet loss for the virtio, e1000 and RTL8139 NICs was zero during parallel flood ping, and all packets were transmitted and received successfully.

   So this issue may be a regression in rhel5.9.

Best wish.
sluo

Comment 3 Amit Shah 2012-08-03 10:30:14 UTC
Please check with msi disabled to confirm if this happens due to the msi patches.

Comment 4 Sibiao Luo 2012-08-03 10:53:53 UTC
(In reply to comment #3)
> Please check with msi disabled to confirm if this happens due to the msi
> patches.

Hi amit,

   In my virtio NIC testing we add 'pci=nomsi' to the guest kernel command line. I retested as you suggested, and it still hits this issue; the results are as follows.

For virtio NIC:
---------------
# for s in 0 1 48 64 512 1440 1500 1505 4096 4192 32767 65507; do ping -f -c 100 10.66.11.65 -s $s; done &
[1] 3296
[root@localhost ~]# PING 10.66.11.65 (10.66.11.65) 0(28) bytes of data.
 
--- 10.66.11.65 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 10ms
, ipg/ewma 0.104/0.000 ms
PING 10.66.11.65 (10.66.11.65) 1(29) bytes of data.
 
--- 10.66.11.65 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 9ms
, ipg/ewma 0.096/0.000 ms
PING 10.66.11.65 (10.66.11.65) 48(76) bytes of data.
 
--- 10.66.11.65 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 9ms
rtt min/avg/max/mdev = 0.038/0.086/0.181/0.023 ms, ipg/ewma 0.100/0.085 ms
PING 10.66.11.65 (10.66.11.65) 64(92) bytes of data.
 
--- 10.66.11.65 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 9ms
rtt min/avg/max/mdev = 0.038/0.083/0.168/0.026 ms, ipg/ewma 0.098/0.086 ms
PING 10.66.11.65 (10.66.11.65) 512(540) bytes of data.
 
--- 10.66.11.65 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 8ms
rtt min/avg/max/mdev = 0.028/0.070/0.217/0.031 ms, ipg/ewma 0.083/0.079 ms
PING 10.66.11.65 (10.66.11.65) 1440(1468) bytes of data.
 
--- 10.66.11.65 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 7ms
rtt min/avg/max/mdev = 0.032/0.064/0.165/0.030 ms, ipg/ewma 0.078/0.062 ms
PING 10.66.11.65 (10.66.11.65) 1500(1528) bytes of data.
. 
--- 10.66.11.65 ping statistics ---
100 packets transmitted, 99 received, 1% packet loss, time 38ms
rtt min/avg/max/mdev = 0.113/0.169/0.305/0.035 ms, ipg/ewma 0.387/0.177 ms
PING 10.66.11.65 (10.66.11.65) 1505(1533) bytes of data.
. 
--- 10.66.11.65 ping statistics ---
100 packets transmitted, 99 received, 1% packet loss, time 39ms
rtt min/avg/max/mdev = 0.144/0.186/0.275/0.027 ms, ipg/ewma 0.404/0.192 ms
PING 10.66.11.65 (10.66.11.65) 4096(4124) bytes of data.
... 
--- 10.66.11.65 ping statistics ---
100 packets transmitted, 97 received, 3% packet loss, time 55ms
rtt min/avg/max/mdev = 0.202/0.237/0.340/0.030 ms, ipg/ewma 0.564/0.234 ms
PING 10.66.11.65 (10.66.11.65) 4192(4220) bytes of data.
... 
--- 10.66.11.65 ping statistics ---
100 packets transmitted, 97 received, 3% packet loss, time 66ms
rtt min/avg/max/mdev = 0.199/0.236/0.357/0.038 ms, ipg/ewma 0.670/0.242 ms
PING 10.66.11.65 (10.66.11.65) 32767(32795) bytes of data.
........................................................
--- 10.66.11.65 ping statistics ---
100 packets transmitted, 44 received, 56% packet loss, time 1084ms
rtt min/avg/max/mdev = 0.806/13.705/51.574/16.825 ms, pipe 4, ipg/ewma 10.950/32.871 ms
PING 10.66.11.65 (10.66.11.65) 65507(65535) bytes of data.
..................................................................................................
[root@localhost ~]# .
--- 10.66.11.65 ping statistics ---
100 packets transmitted, 1 received, 99% packet loss, time 1412ms
rtt min/avg/max/mdev = 33.114/33.114/33.114/0.000 ms, pipe 4, ipg/ewma 14.272/33.114 ms

[1]+  Done                    for s in 0 1 48 64 512 1440 1500 1505 4096 4192 32767 65507;
do
    ping -f -c 100 10.66.11.65 -s $s;
done

Comment 5 Sibiao Luo 2012-08-03 11:29:47 UTC
Created attachment 602115 [details]
two other kernel packages test results.

Comment 6 Sibiao Luo 2012-08-03 11:30:46 UTC
(In reply to comment #3)
> Please check with msi disabled to confirm if this happens due to the msi
> patches.

I also tested two other kernel packages, using the default kernel command line in both tests without modifying anything; the detailed results are attached above (attachment #602115 [details]).

--------------------------test 1:
guest info:
kernel-2.6.18-308.11.1.el5
https://brewweb.devel.redhat.com/buildinfo?buildID=218060

test results:
packet loss on the virtio NIC was zero during parallel flood ping; all packets were transmitted and received successfully.

--------------------------test 2:
guest info:
kernel-2.6.18-308.11.1.el5.bz813995.msi
https://brewweb.devel.redhat.com/taskinfo?taskID=4710546

test results:
packet loss on the virtio NIC was zero during parallel flood ping; all packets were transmitted and received successfully.

Comment 7 Amit Shah 2012-08-03 11:36:06 UTC
Thank you very much for testing those kernels.

The kernels mentioned in the previous comment are the 5.8 z-stream kernel and the same kernel, but with msi patches backported from 5.9.

As the kernel-2.6.18-308.11.1.el5.bz813995.msi kernel didn't have any packet drops, this is not a regression due to the MSI patches.

Comment 9 juzhang 2012-08-06 02:28:30 UTC
A summary:

According to comment #0, this issue only happens with the virtio NIC.

According to comment #2, the reporter cannot reproduce it on a rhel5.8 guest, so it is marked as a regression.

According to comments #4, #6 and #7, it is not a regression caused by the MSI patches.

Comment 13 Sibiao Luo 2012-08-08 02:04:15 UTC
Created attachment 602903 [details]
test_results_kernel-2.6.18-328&325.el5

Comment 14 Sibiao Luo 2012-08-08 02:05:50 UTC
(In reply to comment #11)
> There's some changes in virtio-net driver recently, please check whether you
> can reporduce this issue in the following version:
> 
> - 2.6.18-328.el5
> - 2.6.18-325.el5
> 
Hi jason,
  
   I tested kernel-2.6.18-328.el5 and kernel-2.6.18-325.el5 as you instructed, with the same steps as comment #0. Neither of them hits this issue: packet loss on the virtio NIC was zero during parallel flood ping, and all packets were transmitted and received successfully.

   The test results are attached (attachment #602903 [details]).

Best wish.
sluo

Comment 15 RHEL Program Management 2012-08-08 07:08:43 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 17 Sibiao Luo 2012-08-09 07:49:19 UTC
Created attachment 603197 [details]
test_results_for_kernel-2.6.18-332&333.el5

Comment 18 Sibiao Luo 2012-08-09 07:50:33 UTC
(In reply to comment #16)
> (In reply to comment #14)
> > (In reply to comment #11)
> > > There's some changes in virtio-net driver recently, please check whether you
> 
> Thanks for the testing, the next step is to do a bisect between 328 and 333
> to find the earliest that can reproduce this issue. Thanks.

hi jason,

    I tried to reproduce this issue as you suggested. I tested kernel-2.6.18-333.el5 and kernel-2.6.18-332.el5, and only kernel-2.6.18-333.el5 hits this issue. The results are attached (attachment #603197 [details]).

Best Regards.
sluo

Comment 19 jason wang 2012-08-09 08:03:49 UTC
(In reply to comment #18)
> (In reply to comment #16)
> > (In reply to comment #14)
> > > (In reply to comment #11)
> > > > There's some changes in virtio-net driver recently, please check whether you
> > 
> > Thanks for the testing, the next step is to do a bisect between 328 and 333
> > to find the earliest that can reproduce this issue. Thanks.
> 
> hi jason,
> 
>     I tried to reproduce this issue as you suggested. I tested
> kernel-2.6.18-333.el5 and kernel-2.6.18-332.el5, and only
> kernel-2.6.18-333.el5 hits this issue. The results are attached
> (attachment #603197 [details]).
> 
> Best Regards.
> sluo

According to your test results, it looks like you are using a public network. Please test with two directly connected machines, or even from localhost to the guest, to make sure the switch is not doing anything evil.

Thanks

Comment 20 Sibiao Luo 2012-08-09 08:35:55 UTC
(In reply to comment #19)
> 
> According to your test result, looks like you are using a public network.
> Please test with two direct connected machines or even the localhost to
> guest to make sure the switch does not doing anyhting evil.
> 
Yes, I tried it as you suggested on a private network with kernel-2.6.18-333.el5 and kernel-2.6.18-332.el5; only kernel-2.6.18-333.el5 hits this issue. During the test, 'pci=nomsi' was appended to both guest kernel command lines; my host kernel is kernel-2.6.18-333.

the kernel-2.6.18-333.el5 result:
--------------------------cut here--------------------------
--- 192.168.122.17 ping statistics ---
200 packets transmitted, 193 received, 3% packet loss, time 641ms
rtt min/avg/max/mdev = 1.125/3.376/50.041/5.185 ms, pipe 5, ipg/ewma 3.225/3.719 ms
..PING 192.168.122.17 (192.168.122.17) 65507(65535) bytes of data.
................    
--- 192.168.122.17 ping statistics ---
200 packets transmitted, 197 received, 1% packet loss, time 808ms
rtt min/avg/max/mdev = 0.872/3.831/11.068/1.070 ms, pipe 2, ipg/ewma 4.061/3.128 ms
.......
--- 192.168.122.17 ping statistics ---
200 packets transmitted, 196 received, 2% packet loss, time 836ms
rtt min/avg/max/mdev = 0.737/4.553/51.246/5.448 ms, pipe 5, ipg/ewma 4.201/4.397 ms
.... 
--- 192.168.122.17 ping statistics ---
200 packets transmitted, 196 received, 2% packet loss, time 854ms
rtt min/avg/max/mdev = 1.313/4.582/49.997/5.279 ms, pipe 5, ipg/ewma 4.292/4.174 ms
...
--- 192.168.122.17 ping statistics ---
200 packets transmitted, 194 received, 3% packet loss, time 753ms
rtt min/avg/max/mdev = 0.437/3.422/4.811/1.411 ms, ipg/ewma 3.786/0.933 ms
.
--- 192.168.122.17 ping statistics ---
200 packets transmitted, 194 received, 3% packet loss, time 866ms
rtt min/avg/max/mdev = 0.964/4.535/51.855/5.124 ms, pipe 5, ipg/ewma 4.354/3.931 ms
 
--- 192.168.122.17 ping statistics ---
200 packets transmitted, 196 received, 2% packet loss, time 844ms
rtt min/avg/max/mdev = 0.865/4.383/44.696/4.254 ms, pipe 4, ipg/ewma 4.245/3.313 ms
. 
--- 192.168.122.17 ping statistics ---
200 packets transmitted, 196 received, 2% packet loss, time 860ms
rtt min/avg/max/mdev = 1.002/4.659/50.062/5.301 ms, pipe 5, ipg/ewma 4.325/3.616 ms
.
--- 192.168.122.17 ping statistics ---
200 packets transmitted, 193 received, 3% packet loss, time 739ms
rtt min/avg/max/mdev = 0.492/3.335/4.813/1.496 ms, ipg/ewma 3.716/0.820 ms
 
--- 192.168.122.17 ping statistics ---
200 packets transmitted, 191 received, 4% packet loss, time 687ms
rtt min/avg/max/mdev = 0.471/2.980/4.797/1.692 ms, ipg/ewma 3.454/0.754 ms

--- 192.168.122.17 ping statistics ---
200 packets transmitted, 196 received, 2% packet loss, time 609ms
rtt min/avg/max/mdev = 0.461/2.774/4.817/1.740 ms, ipg/ewma 3.064/0.713 ms

Best wish.
sluo

Comment 21 Sibiao Luo 2012-08-09 08:45:00 UTC
A summary of comment #18 and comment #20:

1) According to comment #18, only kernel-2.6.18-333.el5 hits this issue on a public network.

2) According to comment #20, only kernel-2.6.18-333.el5 hits this issue on a private network.

Note: 'pci=nomsi' was appended to the guest kernel command line during my tests; my host kernel version is kernel-2.6.18-333.

Comment 22 jason wang 2012-08-09 09:27:37 UTC
(In reply to comment #21)
> A summary of comment #18 and comment #20:
> 
> 1) According to comment #18, only kernel-2.6.18-333.el5 hits this issue on
> a public network.
> 
> 2) According to comment #20, only kernel-2.6.18-333.el5 hits this issue on
> a private network.
> 
> Note: 'pci=nomsi' was appended to the guest kernel command line during my
> tests; my host kernel version is kernel-2.6.18-333.

A glance at the git log between 332 and 333 shows no changes to either virtio-net or the net core; the only change is related to IP over InfiniBand.
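(For reference, a check along these lines can be run against the RHEL5 kernel git tree; the tag and path names here are assumptions about how the builds are tagged and laid out:)
# git log --oneline kernel-2.6.18-332.el5..kernel-2.6.18-333.el5 -- drivers/net/virtio_net.c net/core/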

Another thing is your test script:

for i in {1..10}
do
  for s in 0 1 48 64 512 1440 1500 1505 4096 4192 32767 65507; do ping -f -c 200 10.66.9.3 -s $s; done &
done

This may produce unexpected results since:

1. Only 200 packets (-c 200) are sent by each flood ping, and they go out in a very short time, so the pings may not actually run in parallel as you expected.
2. It looks like you want to start 2*12 = 24 parallel flood-ping processes with different packet sizes; which process runs first is up to the scheduler, so the workload produced by this script is not stable and varies from run to run.
3. The script needs a 'wait' for the completion of the ping subprocesses; otherwise, if you start two runs in quick succession, the environment may be polluted by the previous run.

So please find a way to produce a stable network load; then we can compare the results between the two versions.
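(As an illustration only, not the agreed procedure: a more deterministic run along these lines, reusing the address from the quoted script and fixing the count and packet size so every run offers the same load:)

# cat stable_ping.sh
# three identical flood pings in parallel: same packet size, same count,
# so the offered load has the same pps profile on every run
ping -f -c 1000 10.66.9.3 -s 1500 &
ping -f -c 1000 10.66.9.3 -s 1500 &
ping -f -c 1000 10.66.9.3 -s 1500 &
# block until all pings finish, so back-to-back runs cannot overlap
wait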

BTW, there's a flood ping test in autotest; why not just use it?

Thanks

Comment 23 Sibiao Luo 2012-08-10 09:29:48 UTC
(In reply to comment #7)
>
(In reply to comment #22)
>
Hi jason & amit,

   I have retested as jason suggested, on a private network. I ran each test at least 5 times with the same configuration and workload, producing the same packets per second (pps) with the same packet size, in order to do a better regression comparison.

host info:
# uname -r && rpm -q kvm
2.6.18-333.el5
kvm-83-258.el5

Steps:
1.boot the virtual machine with a virtio NIC on a private network.
eg: # /usr/libexec/qemu-kvm -m 2G -smp 2,cores=2,threads=1,sockets=1 -M rhel5.6.0 -cpu qemu64,+sse2 -name virtio_nic_test -drive file=/home/RHEL-Server-5.9-64-virtio.qcow2,format=qcow2,media=disk,if=virtio,cache=none,werror=stop,boot=on -uuid `uuidgen` -balloon virtio -monitor unix:/tmp/virt-nic-sluo,server,nowait -spice port=5931,disable-ticketing -qxl 1 -usbdevice tablet -soundhw ac97 -no-hpet -rtc-td-hack -no-kvm-pit-reinjection -monitor stdio -boot c -net nic,vlan=0,model=virtio,macaddr=08:2E:5F:0A:0D:A0 -net tap,sndbuf=0,vlan=0,script=/etc/qemu-ifup-private,downscript=no
2.adjust the ARP policy: use the dedicated interface hardware address to announce and respond to ARP packets.
# echo 2 > /proc/sys/net/ipv4/conf/default/arp_ignore
# echo 2 > /proc/sys/net/ipv4/conf/default/arp_announce
3.check the NIC type in the local guest.
# ethtool -i eth0
4.set the max MTU for eth0.
# ifconfig eth0 mtu 65535 (65535 for virtio, 16110 for e1000, 1500 for rtl8139)
5.run netserver in the local guest.
6.run netperf in the local host with the same packet size.
# while true; do netperf -t UDP_STREAM -f m -H 192.168.122.17 -P 0 -l 10 -- -m 65535; done
7.flood ping the NIC in parallel from the local host with this script.
# cat flood_ping.sh 
ping -f -c 1000 192.168.122.17 -s 65535 &
ping -f -c 1000 192.168.122.17 -s 65535 &
ping -f -c 1000 192.168.122.17 -s 65535 &
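(A variant that also applies the 'wait' advice from comment #22 and stops the background netperf load once the pings finish; a sketch with the same addresses and sizes, script name hypothetical:)
# cat flood_ping_wait.sh
# keep the UDP stream from step 6 running in the background
while true; do netperf -t UDP_STREAM -f m -H 192.168.122.17 -P 0 -l 10 -- -m 65535; done &
load_pid=$!
# three parallel flood pings, remembering their PIDs
pids=""
for i in 1 2 3; do
    ping -f -c 1000 192.168.122.17 -s 65535 &
    pids="$pids $!"
done
wait $pids      # block until every ping has finished
kill $load_pid  # then stop the netperf loop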

Test results:
I tested kernel-2.6.18-308.el5 (rhel5.8GA), kernel-2.6.18-323.el5, kernel-2.6.18-327.el5, kernel-2.6.18-328.el5, kernel-2.6.18-331.el5, kernel-2.6.18-332.el5 and kernel-2.6.18-333.el5 guests. After step 7, all of them show heavy packet loss on the virtio NIC (each test repeated at least 5 times), and all the test results look like:
# sh flood_ping.sh 
PING 192.168.122.17 (192.168.122.17) 65507(65535) bytes of data.
.PING 192.168.122.17 (192.168.122.17) 65507(65535) bytes of data.
PING 192.168.122.17 (192.168.122.17) 65507(65535) bytes of data.
[root@localhost home]# ..................................................................................................................  
--- 192.168.122.17 ping statistics ---
1000 packets transmitted, 965 received, 3% packet loss, time 1403ms
rtt min/avg/max/mdev = 0.359/0.945/7.591/0.732 ms, ipg/ewma 1.404/1.043 ms
 
--- 192.168.122.17 ping statistics ---
1000 packets transmitted, 960 received, 4% packet loss, time 1447ms
rtt min/avg/max/mdev = 0.364/0.943/7.865/0.758 ms, ipg/ewma 1.448/0.811 ms

--- 192.168.122.17 ping statistics ---
1000 packets transmitted, 961 received, 3% packet loss, time 1465ms
rtt min/avg/max/mdev = 0.370/0.968/6.910/0.740 ms, ipg/ewma 1.467/0.521 ms

   BTW, I also tried the e1000 NIC with the same steps, configuration and workload; the only difference was that the MTU and packet size were set to 16110. The e1000 NIC showed no packet loss at all during parallel flood ping; all packets were transmitted and received successfully. I tested kernel-2.6.18-333.el5, kernel-2.6.18-328.el5 and kernel-2.6.18-308.el5 guests, and all the results look like:
# sh flood_ping.sh 
PING 192.168.122.17 (192.168.122.17) 16110(16138) bytes of data.
.PING 192.168.122.17 (192.168.122.17) 16110(16138) bytes of data.
.PING 192.168.122.17 (192.168.122.17) 16110(16138) bytes of data.
[root@localhost home]#. 
--- 192.168.122.17 ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 4057ms
rtt min/avg/max/mdev = 0.104/4.011/6.811/2.461 ms, ipg/ewma 4.061/0.421 ms
 
--- 192.168.122.17 ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 4056ms
rtt min/avg/max/mdev = 0.112/4.016/6.868/2.469 ms, ipg/ewma 4.060/0.428 ms
 
--- 192.168.122.17 ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 4056ms
rtt min/avg/max/mdev = 0.120/4.008/6.905/2.459 ms, ipg/ewma 4.060/0.420 ms

In summary, this issue only happens on the virtio NIC (e1000 does not hit it). Since every kernel from kernel-308 to kernel-333 shows packet loss on the virtio NIC, this issue is probably not a regression.

Comment 24 Sibiao Luo 2012-08-10 10:25:13 UTC
(In reply to comment #23)
> (In reply to comment #7)
> >
> (In reply to comment #22)
> >
> 
> In summary, this issue only happens on the virtio NIC (e1000 does not hit
> it). Since every kernel from kernel-308 to kernel-333 shows packet loss on
> the virtio NIC, this issue is probably not a regression.

I will test it next Monday with the same packet size for the virtio and e1000 NICs.

Comment 25 jason wang 2012-08-10 10:40:28 UTC
(In reply to comment #24)
> (In reply to comment #23)
> > (In reply to comment #7)
> > >
> > (In reply to comment #22)
> > >
> > 
> > In summary, this issue only happens on the virtio NIC (e1000 does not hit
> > it). Since every kernel from kernel-308 to kernel-333 shows packet loss on
> > the virtio NIC, this issue is probably not a regression.
> 
> I will test it next Monday with the same packet size for the virtio and
> e1000 NICs.

Yes, please.

Comment 27 Michael S. Tsirkin 2012-08-12 12:06:25 UTC
I would like to check something: in comment #18,
were both host and guest kernels downgraded to -332?

If yes, could you please try a 333 host with a 332 guest, and vice versa?

Comment 28 Sibiao Luo 2012-08-13 02:43:23 UTC
Created attachment 603840 [details]
virtio_e1000_results

Comment 29 Sibiao Luo 2012-08-13 02:45:08 UTC
(In reply to comment #26)
> According to comment #23, it's not a regression, so should not be a blocker
> for 5.9. And I suspect that it's not a bug as the reporter produce a huge
> traffic. Anyway, let's wait for the reporter's test result according to
> comment #25.

Hi jason,

    This time I used the default MTU and the same packet size (1500) for the virtio and e1000 NICs, with the same steps as comment #23, and ran the test 5 times for each NIC. None of them showed any packet loss; the results are attached (attachment #603840 [details]).

Best wish.
sluo

Comment 30 Sibiao Luo 2012-08-13 03:18:33 UTC
(In reply to comment #28)
> Created attachment 603840 [details]
> virtio_e1000_results

Hi mst,

   The results in comment #18 were all on a kernel-333 host, with kernel-333 and kernel-332 guests. But jason pointed out that this test method was unreliable, so we agreed to produce a consistent stress/workload during the test on a private network and to run each test at least 5 times; see the comment #23 and comment #29 results.

   BTW, according to comment #23, when the MTU and packet size are set to 65535 for virtio and 16110 for e1000, only the virtio NIC hits this issue (e1000 does not): every guest from kernel-2.6.18-308.el5 (rhel5.8GA) to kernel-2.6.18-333.el5 shows packet loss on the virtio NIC on a kernel-333 host. According to comment #29, with the default MTU and the same packet size (1500) for the virtio and e1000 NICs, using the same steps as comment #23, neither NIC shows any packet loss.

Best wish.
sluo

Comment 31 Sibiao Luo 2012-08-13 03:23:23 UTC
(In reply to comment #27)
> I would like to check something: in comment #18,
> where both host and guest kernels downgraded to -332?
> 
> If yes could you please try 333 host on 332 guest and vice versa?

Hi mst,

   I replied to you in comment #30.
Thx.

Comment 34 Sibiao Luo 2012-08-13 07:41:35 UTC
Hi all,
 
   Let me summarize my test results in a table; it may be clearer.

- PASS: no packet loss at all.
- FAIL: heavy packet loss.
- version: guest kernel version.

for virtio NIC:
+---------------------+---------------+---------------+
|  method \ version   |  kernel-332   |  kernel-333   |
+---------------------+---------------+---------------+
|  stable - workload  |     PASS      |     PASS      |
+---------------------+---------------+---------------+
| unstable - workload |     PASS      |     FAIL      |
+---------------------+---------------+---------------+

for e1000 NIC:
+---------------------+---------------+---------------+
|  method \ version   |  kernel-332   |  kernel-333   |
+---------------------+---------------+---------------+
|  stable - workload  |     PASS      |     PASS      |
+---------------------+---------------+---------------+
| unstable - workload |     PASS      |     PASS      |
+---------------------+---------------+---------------+

Best wish.
sluo

Comment 35 Sibiao Luo 2012-08-13 09:23:57 UTC
(In reply to comment #34)
> Hi all,
>  
>    Let me summarize my test results in a table; it may be clearer.
> 
> - PASS: no packet loss at all.
> - FAIL: heavy packet loss.
> - version: guest kernel version.
> 
> for virtio NIC:
> +---------------------+---------------+---------------+
> |  method \ version   |  kernel-332   |  kernel-333   |
> +---------------------+---------------+---------------+
> |  stable - workload  |     PASS      |     PASS      |
> +---------------------+---------------+---------------+
> | unstable - workload |     PASS      |     FAIL      |
> +---------------------+---------------+---------------+
> 
> for e1000 NIC:
> +---------------------+---------------+---------------+
> |  method \ version   |  kernel-332   |  kernel-333   |
> +---------------------+---------------+---------------+
> |  stable - workload  |     PASS      |     PASS      |
> +---------------------+---------------+---------------+
> | unstable - workload |     PASS      |     PASS      |
> +---------------------+---------------+---------------+
> 
- unstable workload: the MTU and packet size (for both netperf and flood ping) are different for each NIC (65535 for virtio, 16110 for e1000, 1500 for rtl8139).
- stable workload: the MTU and packet size (for both netperf and flood ping) are the same for all NICs (the default 1500 here), so that each NIC sees the same packets per second (pps) with the same packet size.

Comment 36 Sibiao Luo 2012-08-13 11:07:17 UTC
(In reply to comment #34)
(In reply to comment #35)

   Sorry for my mistake; please disregard comment #34. Let me summarize the results from comment #23 and comment #29 in a table; it may be clearer.

- PASS: no packet loss at all.
- FAIL: heavy packet loss.
- version: guest kernel version.
- unstable workload: the MTU and packet size (for both netperf and flood ping) are different for each NIC (65535 for virtio, 16110 for e1000, 1500 for rtl8139).
- stable workload: the MTU and packet size (for both netperf and flood ping) are the same for all NICs (the default 1500 here), so that each NIC sees the same packets per second (pps) with the same packet size.

for virtio NIC:
+-----------------+----------+----------+----------+----------+
|method \ version |kernel-308|kernel-323|kernel-327|kernel-333|
+-----------------+----------+----------+----------+----------+
|stable-workload  |   PASS   |   PASS   |   PASS   |   PASS   |
+-----------------+----------+----------+----------+----------+
|unstable-workload|   FAIL   |   FAIL   |   FAIL   |   FAIL   |
+-----------------+----------+----------+----------+----------+

for e1000 NIC:
+-----------------+----------+----------+
|method \ version |kernel-332|kernel-333|
+-----------------+----------+----------+
|stable-workload  |   PASS   |   PASS   |
+-----------------+----------+----------+
|unstable-workload|   PASS   |   PASS   |
+-----------------+----------+----------+

Best wish.
sluo

