Bug 1512463 - Guest network can not recover immediately after ping-pong live migration over ovs-dpdk
Summary: Guest network can not recover immediately after ping-pong live migration over ovs-dpdk
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openvswitch
Version: 7.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Eelco Chaudron
QA Contact: Pei Zhang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-13 09:55 UTC by Pei Zhang
Modified: 2018-08-15 13:53 UTC
CC List: 13 users

Fixed In Version: openvswitch-2.9.0-54.el7fdn
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-15 13:53:04 UTC
Target Upstream Version:
Embargoed:


Attachments
VM XML file. (3.31 KB, text/plain)
2017-11-13 09:55 UTC, Pei Zhang


Links
System: Red Hat Product Errata, ID: RHBA-2018:2432, Private: 0, Priority: None, Status: None, Summary: None, Last Updated: 2018-08-15 13:53:51 UTC

Description Pei Zhang 2017-11-13 09:55:58 UTC
Created attachment 1351495 [details]
VM XML file.

Description of problem:
After ping-pong live migration, it sometimes takes 1 to 3 seconds for the guest network to recover, for both the ping traffic and the testpmd traffic. The recovery time should be close to the migration downtime, i.e. less than 200 milliseconds.


Version-Release number of selected component (if applicable):
openvswitch-2.8.0-4.el7fdb.x86_64
3.10.0-776.el7.x86_64
qemu-kvm-rhev-2.10.0-5.el7.x86_64
libvirt-3.9.0-1.el7.x86_64
dpdk-17.05.2-4.el7fdb.x86_64
tuned-2.9.0-1.el7.noarch

How reproducible:
5/10

Steps to Reproduce:
1. Set up host1 and host2, see [1]

2. Start openvswitch on host1 and host2 and set up 3 dpdkvhostuser ports, see [2].
Note: 2 vhost-user ports are for guest testpmd testing, and 1 vhost-user port is for guest ping testing.

3. Launch the VM, see [3]

4. Start testpmd in the VM, see [4]. Also set an IP (192.168.2.4) on the third network card for ping testing.

5. On host3, start MoonGen and start pinging the VM, see [5]

6. Do migration
# virsh migrate --verbose --persistent --live rhel7.5 qemu+ssh://192.168.1.1/system 

7. After the migration finishes, quit testpmd in the VM and stop MoonGen and ping. Collect the ping loss and the MoonGen loss.
Note: We measure ping loss with tcpdump: loss = ICMP echo requests - echo replies (a sketch of this measurement follows the results table below).

8. Repeat steps 4-7 several (10) times; the results are below. Several Ping_Loss and Moongen_Loss values are much higher than expected. For example, the first line has Ping_Loss 115, which means the guest IP network was unreachable for about 115 * 10 milliseconds (1.15 seconds).

Note: Each line below represents one migration run, from the source host to the destination host and then back from the destination to the source.

===========Stream Rate: 3Mpps===========
No Stream_Rate Downtime Totaltime Ping_Loss Moongen_Loss
 0       3Mpps      128     13974       115      7168374
 1       3Mpps      145     13620        17      1169770
 2       3Mpps      140     14499       116      7141175
 3       3Mpps      142     13358        16      1150606
 4       3Mpps      136     14004        16      1124020
 5       3Mpps      139     15494       214     13170452
 6       3Mpps      136     15610       217     13282413
 7       3Mpps      146     13194        17      1167512
 8       3Mpps      148     12871        16      1162655
 9       3Mpps      137     15615       214     13170656
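
For reference, a minimal sketch of one ping-pong run and of the tcpdump-based loss count from step 7 (the capture interface "em1", the return-path URI and the file names are assumptions, not the exact QE automation):

On the traffic generator host, capture ICMP while pinging the guest (em1/ping.pcap are example names):
# tcpdump -i em1 -w ping.pcap 'icmp and host 192.168.2.4' &
# ping 192.168.2.4 -i 0.001 > /dev/null &

Migrate the guest to the destination host, then back again (run the second command on the destination; the source URI is a placeholder):
# virsh migrate --verbose --persistent --live rhel7.5 qemu+ssh://192.168.1.1/system
# virsh migrate --verbose --persistent --live rhel7.5 qemu+ssh://<source-host>/system

Stop ping and tcpdump, then count loss = echo requests - echo replies:
# req=$(tcpdump -nr ping.pcap 'icmp[icmptype] = icmp-echo' 2>/dev/null | wc -l)
# rep=$(tcpdump -nr ping.pcap 'icmp[icmptype] = icmp-echoreply' 2>/dev/null | wc -l)
# echo "ping loss: $((req - rep))"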


Actual results:
It takes 1 to 3 seconds for the guest network to recover.

Expected results:
Recovery should take less than 200 milliseconds, close to the downtime value of the migration.

Additional info:
1. This is a regression bug.
openvswitch-2.7.2-6.git20170719.el7fdp.x86_64 works well; its results, shown below, are as expected:
===========Stream Rate: 3Mpps===========
No Stream_Rate Downtime Totaltime Ping_Loss Moongen_Loss
 0       3Mpps      151     12461        18      1297417
 1       3Mpps      136     13419        15      1244233
 2       3Mpps      153     13414        18      1316424
 3       3Mpps      138     13455        17      1235900
 4       3Mpps      142     11928        18      1305626
 5       3Mpps      145     13041        16      1254380
 6       3Mpps      150     12998        18      1294608
 7       3Mpps      149     13132        18      1301047
 8       3Mpps      163     13602        19      1370181
 9       3Mpps      156     13651        18      1323032


2. For detailed information about the live migration setup, please refer to:
https://mojo.redhat.com/docs/DOC-1102329

Reference:
[1]# cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-3.10.0-776.el7.x86_64 root=/dev/mapper/rhel_dell--per730--11-root ro crashkernel=auto rd.lvm.lv=rhel_dell-per730-11/root rd.lvm.lv=rhel_dell-per730-11/swap console=ttyS0,115200n81 default_hugepagesz=1G iommu=pt intel_iommu=on skew_tick=1 nohz=on nohz_full=2,4,6,8,10,12,14,16,18,19,17,15 rcu_nocbs=2,4,6,8,10,12,14,16,18,19,17,15 tuned.non_isolcpus=00002aab intel_pstate=disable nosoftlockup
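
A quick sanity check that the 1G hugepages requested by this command line took effect after boot (a sketch; the expected page count depends on what the tuned profile reserves):
# grep Huge /proc/meminfo
# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
# tuned-adm active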


[2] # ovs-vsctl show
ce9eab00-9854-4e61-899e-0cce5cdf70f8
    Bridge "ovsbr1"
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal
        Port "dpdk2"
            Interface "dpdk2"
                type: dpdk
                options: {dpdk-devargs="0000:06:00.0", n_rxq="1"}
        Port "vhost-user2"
            Interface "vhost-user2"
                type: dpdkvhostuser
    Bridge "ovsbr0"
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:04:00.0", n_rxq="1"}
        Port "vhost-user0"
            Interface "vhost-user0"
                type: dpdkvhostuser
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuser
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
                options: {dpdk-devargs="0000:04:00.1", n_rxq="1"}
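
For reference, a minimal sketch of the ovs-vsctl commands that would build the bridges and ports shown above (bridge names, port names and PCI addresses are taken from the output; the exact QE setup script and other_config tuning such as pmd-cpu-mask are not shown and may differ):

# ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
# ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
# ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:04:00.0 options:n_rxq=1
# ovs-vsctl add-port ovsbr0 dpdk1 -- set Interface dpdk1 type=dpdk options:dpdk-devargs=0000:04:00.1 options:n_rxq=1
# ovs-vsctl add-port ovsbr0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuser
# ovs-vsctl add-port ovsbr0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser
# ovs-vsctl add-br ovsbr1 -- set bridge ovsbr1 datapath_type=netdev
# ovs-vsctl add-port ovsbr1 dpdk2 -- set Interface dpdk2 type=dpdk options:dpdk-devargs=0000:06:00.0 options:n_rxq=1
# ovs-vsctl add-port ovsbr1 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuser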

[3] VM xml is attached to this Comment.

[4]# /usr/bin/testpmd -l 1,2,3 -n 4 -d /usr/lib64/librte_pmd_virtio.so.1 -w 0000:00:03.0 -w 0000:00:06.0 -- --nb-cores=2 --disable-hw-vlan -i --disable-rss --rxq=1 --txq=1


[5] 
MoonGen: # ./build/MoonGen /home/nfv-virt-rt-kvm/tests/utils/rfc1242.lua 0 1 64 30 5.65

ping: # ping 192.168.2.4 -i 0.001

Comment 5 Eelco Chaudron 2017-11-20 13:25:27 UTC
Got access to Pei's setup and was able to replicate the issue by sending only
pings through his ovsbr1 network. From the packet generator host I sent pings
as follows:

ping 192.168.2.4 -i 0.001 -D -O > out.txt;   cat out.txt |
  sed -n "s/^\([0-9]\+\) packets transmitted, \([0-9]\+\) received.*/\1 \2/p" | \
  awk '{print "Delta: " $1 " - " $2 " = " ($1 - $2)}'

After the failover, I stopped the pings and looked at the packet delta. When the
issue occurs I see around 120 misses.

To confirm this was introduced in 2.8 as claimed, I downgraded to
openvswitch-2.7.2-6.git20170719, but I saw the same problem. So I continued
troubleshooting on this version.

One odd thing I see is that when the test is failing, a couple of timeouts
(delays, really, as the replies do eventually get received) happen early in the
failover process:

# grep "no answer yet for" out.txt
[1511182790.062062] no answer yet for icmp_seq=4015
[1511182790.065584] no answer yet for icmp_seq=4016
[1511182790.084769] no answer yet for icmp_seq=4020
[1511182790.088145] no answer yet for icmp_seq=4021
[1511182799.289063] no answer yet for icmp_seq=13213
[1511182799.299130] no answer yet for icmp_seq=13214
[1511182799.309198] no answer yet for icmp_seq=13215
...
...
[1511182801.473626] no answer yet for icmp_seq=13430
[1511182801.483693] no answer yet for icmp_seq=13431
[1511182801.493759] no answer yet for icmp_seq=13432

I decided to take some ovs-tcpdump captures on the vhost-user2 interface, and I
see the missing packets being sent out by the VM (on the old host).
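
For reference, the kind of capture used here, as a sketch (the filter and the output file name are examples, not the exact command used):
# ovs-tcpdump -i vhost-user2 -nn -w vhost-user2.pcap icmp and host 192.168.2.4
ovs-tcpdump mirrors the OVS port to a temporary interface and runs tcpdump on it, so normal tcpdump filter syntax applies.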

For the missing packets I see them on dpdk2 and vhost-user2 on the old host (the
host the VM is moving to). I do not see them on the new host, so the physical
switch in the middle might not have updated its tables?

I did notice that on some tries I see replies coming from the VM, but they are
not seen by the traffic generator. I see DPDK reporting them as being sent out.
As there are no changes to OVS (all static rules), I believe OVS in this case ;)

Can you get me access to your switch in the middle, so I can look at that also.
I would like to do the test where the switch has learning disabled, to see if
that has any effect. What is the current configuration? Normal learning?
In addition, it would be nice to set up a mirrored port for taking some
captures to make sure packets get out.

Comment 6 Pei Zhang 2017-11-20 14:07:01 UTC
(In reply to Eelco Chaudron from comment #5)
> Can you get me access to your switch in the middle, so I can look at that
> also.

Hi Eelco,  I'll send the switch access to you by mail or irc.

> I would like to do the test where the switch has learning disabled, to see if
> that has any effect. What is the current configuration? Normal learning?

In the current configuration, MAC learning is enabled.

There is also a doc [1] about MAC learning in this live migration testing.
[1] https://mojo.redhat.com/docs/DOC-1132387


> In addition, it would be nice to set up a mirrored port for taking some
> captures to make sure packets get out.

It's ok to set up a mirror port in this switch.


Best Regards,
Pei

Comment 7 Eelco Chaudron 2017-11-30 16:17:30 UTC
Once again got access to Pei's setup and verified the problem still existed.
Built a local version of OVS using DPDK 17.05.2 and the latest OVS from GitHub.

Did the ping tests manually and could not replicate the issue anymore.
Ran Pei's script and also did not see the issue with ping. Some odd numbers with MoonGen, though.

Downgraded again to the RPM used by Pei in the original setup, and now I can see the ping problem again. I still see the MoonGen packet drops being 100x higher, but I assume this is a setup issue.

Tried to install our v2.8.1 package, but it has dependency problems (el8 package). Built it on the DUT and installed it; I do not see the problem with this either!

The RPM is available here: 10.73.72.154:/root/rpmbuild/RPMS/x86_64

Looking at the diff, only the following commit looked related:

commit 894af647a8dcd9bd2af236ca9e7f52241c5b7dda
Author: wangzhike <wangzhike>
Date:   Tue Aug 29 23:12:03 2017 -0700

Removed this commit from the git build and after a rebuild/install/reboot it looks like the problem is back...

Can you please retest/rebuild the setup so that also the MoonGen part is working, and test with my 2.8.1 package? The problem should be gone as it includes the above commit.

If you still see the issue can you make the setup available again?

Thanks,

Eelco

Comment 8 Pei Zhang 2017-12-01 08:50:35 UTC
(In reply to Eelco Chaudron from comment #7)
> Can you please retest/rebuild the setup so that also the MoonGen part is
> working, and test with my 2.8.1 package? The problem should be gone as it
> includes the above commit.

The reason the MoonGen results looked odd is that the src/dst hosts use 1G hugepages while the guest was using 2M hugepages. I have updated the guest kernel command line; now the MoonGen testing part looks good.
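
For reference, a sketch of the kind of guest kernel command line change involved (the hugepage count and the tool used on the guest are assumptions, not the exact change made):
# grubby --update-kernel=ALL --args="default_hugepagesz=1G hugepagesz=1G hugepages=1"
# reboot
(hugepages=1 above is just an example count.)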

With 2.8.1, I still hit the problem.

> If you still see the issue can you make the setup available again?

The environment is ready; I'll leave it to you now.


Best Regards,
Pei


Comment 9 Pei Zhang 2017-12-22 08:18:28 UTC
Hi Eelco,

Update about the reproducible setup testing:

1. With only ping testing, it's hard to reproduce the recovery issue; the network recovers in around 150 ms when only ping traffic is flowing.

2. When using all vCPUs from the same NUMA node, I still hit this recovery issue.

3. When testing PVP (replacing Open vSwitch with DPDK's testpmd on the host), I didn't hit this recovery delay, which indicates this is not a qemu-kvm bug.

4. When I tried the latest openvswitch to check whether this issue is gone, I hit this bug:
Bug 1528229 - Live migration fails when testing VM with openvswitch multiple pmds and vhost-user single queue

For the testing results, please refer to: http://pastebin.test.redhat.com/542789

I'm afraid I cannot provide a simple reproducer beyond the testing above. Also, if you need to debug on the QE environment, please let me know in advance and I'll prepare it for you.


Best Regards,
Pei

Comment 16 Eelco Chaudron 2018-03-12 08:21:05 UTC
This might be a duplicate of bug 1552465, same test with different results. However, as 2.8 will not be shipped, we should close this one (if that is the case).

Comment 17 Pei Zhang 2018-03-12 09:18:27 UTC

*** This bug has been marked as a duplicate of bug 1552465 ***

Comment 18 Eelco Chaudron 2018-03-21 11:15:01 UTC
Re-opening this BZ to track the TTL issue. This got introduced somewhere between DPDK 16.11.5 and 17.05.

Comment 19 Eelco Chaudron 2018-03-27 16:43:08 UTC
Did "some" dissecting, and the below commit is causing the TTL variation. Will spend some more time on this later to figure out why.


commit af14759181240120f76c82f894982e8f33f0ba2a
Author: Yuanhan Liu <yuanhan.liu.com>
Date:   Sat Apr 1 15:22:56 2017 +0800

    vhost: introduce API to start a specific driver
    
    We used to use rte_vhost_driver_session_start() to trigger the vhost-user
    session. It takes no argument, thus it's a global trigger. And it could
    be problematic.
    
    The issue is, currently, rte_vhost_driver_register(path, flags) actually
    tries to put it into the session loop (by fdset_add). However, it needs
    a set of APIs to set a vhost-user driver properly:
      * rte_vhost_driver_register(path, flags);
      * rte_vhost_driver_set_features(path, features);
      * rte_vhost_driver_callback_register(path, vhost_device_ops);
    
    If a new vhost-user driver is registered after the trigger (think OVS-DPDK
    that could add a port dynamically from cmdline), the current code will
    effectively starts the session for the new driver just after the first
    API rte_vhost_driver_register() is invoked, leaving later calls taking
    no effect at all.
    
    To handle the case properly, this patch introduce a new API,
    rte_vhost_driver_start(path), to trigger a specific vhost-user driver.
    To do that, the rte_vhost_driver_register(path, flags) is simplified
    to create the socket only and let rte_vhost_driver_start(path) to
    actually put it into the session loop.
    
    Meanwhile, the rte_vhost_driver_session_start is removed: we could hide
    the session thread internally (create the thread if it has not been
    created). This would also simplify the application.
    
    NOTE: the API order in prog guide is slightly adjusted for showing the
    correct invoke order.
    
    Signed-off-by: Yuanhan Liu <yuanhan.liu.com>
    Reviewed-by: Maxime Coquelin <maxime.coquelin>
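
For reference, one way this kind of dissection can be done is a git bisect over the DPDK tree between the last known-good and first known-bad releases; a sketch of the workflow (the refs and the rebuild/retest step are assumptions, not the commands actually used):
# git bisect start
# git bisect bad v17.05
# git bisect good v16.11
... at each step: rebuild OVS against the checked-out DPDK, reinstall on the hosts, rerun the migration test ...
# git bisect good    (or: git bisect bad, depending on the result)
# git bisect reset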

Comment 21 Eelco Chaudron 2018-05-25 11:01:16 UTC
Thanks Pei for letting me use your setup for multiple days in a row!

Finally, I was able to figure out the root cause of your problem, and submitted a patch upstream:

https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/347592.html

I also did a build based on the latest fdn, and I was no longer able to replicate the issue in 10 successive runs:

=======================Stream Rate: 3Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       3Mpps      140     14130        15    1305933.0
 1       3Mpps      147     13706        15    1433091.0
 2       3Mpps      146     13913        14    1240501.0
 3       3Mpps      145     14481        14    1232944.0
 4       3Mpps      155     14151        15    1295038.0
 5       3Mpps      152     13921        15    1430381.0
 6       3Mpps      142     13741        14    1392868.0
 7       3Mpps      149     14422        16    1427108.0
 8       3Mpps      134     14096        13    1347977.0
 9       3Mpps      147     14070        16    1271768.0
<------------------------Summary------------------------>
   Max   3Mpps      155     14481        16      1433091
   Min   3Mpps      134     13706        13      1232944
  Mean   3Mpps      145     14063        14      1337760
Median   3Mpps      146     14083        15      1326955
 Stdev       0      6.0    256.21      0.95     79183.17

Comment 22 Eelco Chaudron 2018-05-28 11:09:09 UTC
Sent out a v2 patch: https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/347656.html

Comment 25 Pei Zhang 2018-07-26 03:13:36 UTC
Summary: Testing openvswitch-2.9.0-55.el7fdp.x86_64 with RHEL 7.6 and the latest packages, this bug can no longer be reproduced; the issue has been fixed.


(Will test openvswitch-2.9.0-55.el7fdp.x86_64 with rhel7.5.z later )


Versions:
qemu-kvm-rhev-2.12.0-8.el7.x86_64
libvirt-4.5.0-4.el7.x86_64
tuned-2.9.0-1.el7.noarch
dpdk-17.11-10.el7fdb.x86_64
openvswitch-2.9.0-55.el7fdp.x86_64


Note: Testing with OVS acting as the vhost-user client and QEMU acting as the vhost-user server.
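
For reference, in this mode the OVS side of each port is created as type dpdkvhostuserclient, pointing at the socket that QEMU serves; a minimal sketch with an assumed socket path:
# ovs-vsctl add-port ovsbr0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhost-user0
(/tmp/vhost-user0 is an example path, not the one used in this setup.)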


Scenario 1: live migration with vhost-user 2 queues: PASS

=======================Stream Rate: 1Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       1Mpps      158     19813        15     451189.0
 1       1Mpps      158     22763        19     477425.0
 2       1Mpps      163     19838        17     482764.0
 3       1Mpps      158     21749        15     472611.0
 4       1Mpps      131     64385        13     106887.0
 5       1Mpps      164     20071        16     485374.0
 6       1Mpps      130     64380        15     102214.0
 7       1Mpps      155     19751        15     467876.0
 8       1Mpps      155     20121        16     465566.0
 9       1Mpps      166     19468        16     487194.0
|------------------------Statistic------------------------|
   Max   1Mpps      166     64385        19       487194
   Min   1Mpps      130     19468        13       102214
  Mean   1Mpps      153     29233        15       399910
Median   1Mpps      158     20096        15       470243
 Stdev       0    12.82  18553.47      1.57    156036.46

=======================Stream Rate: 2Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       2Mpps      165     16224        16     991056.0
 1       2Mpps      163     17201        17    1000683.0
 2       2Mpps      168     21262        16    1158717.0
 3       2Mpps      167     16585        17    1011464.0
 4       2Mpps      166     16228        16     993732.0
 5       2Mpps      164     16541        17    1004944.0
 6       2Mpps      153     15825        18     955247.0
 7       2Mpps      161     15833        15     972007.0
 8       2Mpps      153     15436        16     945746.0
 9       2Mpps      161     19428        17    1021792.0
|------------------------Statistic------------------------|
   Max   2Mpps      168     21262        18      1158717
   Min   2Mpps      153     15436        15       945746
  Mean   2Mpps      162     17056        16      1005538
Median   2Mpps      163     16384        16       997207
 Stdev       0     5.32   1851.07      0.85     59033.65

=======================Stream Rate: 3Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       3Mpps      127     64209        13    2269768.0
 1       3Mpps      157     18554        15    1550979.0
 2       3Mpps      177     17353        18    1580577.0
 3       3Mpps      170     17429        17    1551081.0
 4       3Mpps      165     17262        16    1511554.0
 5       3Mpps      150     17069        17    1443302.0
 6       3Mpps      123     64349        12    2220502.0
 7       3Mpps      131     64238        15    2166204.0
 8       3Mpps      162     16466        16    1475725.0
 9       3Mpps      151     19991        18    1517579.0
|------------------------Statistic------------------------|
   Max   3Mpps      177     64349        18      2269768
   Min   3Mpps      123     16466        12      1443302
  Mean   3Mpps      151     31692        15      1728727
Median   3Mpps      154     17991        16      1551030
 Stdev       0    18.71  22498.21       2.0    341285.16

=======================Stream Rate: 4Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       4Mpps      164     18768        19    2113516.0
 1       4Mpps      163     17917        16    2046435.0
 2       4Mpps      163     18205        18    2079404.0
 3       4Mpps      162     22842        16    2389273.0
 4       4Mpps      161     17272        19    2039292.0
 5       4Mpps      171     17328        17    2089298.0
 6       4Mpps      166     16743        16    2010342.0
 7       4Mpps      167     16725        17    2040245.0
 8       4Mpps      161     16781        15    1977642.0
 9       4Mpps      164     18567        17    2093860.0
|------------------------Statistic------------------------|
   Max   4Mpps      171     22842        19      2389273
   Min   4Mpps      161     16725        15      1977642
  Mean   4Mpps      164     18114        17      2087930
Median   4Mpps      163     17622        17      2062919
 Stdev       0     3.08   1824.12      1.33    113586.32

=======================Stream Rate: 5Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       5Mpps      127     64369        12    5211278.0
 1       5Mpps      124     64338        12    5173191.0
 2       5Mpps      163     16599        17    2478128.0
 3       5Mpps      127     64352        12    5145579.0
 4       5Mpps      131     64247        12    5214652.0
 5       5Mpps      135     64699        14    5166328.0
 6       5Mpps      139     64261        14    5172530.0
 7       5Mpps      133     64133        13    5146551.0
 8       5Mpps      125     64086        14    5180036.0
 9       5Mpps      136     64246        13    5234906.0
|------------------------Statistic------------------------|
   Max   5Mpps      163     64699        17      5234906
   Min   5Mpps      124     16599        12      2478128
  Mean   5Mpps      134     59533        13      4912317
Median   5Mpps      132     64254        13      5172860
 Stdev       0    11.35  15086.39      1.57    855788.25



Scenario 2: live migration with vhost-user 1 queue: PASS

=======================Stream Rate: 1Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       1Mpps      146     15641        17     493348.0
 1       1Mpps      143     13976        16     482615.0
 2       1Mpps      147     14018        16     495862.0
 3       1Mpps      138     13818        16     477795.0
 4       1Mpps      140     13762        16     480692.0
 5       1Mpps      127     13849        15     453224.0
 6       1Mpps      142     13836        14     485654.0
 7       1Mpps      133     13687        14     467472.0
 8       1Mpps      138     13656        16     478171.0
 9       1Mpps      143     13650        15     486969.0
|------------------------Statistic------------------------|
   Max   1Mpps      147     15641        17       495862
   Min   1Mpps      127     13650        14       453224
  Mean   1Mpps      139     13989        15       480180
Median   1Mpps      141     13827        16       481653
 Stdev       0     6.07    593.57      0.97     12469.57

=======================Stream Rate: 2Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       2Mpps      142     13806        15     978357.0
 1       2Mpps      145     13694        16     982945.0
 2       2Mpps      135     13756        16     948356.0
 3       2Mpps      137     13638        16     952975.0
 4       2Mpps      144     13693        17     988426.0
 5       2Mpps      130     13633        14     927550.0
 6       2Mpps      136     14588        16     949244.0
 7       2Mpps      127     14459        15     916520.0
 8       2Mpps      129     13592        15     920487.0
 9       2Mpps      129     14542        14     924674.0
|------------------------Statistic------------------------|
   Max   2Mpps      145     14588        17       988426
   Min   2Mpps      127     13592        14       916520
  Mean   2Mpps      135     13940        15       948953
Median   2Mpps      135     13725        15       948800
 Stdev       0     6.62    412.52      0.97     26883.61

=======================Stream Rate: 3Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       3Mpps      146     14412        17    1487473.0
 1       3Mpps      140     14103        16    1447916.0
 2       3Mpps      132     14177        15    1401315.0
 3       3Mpps      140     14214        16    1450662.0
 4       3Mpps      143     14067        16    1471002.0
 5       3Mpps      136     14114        16    1419882.0
 6       3Mpps      138     14045        16    1443870.0
 7       3Mpps      148     13834        17    1501742.0
 8       3Mpps      138     13741        16    1439716.0
 9       3Mpps      141     13742        16    1448684.0
|------------------------Statistic------------------------|
   Max   3Mpps      148     14412        17      1501742
   Min   3Mpps      132     13741        15      1401315
  Mean   3Mpps      140     14044        16      1451226
Median   3Mpps      140     14085        16      1448300
 Stdev       0     4.69    215.52      0.57     29692.28

=======================Stream Rate: 4Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       4Mpps      141     13609        16    1949419.0
 1       4Mpps      142     13639        16    1971410.0
 2       4Mpps      145     13717        17    1973021.0
 3       4Mpps      131     14495        15    1875741.0
 4       4Mpps      139     13673        16    1941009.0
 5       4Mpps      129     14425        15    1856364.0
 6       4Mpps      132     14567        15    1873700.0
 7       4Mpps      135     14480        16    1914211.0
 8       4Mpps      136     14414        16    1910466.0
 9       4Mpps      134     14372        15    1895856.0
|------------------------Statistic------------------------|
   Max   4Mpps      145     14567        17      1973021
   Min   4Mpps      129     13609        15      1856364
  Mean   4Mpps      136     14139        15      1916119
Median   4Mpps      135     14393        16      1912338
 Stdev       0     5.21    416.87      0.67      41459.4

=======================Stream Rate: 5Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       5Mpps      142     14399        15   12322273.0
 1       5Mpps      142     14350        16   12419495.0
 2       5Mpps      136     14126        15   10583626.0
 3       5Mpps      136     13788        16   12459350.0
 4       5Mpps      131     13840        15   12293454.0
 5       5Mpps      137     13762        16    6264477.0
 6       5Mpps      136     13791        16   10688540.0
 7       5Mpps      142     13790        16   12025734.0
 8       5Mpps      148     13842        17   10769242.0
 9       5Mpps      137     14774        15   12217261.0
|------------------------Statistic------------------------|
   Max   5Mpps      148     14774        17     12459350
   Min   5Mpps      131     13762        15      6264477
  Mean   5Mpps      138     14046        15     11204345
Median   5Mpps      137     13841        16     12121497
 Stdev       0     4.79    352.02      0.67   1898279.47

Comment 26 Pei Zhang 2018-07-27 00:28:22 UTC
Tested openvswitch-2.9.0-55.el7fdp.x86_64 with rhel7.5.z. The issue cannot be reproduced there either; it has been fixed.


Versions:
3.10.0-862.11.1.el7.x86_64
libvirt-3.9.0-14.el7_5.6.x86_64
openvswitch-2.9.0-55.el7fdp.x86_64
tuned-2.9.0-1.el7.noarch
dpdk-17.11-10.el7fdb.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.4.x86_64


Note: Testing with OVS acting as the vhost-user client and QEMU acting as the vhost-user server.


Scenario 1: live migration with vhost-user 2 queues: PASS

=======================Stream Rate: 1Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       1Mpps      144     27265        18     357671.0
 1       1Mpps      152     17641        15     359873.0
 2       1Mpps      153     21895        15     363786.0
 3       1Mpps      150     21890        14     522820.0
 4       1Mpps      139     32116        17     369904.0
 5       1Mpps      159     19643        19     376134.0
 6       1Mpps      155     22023        15     367286.0
 7       1Mpps      164     19412        17     380800.0
 8       1Mpps      160     24038        16     378965.0
 9       1Mpps      115     64264        12        569.0
|------------------------Statistic------------------------|
   Max   1Mpps      164     64264        19       522820
   Min   1Mpps      115     17641        12          569
  Mean   1Mpps      149     27018        15       347780
Median   1Mpps      152     21959        15       368595
 Stdev       0     14.1  13743.11      2.04     131416.0

=======================Stream Rate: 2Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       2Mpps      155     14612        16     748606.0
 1       2Mpps      156     14854        15    1080685.0
 2       2Mpps      148     16990        16     740497.0
 3       2Mpps      150     15983        16     734332.0
 4       2Mpps      153     16111        16     741944.0
 5       2Mpps      151     20197        14     776075.0
 6       2Mpps      152     15672        16     732666.0
 7       2Mpps      147     15165        15     712528.0
 8       2Mpps      153     19077        16     771798.0
 9       2Mpps      148     15517        16     724650.0
|------------------------Statistic------------------------|
   Max   2Mpps      156     20197        16      1080685
   Min   2Mpps      147     14612        14       712528
  Mean   2Mpps      151     16417        15       776378
Median   2Mpps      151     15827        16       741220
 Stdev       0     3.06   1844.15       0.7    108678.66

=======================Stream Rate: 3Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       3Mpps      147     15393        16    1600504.0
 1       3Mpps      156     15048        16    1610687.0
 2       3Mpps      148     14861        16    1122905.0
 3       3Mpps      152     15075        19    1695779.0
 4       3Mpps      158     15380        16    1188020.0
 5       3Mpps      132     15750        14    1590752.0
 6       3Mpps      132     14882        16    1114096.0
 7       3Mpps      152     14774        16    1137946.0
 8       3Mpps      147     14713        14    1581406.0
 9       3Mpps      154     16707        16    1692790.0
|------------------------Statistic------------------------|
   Max   3Mpps      158     16707        19      1695779
   Min   3Mpps      132     14713        14      1114096
  Mean   3Mpps      147     15258        15      1433488
Median   3Mpps      150     15061        16      1586079
 Stdev       0      9.1    603.93      1.37    255606.53

=======================Stream Rate: 4Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       4Mpps      149     15965        17    1621926.0
 1       4Mpps      156     17400        17    1738364.0
 2       4Mpps      147     16126        15    2191332.0
 3       4Mpps      157     20683        18    1900610.0
 4       4Mpps      154     15384        16    2213498.0
 5       4Mpps      161     15015        17    2245792.0
 6       4Mpps      152     15384        15    1576313.0
 7       4Mpps      156     14989        15    1711173.0
 8       4Mpps      147     15699        15    1560115.0
 9       4Mpps      155     15507        18    1594805.0
|------------------------Statistic------------------------|
   Max   4Mpps      161     20683        18      2245792
   Min   4Mpps      147     14989        15      1560115
  Mean   4Mpps      153     16215        16      1835392
Median   4Mpps      154     15603        16      1724768
 Stdev       0      4.6   1716.89      1.25     281569.6

=======================Stream Rate: 5Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       5Mpps      156     14371        15    1960618.0
 1       5Mpps      145     16697        14    2045776.0
 2       5Mpps      149     21116        17    2474624.0
 3       5Mpps      157     14248        15    1960252.0
 4       5Mpps      148     14980        18    1957918.0
 5       5Mpps      156     14606        17    1992246.0
 6       5Mpps      151     15013        15    1908936.0
 7       5Mpps      163     14654        15    2046271.0
 8       5Mpps      159     15878        17    2873091.0
 9       5Mpps      147     14634        17    2111070.0
|------------------------Statistic------------------------|
   Max   5Mpps      163     21116        18      2873091
   Min   5Mpps      145     14248        14      1908936
  Mean   5Mpps      153     15619        16      2133080
Median   5Mpps      153     14817        16      2019011
 Stdev       0     5.92    2070.6      1.33    305555.95



Scenario 2: live migration with vhost-user 1 queue: PASS

=======================Stream Rate: 1Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       1Mpps      124     13399        14     287705.0
 1       1Mpps      127     13507        15     294062.0
 2       1Mpps      133     13954        14     374276.0
 3       1Mpps      131     13224        14     301607.0
 4       1Mpps      126     13748        15     314343.0
 5       1Mpps      128     13092        14     293806.0
 6       1Mpps      130     14259        15     459060.0
 7       1Mpps      128     13447        15     428637.0
 8       1Mpps      127     13632        15     295462.0
 9       1Mpps      127     13757        14     293573.0
|------------------------Statistic------------------------|
   Max   1Mpps      133     14259        15       459060
   Min   1Mpps      124     13092        14       287705
  Mean   1Mpps      128     13601        14       334253
Median   1Mpps      127     13569        14       298534
 Stdev       0      2.6    346.26      0.53     63356.78

=======================Stream Rate: 2Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       2Mpps      125     13441        14     587439.0
 1       2Mpps      124     13722        14     822653.0
 2       2Mpps      135     13267        15     633917.0
 3       2Mpps      124     13507        14     594731.0
 4       2Mpps      137     13112        16     648588.0
 5       2Mpps      124     13319        15     590963.0
 6       2Mpps      120     12994        14     572861.0
 7       2Mpps      126     13312        15     600018.0
 8       2Mpps      137     13015        16     639411.0
 9       2Mpps      130     13268        14     615823.0
|------------------------Statistic------------------------|
   Max   2Mpps      137     13722        16       822653
   Min   2Mpps      120     12994        14       572861
  Mean   2Mpps      128     13295        14       630640
Median   2Mpps      125     13290        14       607920
 Stdev       0     6.14    224.46      0.82     71883.01

=======================Stream Rate: 3Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       3Mpps      133     13422        15    1399797.0
 1       3Mpps      134     13221        15     960617.0
 2       3Mpps      120     12953        14     877478.0
 3       3Mpps      121     14332        14    1333289.0
 4       3Mpps      134     13470        15     960206.0
 5       3Mpps      119     13543        14     875418.0
 6       3Mpps      126     13391        14     908054.0
 7       3Mpps      123     13710        15     891404.0
 8       3Mpps      128     13327        15     920344.0
 9       3Mpps      126     13488        15     916400.0
|------------------------Statistic------------------------|
   Max   3Mpps      134     14332        15      1399797
   Min   3Mpps      119     12953        14       875418
  Mean   3Mpps      126     13485        14      1004300
Median   3Mpps      126     13446        15       918372
 Stdev       0     5.76    359.34      0.52    193787.49

=======================Stream Rate: 4Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       4Mpps      131     13169        15    1272490.0
 1       4Mpps      135     13461        15    1313776.0
 2       4Mpps      128     13141        15    1249004.0
 3       4Mpps      130     13223        15    1263447.0
 4       4Mpps      123     13838        14    1765944.0
 5       4Mpps      128     13545        15    1852889.0
 6       4Mpps      128     13643        14    1253172.0
 7       4Mpps      127     14456        14    1864338.0
 8       4Mpps      134     13528        15    1298461.0
 9       4Mpps      131     14292        14    1869694.0
|------------------------Statistic------------------------|
   Max   4Mpps      135     14456        15      1869694
   Min   4Mpps      123     13141        14      1249004
  Mean   4Mpps      129     13629        14      1500321
Median   4Mpps      129     13536        15      1306118
 Stdev       0      3.5    450.61      0.52    292804.48

=======================Stream Rate: 5Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       5Mpps      124     12988        15    4112925.0
 1       5Mpps      129     13213        15    6736262.0
 2       5Mpps      133     12889        16    4543194.0
 3       5Mpps      119     13981        13   12241181.0
 4       5Mpps      122     13404        13    6211496.0
 5       5Mpps      122     14401        15   12206880.0
 6       5Mpps      129     14110        15   10317434.0
 7       5Mpps      128     14379        14   12140129.0
 8       5Mpps      131     14095        15   11325710.0
 9       5Mpps      133     13313        15    6698748.0
|------------------------Statistic------------------------|
   Max   5Mpps      133     14401        16     12241181
   Min   5Mpps      119     12889        13      4112925
  Mean   5Mpps      127     13677        14      8653395
Median   5Mpps      128     13692        15      8526848
 Stdev       0     4.94    576.35      0.97   3308836.75

Comment 27 Pei Zhang 2018-07-27 00:30:28 UTC
Comment 25 and comment 26 confirm this bug has been fixed. Thanks, Eelco.

Moving to 'VERIFIED'.

Comment 28 Timothy Redaelli 2018-08-10 13:45:33 UTC
The openvswitch component is delivered through the Fast Datapath channel, so it is not documented in the release notes.

Comment 30 errata-xmlrpc 2018-08-15 13:53:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2432

