Bug 1573912
| Field | Value |
|---|---|
| Summary | Backport "net/nfp: fix mbufs releasing when stop or close" |
| Product | Red Hat Enterprise Linux 7 |
| Component | dpdk |
| Version | 7.6 |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Reporter | Pablo Cascon (Netronome) <pablo.cascon> |
| Assignee | Timothy Redaelli <tredaelli> |
| QA Contact | Jean-Tsung Hsiao <jhsiao> |
| CC | frederick.botha, jhsiao, pablo.cascon |
| Keywords | Extras |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | dpdk-17.11-11.el7 |
| Doc Type | If docs needed, set a value |
| Clones | 1575067 (view as bug list) |
| Last Closed | 2018-06-26 18:40:38 UTC |
| Type | Bug |
Description
Pablo Cascon (Netronome)
2018-05-02 13:51:07 UTC
The commit message captures the essence of the problem. In short, it fixes improper resource freeing in some scenarios when the rte_eth_dev_stop callback is invoked while using the NFP PMD. Without the fix, a user could break their DPDK app when using chained mbufs on TX (e.g. TSO). We want to include this fix in the DPDK package that ships with RHEL 7.6. Jean-Tsung, please let us know if you require more information.

(In reply to Frik Botha from comment #3)
> The commit message captures the essence of the problem. In short, it fixes
> improper resource freeing in some scenarios when the rte_eth_dev_stop
> callback is invoked while using the NFP PMD. Without the fix, a user could
> break their DPDK app when using chained mbufs on TX (e.g. TSO). We want to
> include this fix in the DPDK package that ships with RHEL 7.6. Jean-Tsung,
> please let us know if you require more information.

Please give an example of a "DPDK app when using chained mbufs on TX (e.g. TSO)".

Thanks!
Jean

Hi Jean,

Any app using mbufs of which the next field in the rte_mbuf structure is set can run into the problem. However, the app must use it in conjunction with device hotplugging (rte_eth_dev_stop/rte_eth_dev_start) if I'm not mistaken. We initially triggered the bug using virtio-forwarder (https://github.com/Netronome/virtio-forwarder). I guess you would be able to reproduce it using some of DPDK's standard tools as well. I have not tried, but using testpmd's 'tso set' along with 'port attach/detach' may suffice.

Frik

(In reply to Frik Botha from comment #5)
> Hi Jean,
>
> Any app using mbufs of which the next field in the rte_mbuf structure is set
> can run into the problem. However, the app must use it in conjunction with
> device hotplugging (rte_eth_dev_stop/rte_eth_dev_start) if I'm not mistaken.
> We initially triggered the bug using virtio-forwarder
> (https://github.com/Netronome/virtio-forwarder). I guess you would be able
> to reproduce it using some of DPDK's standard tools as well.
> I have not tried, but using testpmd's 'tso set' along with 'port attach/detach'
> may suffice.
>
> Frik

Two questions:
1. Do we need to use a huge packet?
2. Would interchanging the PMD mask between 0x0a and 0xa0 trigger a PMD stop or close?

> Do we need to use a huge packet?
No, any multi-segment packet will do.

> Would interchanging the PMD mask between 0x0a and 0xa0 trigger a PMD stop or close?
I don't follow you here, can you please elaborate? Are you referring to a pci device/port whitelist mask?

(In reply to Frik Botha from comment #7)
> > Do we need to use a huge packet?
> No, any multi-segment packet will do.
>
> > Would interchanging the PMD mask between 0x0a and 0xa0 trigger a PMD stop or close?
> I don't follow you here, can you please elaborate? Are you referring to a
> pci device/port whitelist mask?

While running OVS-dpdk vxlan traffic over nfp, I was running the following loop, switching pmd-cpu-mask between 0x8080 and 0x2020:

for i in {1..300}; do
    echo LOOP $i
    ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x8080 other_config:pmd-cpu-mask=0x2020
    sleep 5
    ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x2020 other_config:pmd-cpu-mask=0x8080
    sleep 5
done

I am not familiar with the finer details of the ovs-dpdk datapath or how they implement CPU pinning changes. If such changes entail device stop/start, then the procedure may work for reproduction. I would not bet on it though.

If you are looking for an easy way to trigger this, I would suggest you use testpmd in conjunction with a dpdk which was compiled with CONFIG_RTE_LIBRTE_MBUF_DEBUG=y.

(In reply to Frik Botha from comment #9)
> I am not familiar with the finer details of the ovs-dpdk datapath or how
> they implement CPU pinning changes. If such changes entail device
> stop/start, then the procedure may work for reproduction. I would not bet on
> it though.
> If you are looking for an easy way to trigger this, I would suggest you use
> testpmd in conjunction with a dpdk which was compiled with
> CONFIG_RTE_LIBRTE_MBUF_DEBUG=y.

Hi Frik,

This is specific to NFP, so I would suggest you run the QA test yourself based on your reproducer. This will save our QA time. For my part I have started running a set of automation tests to check the stability of OVS.

Thanks!
Jean

Sounds fair. Will you then include/backport this commit if the automation tests pass?

(In reply to Frik Botha from comment #11)
> Sounds fair. Will you then include/backport this commit if the automation
> tests pass?

Sure. It's already in OVS 2.9.0-30 or later. So, you can run QA against it. Once passed, please make a comment here.

Thanks!
Jean

(In reply to Jean-Tsung Hsiao from comment #12)
> Sure. It's already in OVS 2.9.0-30 or later. So, you can run QA against it.
> Once passed, please make a comment here.

Hi Frik,

I have run my stability testing using OVS 2.9.0-37 through netperf testing between guests over OVS-dpdk/{vxlan,geneve}/nfp tunnels. I assume you have done your reproducer testing. Please let me know if you have encountered issues.

Thanks!
Jean

Hi Jean,

We do not have access to the new package yet. Pablo has asked that it be provided. It's COB here, so I'll test on Monday and then share the results.

Frik

(In reply to Frik Botha from comment #15)
> We do not have access to the new package yet. Pablo has asked that it be
> provided. It's COB here, so I'll test on Monday and then share the results.

Great. Thanks!
Jean

Unfortunately the request to share the dpdk package with Netronome has not been considered yet. I will update once it is. If it takes a considerable amount of time we will try to test at Red Hat; let me know if this is becoming urgent.

The package has been verified by running concurrent netperf tests between guests over an OVS-dpdk/vxlan/nfp tunnel --- no crash observed.

Hi Jean,

I finally got the package and ran my test. The new package you gave us indeed fixes the issue - see the output below.

The respective package versions and the error output for the pre-fix test:

> rpm -qa | grep "dpdk\|virtio-forwarder"
virtio-forwarder-1.1.99.14-1.el7.x86_64
dpdk-tools-17.11-7.el7.x86_64
dpdk-devel-17.11-7.el7.x86_64
dpdk-doc-17.11-7.el7.noarch
dpdk-17.11-7.el7.x86_64

journalctl call trace:

virtio-forwarder[28234]: PANIC in nfp_net_xmit_pkts():
vio4wd-start.sh[28231]: NFP_NET: x7: [/lib64/libc.so.6(clone+0x6d) [0x7f8f05e5ab3d]]
vio4wd-start.sh[28231]: 6: [/lib64/libpthread.so.0(+0x7dd5) [0x7f8f06130dd5]]
vio4wd-start.sh[28231]: 5: [/lib64/librte_eal.so.6(+0xba2d) [0x7f8f0718da2d]]
vio4wd-start.sh[28231]: 4: [/usr/bin/virtio-forwarder() [0x40e110]]
vio4wd-start.sh[28231]: 3: [/usr/lib64/dpdk-pmds/librte_pmd_nfp.so.1(+0x4a14) [0x7f8f02d37a14]]
vio4wd-start.sh[28231]: 2: [/lib64/librte_eal.so.6(__rte_panic+0xbd) [0x7f8f07188a6d]]
vio4wd-start.sh[28231]: 1: [/lib64/librte_eal.so.6(rte_dump_stack+0x2d) [0x7f8f0718f81d]]
systemd[1]: virtio-forwarder.service: main process exited, code=exited, status=1/FAILURE
systemd[1]: Unit virtio-forwarder.service entered failed state.
systemd[1]: virtio-forwarder.service failed.
Everything works fine with the new packages:

> rpm -qa | grep "dpdk\|virtio-forwarder"
virtio-forwarder-1.1.99.14-1.el7.x86_64
dpdk-debuginfo-17.11-11.el7.x86_64
dpdk-17.11-11.el7.x86_64
dpdk-devel-17.11-11.el7.x86_64
dpdk-tools-17.11-11.el7.x86_64
dpdk-doc-17.11-7.el7.noarch

Thanks!
Frik

(In reply to Frik Botha from comment #20)
> Hi Jean,
>
> I finally got the package and ran my test. The new package you gave us indeed
> fixes the issue - see the output below.
>
> The respective package versions and the error output for the pre-fix test:
>
> > rpm -qa | grep "dpdk\|virtio-forwarder"
> virtio-forwarder-1.1.99.14-1.el7.x86_64
> dpdk-tools-17.11-7.el7.x86_64
> dpdk-devel-17.11-7.el7.x86_64
> dpdk-doc-17.11-7.el7.noarch
> dpdk-17.11-7.el7.x86_64
>
> journalctl call trace:
> virtio-forwarder[28234]: PANIC in nfp_net_xmit_pkts():
> vio4wd-start.sh[28231]: NFP_NET: x7: [/lib64/libc.so.6(clone+0x6d) [0x7f8f05e5ab3d]]
> vio4wd-start.sh[28231]: 6: [/lib64/libpthread.so.0(+0x7dd5) [0x7f8f06130dd5]]
> vio4wd-start.sh[28231]: 5: [/lib64/librte_eal.so.6(+0xba2d) [0x7f8f0718da2d]]
> vio4wd-start.sh[28231]: 4: [/usr/bin/virtio-forwarder() [0x40e110]]
> vio4wd-start.sh[28231]: 3: [/usr/lib64/dpdk-pmds/librte_pmd_nfp.so.1(+0x4a14) [0x7f8f02d37a14]]
> vio4wd-start.sh[28231]: 2: [/lib64/librte_eal.so.6(__rte_panic+0xbd) [0x7f8f07188a6d]]
> vio4wd-start.sh[28231]: 1: [/lib64/librte_eal.so.6(rte_dump_stack+0x2d) [0x7f8f0718f81d]]
> systemd[1]: virtio-forwarder.service: main process exited, code=exited, status=1/FAILURE
> systemd[1]: Unit virtio-forwarder.service entered failed state.
> systemd[1]: virtio-forwarder.service failed.
> Everything works fine with the new packages:
>
> > rpm -qa | grep "dpdk\|virtio-forwarder"
> virtio-forwarder-1.1.99.14-1.el7.x86_64
> dpdk-debuginfo-17.11-11.el7.x86_64
> dpdk-17.11-11.el7.x86_64
> dpdk-devel-17.11-11.el7.x86_64
> dpdk-tools-17.11-11.el7.x86_64
> dpdk-doc-17.11-7.el7.noarch
>
> Thanks!
> Frik

Hi Frik,

Wonderful! Thanks a lot!

Jean

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2038