Bug 2119876
Summary: | [ENIC] Can't add a 25 Gb enic to OVS-DPDK | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Jean-Tsung Hsiao <jhsiao> |
Component: | openvswitch | Assignee: | Kevin Traynor <ktraynor> |
openvswitch sub component: | ovs-dpdk | QA Contact: | Jean-Tsung Hsiao <jhsiao> |
Status: | MODIFIED --- | Docs Contact: | |
Severity: | unspecified | ||
Priority: | unspecified | CC: | ctrautma, fleitner, jhsiao, ktraynor, tli |
Version: | RHEL 8.0 | ||
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openvswitch3.2-3.2.0-10.el9fdp | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 2173805 | ||
Description
Jean-Tsung Hsiao
2022-08-19 18:05:20 UTC
Re: "dpdk|ERR|PMD: rte_enic_pmd: Devcmd 88 failed with error code -1"

It affects only 2.15. The 2.17 dpdk contains the fix below:
https://inbox.dpdk.org/dev/20211026000418.13540-1-hyonkim@cisco.com/T/

Probing the availability of Flow Manager API may print the following error log.

PMD: rte_enic_pmd: Devcmd 88 failed with error code -1

The error indicates a flow manager operation failed and happens when advanced filtering is disabled on vNIC. It is harmless but confusing to the user. Since advanced filtering is a prerequisite, check first if it is available and avoid the error message altogether.

Fixes: ea7768b5bba8 ("net/enic: add flow implementation based on Flow Manager API")
Cc: stable

----

Re: dpdk|ERR|Invalid value for nb_tx_desc(=2048), should be: <= 256, >= 64, and a product of 32

Seems to come from:
https://github.com/DPDK/dpdk/blob/bc1db4f45af35c87e7d97db7a24d479674aa8a43/lib/ethdev/rte_ethdev.c#L1960

We need to find out why it is 2048.

fbl

Regarding "dpdk|ERR|PMD: rte_enic_pmd: Devcmd 88 failed with error code -1"

The fix for this was backported upstream to DPDK 20.11.4.
https://git.dpdk.org/dpdk-stable/commit/?h=20.11&id=4c5c31b120f1113de28ed53eff31cfd5651e8c94

For rhel8/2.15 the fix is available in FDP 22.I with openvswitch2.15-2.15.0-122.el8fdp, which contains DPDK v20.11.6.
For rhel9/2.17 the fix was in from the initial version of OVS 2.17, so the error should not appear there.

----

For the 2048 issue: that is the default tx queue size in OVS since OVS 2.7. Either something changed in the enic driver/firmware that reduced what is allowed, or the driver has only recently started checking for a too-large value.

A possible fix would be to check the max size from the driver in OVS and limit what is configured based on this.

For now, a workaround for enic is to set the tx queue size to a suitable value when adding the enic port, e.g.
$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=<enic pci> -- set Interface dpdk0 options:n_txq_desc=256

Hi Kevin,

Yes, the "Devcmd 88 failed with error code -1" issue is gone. And the desc workaround works. See log below.

Thanks!
Jean

[root@netqe37 jhsiao]# ovs-vsctl show
f4711e8a-cbea-4112-bf6b-a585bb878ddc
    Bridge ovsbr0
        datapath_type: netdev
        Port dpdk-10
            Interface dpdk-10
                type: dpdk
                options: {dpdk-devargs="0000:1d:00.0", n_rxq="1", n_rxq_desc="256", n_txq_desc="256"}
        Port dpdk-11
            Interface dpdk-11
                type: dpdk
                options: {dpdk-devargs="0000:1d:00.2", n_rxq="1", n_rxq_desc="256", n_txq_desc="256"}
        Port ovsbr0
            Interface ovsbr0
                type: internal
    ovs_version: "2.15.8"
[root@netqe37 jhsiao]# uname -r
4.18.0-372.41.1.el8_6.x86_64
[root@netqe37 jhsiao]# rpm -q openvswitch2.15
openvswitch2.15-2.15.0-133.el8fdp.x86_64
[root@netqe37 jhsiao]#

Patch [1] sent to adjust number of descriptors based on device limits where required:

netdev-dpdk: Check rx/tx descriptor sizes for device.

By default OVS configures 2048 descriptors for tx and rx queues on DPDK devices. It also allows the user to configure those values.

If the values used are not acceptable to the device then queue setup will fail.

The device exposes its max/min/alignment requirements, so use those to ensure that an acceptable value is used during queue setup.

If the default or user value is not acceptable, adjust to a suitable value.

[1] https://mail.openvswitch.org/pipermail/ovs-dev/2023-February/401935.html

There are some changes [0] on DPDK main branch that might be needed to get the correct max value for the devices. I have asked upstream maintainers to comment [1].

Can you paste the error logs you see for Rx so we can show upstream maintainers? Thanks.
[0]
commit 22572e84fbda2c195707ffbb0dd6af4433d7a219
Author: John Daley <johndale>
Date: Fri Jan 28 09:58:13 2022 -0800

    net/enic: support max descriptors allowed by adapter

[1] https://bugs.dpdk.org/show_bug.cgi?id=1185

If this is a new Cisco 25GB adapter, it probably needs to be configured to increase the number of descriptors and the number of descriptor queues. Please see https://doc.dpdk.org/guides/nics/enic.html#configuration-information.

You will need to use the Cisco UCS manager to change the configuration (either UCSM or CIMC, depending on whether you are using a blade or a rack server). You can overprovision the adapter with more queues and descriptors than you think you will need, e.g.:

RQs: 64
WQs: 32
CQs: 64
Interrupts: 33
RQ descriptors: 4096
WQ descriptors: 4096

You should also enable advanced filters. There will be a radio button to enable it. After making the configuration changes you will need to reboot the host.

Submitted v4 patch upstream
https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404590.html

Patch to check device for descriptor limits has merged upstream [0] and will be part of OVS 3.2.

This should prevent queue setup failures on the NIC (or any NIC) because they are configured with a max tx/rx queue size of <2048.

It may still be the case that NICs configured with low max queue sizes are more susceptible to dropping packets because of lack of buffering, and the NIC will have to be configured to have a larger max.

[0] https://github.com/openvswitch/ovs/commit/9dad8dfd1ed9e1c4629b584b477114e11f3556b7

netdev-dpdk: Check rx/tx descriptor sizes for device.

By default OVS configures 2048 descriptors for tx and rx queues on DPDK devices. It also allows the user to configure those values.

If the values used are not acceptable to the device then queue setup would fail.

The device exposes its max/min/alignment requirements and OVS applies some limits also. Use these to ensure an acceptable value is used for the number of descriptors on a device tx/rx.
If the default or user value is not acceptable, adjust to a suitable value and log.

Reported-at: https://bugzilla.redhat.com/2119876
Reviewed-by: David Marchand <david.marchand>
Reviewed-by: Simon Horman <simon.horman>
Signed-off-by: Kevin Traynor <ktraynor>
Signed-off-by: Ilya Maximets <i.maximets>

Confirmed by the enic maintainer in comment #9 that the Cisco 25GB adapter may have default max queue sizes lower than the default value that OVS attempts to configure. To increase the NIC max size values, Cisco UCS manager must be used.

To accommodate NICs that have a lower max queue size or other requirements, OVS now reads the descriptor size requirements from the NIC and adapts the value it configures accordingly. This merged as part of OVS 3.2.
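
For readers who want to see the idea behind the descriptor adjustment, here is a minimal sketch in Python. It is not the OVS code from the merged commit. The names `DescLimits` and `adjust_desc_count` are hypothetical, the field names only mirror the max/min/alignment limits mentioned above, and the rounding strategy (clamp, round down to the alignment, bump up if that falls below the minimum) is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescLimits:
    """Hypothetical stand-in for the limits a device reports for one queue type."""
    nb_min: int    # smallest descriptor count the device accepts
    nb_max: int    # largest descriptor count the device accepts
    nb_align: int  # descriptor count must be a multiple of this


def adjust_desc_count(requested: int, lim: DescLimits) -> int:
    """Return a descriptor count within the limits that is as close to
    `requested` as this simple strategy allows (illustrative only)."""
    if lim.nb_align <= 0 or lim.nb_min > lim.nb_max:
        raise ValueError("inconsistent device limits")

    # Clamp the request into [nb_min, nb_max].
    count = max(lim.nb_min, min(requested, lim.nb_max))

    # Round down to a multiple of the alignment.
    count -= count % lim.nb_align

    # Rounding down can drop below the minimum; step up one alignment unit.
    if count < lim.nb_min:
        count += lim.nb_align

    if count > lim.nb_max:
        raise ValueError("no aligned value fits between nb_min and nb_max")
    return count


# Limits taken from the error message in this bug: <= 256, >= 64, multiple of 32.
enic_tx = DescLimits(nb_min=64, nb_max=256, nb_align=32)
adjusted = adjust_desc_count(2048, enic_tx)  # OVS default request of 2048
```

With the limits from the reported error message, the default request of 2048 is reduced to 256, which is the same value used in the n_txq_desc workaround earlier in this bug.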