Bug 1566712 - Got nfpu_open failed ERR when starting OVS
Summary: Got nfpu_open failed ERR when starting OVS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openvswitch
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Aaron Conole
QA Contact: Jean-Tsung Hsiao
URL:
Whiteboard:
Depends On:
Blocks: 1560628
 
Reported: 2018-04-12 20:23 UTC by Jean-Tsung Hsiao
Modified: 2018-05-03 14:37 UTC (History)
CC: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-03 14:37:17 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1568301 1 None None None 2021-01-20 06:05:38 UTC

Internal Links: 1568301

Description Jean-Tsung Hsiao 2018-04-12 20:23:49 UTC
Description of problem: Got nfpu_open failed ERR when starting OVS

2018-04-12T19:49:09.317Z|00024|dpdk|ERR|PMD: nfpu_open failed
2018-04-12T19:49:09.317Z|00025|dpdk|INFO|EAL: Releasing pci mapped resource for 0000:81:00.0
2018-04-12T19:49:09.317Z|00026|dpdk|INFO|EAL: Calling pci_unmap_resource for 0000:81:00.0 at 0x7ff740000000
2018-04-12T19:49:09.317Z|00027|dpdk|INFO|EAL: Calling pci_unmap_resource for 0000:81:00.0 at 0x7ff748000000
2018-04-12T19:49:09.317Z|00028|dpdk|INFO|EAL: Calling pci_unmap_resource for 0000:81:00.0 at 0x7ff74c000000
2018-04-12T19:49:09.455Z|00029|dpdk|ERR|EAL: Requested device 0000:81:00.0 cannot be used


Version-Release number of selected component (if applicable):
[root@netqe7 openvswitch-2.9.0]# rpm -q openvswitch
openvswitch-2.9.0-15.el7fdp.x86_64
[root@netqe7 openvswitch-2.9.0]# uname -a
Linux netqe7.knqe.lab.eng.bos.redhat.com 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

How reproducible: Reproducible


Steps to Reproduce:
1. Install NFP card on a server
2. Provision the server under 7.5
3. Install openvswitch-2.9.0-15.el7fdp
4. Bind NFP
5. Start OVS
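Steps 4–5 can be sketched in shell. The PCI address 0000:81:00.0 is taken from the logs in this report; binding via driverctl and vfio-pci is an assumed method, not something stated here:

```shell
# Hypothetical sketch of steps 4-5. The PCI address comes from the logs
# below; driverctl/vfio-pci is an assumed binding method on RHEL 7.
pci=0000:81:00.0
bind_cmd="driverctl set-override $pci vfio-pci"
echo "$bind_cmd"   # run as root, then: systemctl start openvswitch
```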

Actual results:


Expected results:


Additional info:

Comment 2 Aaron Conole 2018-04-12 21:31:55 UTC
Most likely a permissions issue with:

nspv_aquire_process_lock()
  -> ...
  -> snprintf(lockname, sizeof(lockname), "/var/lock/nfp%d", desc->nfp);
  -> ...
  -> desc->lock = open(lockname, O_RDWR | O_CREAT, 0666);

/var/lock is owned by root:root

Perhaps we need a /var/lock/nfp/... directory which can be accessed by the ovs process and be where the nfp locks are stored.
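The lock path the PMD builds can be reproduced in shell. The path format comes from the snprintf above; the permission remark in the comments restates the suspicion from this comment, nothing more:

```shell
# Rebuild the lock path exactly as the snprintf above does, for nfp index 0.
# A non-root ovs-vswitchd cannot O_CREAT this file because /var/lock is
# root-owned -- the suspected cause of the nfpu_open failure.
nfp=0
lockname=$(printf '/var/lock/nfp%d' "$nfp")
echo "$lockname"   # -> /var/lock/nfp0
```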

Comment 3 Aaron Conole 2018-04-12 22:02:26 UTC
I have two patches to DPDK to test out.  Will provide a private build tomorrow.

Comment 4 Aaron Conole 2018-04-12 22:22:48 UTC
Patches posted as RFC:
http://dpdk.org/ml/archives/dev/2018-April/096911.html

Comment 5 Rashid Khan 2018-04-13 11:19:21 UTC
what is an NFP card?

Comment 6 Jean-Tsung Hsiao 2018-04-13 11:59:28 UTC
(In reply to Rashid Khan from comment #5)
> what is an NFP card?

Ethernet controller: Netronome Systems, Inc. Device 4000

Comment 7 Jean-Tsung Hsiao 2018-04-13 12:31:34 UTC
Testpmd also failed --- missing firmware. See log below.

Hi Aaron,

Where can I get the firmware file /lib/firmware/netronome/nic_dpdk_default.nffw?

Thanks!

Jean

EAL: Detected 16 lcore(s)
EAL: Auto-detected process type: PRIMARY
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:81:00.0 on NUMA socket 1
EAL:   probe driver: 19ee:4000 net_nfp_pf
EAL:   using IOMMU type 1 (Type 1)
PMD: nfp_pf_pci_probe(): nspu ABI version: 0.21

PMD: No firmware detected. Resetting NFP...
PMD: Reset done.
PMD: Uploading firmware...
PMD: Firmware file /lib/firmware/netronome/nic_dpdk_default.nffw not found.PMD: nfp fw upload failed
EAL: Releasing pci mapped resource for 0000:81:00.0
EAL: Calling pci_unmap_resource for 0000:81:00.0 at 0x7fb340000000
EAL: Calling pci_unmap_resource for 0000:81:00.0 at 0x7fb348000000
EAL: Calling pci_unmap_resource for 0000:81:00.0 at 0x7fb34c000000
EAL: Requested device 0000:81:00.0 cannot be used
EAL: No probed ethernet devices
Interactive-mode selected
USER1: create a new mbuf pool <mbuf_pool_socket_0>: n=179456, size=2176, socket=0
USER1: create a new mbuf pool <mbuf_pool_socket_1>: n=179456, size=2176, socket=1
Done
testpmd>

Comment 8 Pablo Cascon (Netronome) 2018-04-13 15:27:35 UTC
To create the symbolic link, first check which FW the netdev driver 'nfp' would load by running "dmesg | grep AMDA". (Normally the 'nfp' driver will already have been loaded, but it might need to be manually loaded/bound to the device.)

Example:

[ 3468.049908] nfp 0000:04:00.0: nfp: netronome/nic_AMDA0099-0001_2x25.nffw: found, loading...

cd /lib/firmware/netronome
ln -s nic_AMDA0099-0001_2x25.nffw nic_dpdk_default.nffw

(The FW name that the NFP PMD requests is a symbolic link to the right FW. Later releases of DPDK - from 18.05 - don't need this symbolic link; backport?)
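The symlink workaround can be rehearsed in a scratch directory before touching /lib/firmware/netronome. The AMDA... name below is the one from this report's dmesg example and must be replaced with whatever your own `dmesg | grep AMDA` shows; the temp-dir setup is purely illustrative:

```shell
# Rehearse the symlink in a temp dir; on a real system do the same in
# /lib/firmware/netronome with the .nffw name dmesg reported for your card.
fwdir=$(mktemp -d)
touch "$fwdir/nic_AMDA0099-0001_2x25.nffw"             # stand-in for the real FW
ln -s nic_AMDA0099-0001_2x25.nffw "$fwdir/nic_dpdk_default.nffw"
link_target=$(readlink "$fwdir/nic_dpdk_default.nffw")
echo "$link_target"                                    # -> nic_AMDA0099-0001_2x25.nffw
rm -rf "$fwdir"
```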

Comment 9 Jean-Tsung Hsiao 2018-04-13 17:51:21 UTC
(In reply to Pablo Cascon from comment #8)
> To create the symbolic link please check which FW the netdev driver 'nfp'
> would load by running "dmesg | grep AMDA" . (normally the 'nfp' driver would
> have been loaded but might need to be manually loaded/binded to the device).
> 
> Example:
> 
> [ 3468.049908] nfp 0000:04:00.0: nfp: netronome/nic_AMDA0099-0001_2x25.nffw:
> found, loading...
> 
> cd /lib/firmware/netronome
> ln -s nic_AMDA0099-0001_2x25.nffw nic_dpdk_default.nffw
> 
> (the FW that the NFP PMD is requesting is a symbolic link to the right FW.
> Later releases of DPDK - from 18.05 - don't need this symbolic link,
> backport?)

The suggestion works.

NOTE: For testpmd to work properly, need to add "--disable-hw-vlan".

Comment 10 Timothy Redaelli 2018-04-13 18:12:24 UTC
Hi,
I'm having trouble finding the commit that will be in 18.05 so that the symlink can be avoided.

I tried the latest DPDK git (fb73e096110a41b77448fe27fd9be8c489ec5d82), but I still hit the same problem if I don't create the symlink.

Thank you

Comment 11 Timothy Redaelli 2018-04-16 09:41:40 UTC
I found that the needed commits were merged on Saturday.
I'm backporting them to 17.11.

Comment 14 Jean-Tsung Hsiao 2018-04-17 17:05:51 UTC
The issue has been fixed with openvswitch-2.9.0-17.el7fdp.x86_64. Will do further testing.

[root@netqe7 ~]# tail -f /var/log/openvswitch/ovs-vswitchd.log
2018-04-17T16:41:53.232Z|00001|vlog|INFO|opened log file
/var/log/openvswitch/ovs-vswitchd.log
2018-04-17T16:41:53.236Z|00002|ovs_numa|INFO|Discovered 8 CPU cores on
NUMA node 0
2018-04-17T16:41:53.236Z|00003|ovs_numa|INFO|Discovered 8 CPU cores on
NUMA node 1
2018-04-17T16:41:53.236Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and
16 CPU cores
2018-04-17T16:41:53.236Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock:
connecting...
2018-04-17T16:41:53.237Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock:
connected
2018-04-17T16:41:53.238Z|00007|dpdk|INFO|Using DPDK 17.11.0
2018-04-17T16:41:53.238Z|00008|dpdk|INFO|DPDK Enabled - initializing...
2018-04-17T16:41:53.238Z|00009|dpdk|INFO|No vhost-sock-dir provided -
defaulting to /var/run/openvswitch
2018-04-17T16:41:53.238Z|00010|dpdk|INFO|IOMMU support for
vhost-user-client disabled.
2018-04-17T16:41:53.238Z|00011|dpdk|INFO|EAL ARGS: ovs-vswitchd -c
0x0808 --socket-mem 4096,1
2018-04-17T16:41:53.238Z|00012|dpdk|INFO|EAL: Detected 16 lcore(s)
2018-04-17T16:41:53.262Z|00013|dpdk|INFO|EAL: Probing VFIO support...
2018-04-17T16:41:53.262Z|00014|dpdk|INFO|EAL: VFIO support initialized
2018-04-17T16:41:59.884Z|00015|dpdk|INFO|EAL: PCI device 0000:03:00.0 on
NUMA socket 0
2018-04-17T16:41:59.884Z|00016|dpdk|INFO|EAL:   probe driver: 15b3:1007
net_mlx4
2018-04-17T16:41:59.886Z|00017|dpdk|INFO|PMD: net_mlx4: PCI information
matches, using device "mlx4_0" (VF: false)
2018-04-17T16:41:59.886Z|00018|dpdk|INFO|PMD: net_mlx4: 2 port(s) detected
2018-04-17T16:41:59.887Z|00019|dpdk|INFO|PMD: net_mlx4: port 1 MAC
address is e4:1d:2d:79:b5:51
2018-04-17T16:41:59.888Z|00020|dpdk|INFO|PMD: net_mlx4: port 2 MAC
address is e4:1d:2d:79:b5:52
2018-04-17T16:41:59.889Z|00021|dpdk|INFO|EAL: PCI device 0000:81:00.0 on
NUMA socket 1
2018-04-17T16:41:59.889Z|00022|dpdk|INFO|EAL:   probe driver: 19ee:4000
net_nfp_pf
2018-04-17T16:41:59.890Z|00023|dpdk|INFO|EAL:   using IOMMU type 1 (Type 1)
2018-04-17T16:42:00.214Z|00024|dpdk|INFO|PMD: firmware model found:
AMDA0099-0001
2018-04-17T16:42:00.214Z|00025|dpdk|INFO|PMD: NFP ethernet port table
reports 2 ports
2018-04-17T16:42:00.214Z|00026|dpdk|INFO|PMD: Port speed: 25000
2018-04-17T16:42:02.924Z|00027|dpdk|INFO|PMD: Firmware file found at
/lib/firmware/netronome/nic_AMDA0099-0001_2x25.nffw with size: 1386368
2018-04-17T16:42:02.925Z|00028|dpdk|INFO|PMD: Uploading the firmware ...
2018-04-17T16:42:08.294Z|00029|dpdk|INFO|PMD: Done
2018-04-17T16:42:08.297Z|00030|dpdk|INFO|PMD: nfp_pf_pci_probe(): Total
pf ports: 2
2018-04-17T16:42:08.297Z|00031|dpdk|INFO|PMD: nfp_net_init(): VER:
0x305, Maximum supported MTU: 9216
2018-04-17T16:42:08.297Z|00032|dpdk|INFO|PMD: nfp_net_init(): CAP:
0x78160233, PROMISC RXCSUM TXCSUM GATHER RSS
2018-04-17T16:42:08.297Z|00033|dpdk|INFO|PMD: nfp_net_init():
max_rx_queues: 32, max_tx_queues: 32
2018-04-17T16:42:08.423Z|00034|dpdk|INFO|PMD: nfp_net_init(): port 2
VendorID=0x19ee DeviceID=0x4000 mac=00:15:4d:13:07:8a
2018-04-17T16:42:08.423Z|00035|dpdk|INFO|PMD: nfp_net_init(): VER:
0x305, Maximum supported MTU: 9216
2018-04-17T16:42:08.423Z|00036|dpdk|INFO|PMD: nfp_net_init(): CAP:
0x78160233, PROMISC RXCSUM TXCSUM GATHER RSS
2018-04-17T16:42:08.423Z|00037|dpdk|INFO|PMD: nfp_net_init():
max_rx_queues: 31, max_tx_queues: 31
2018-04-17T16:42:08.524Z|00038|dpdk|INFO|PMD: nfp_net_init(): port 3
VendorID=0x19ee DeviceID=0x4000 mac=00:15:4d:13:07:8b
2018-04-17T16:42:08.524Z|00039|dpdk|INFO|EAL: PCI device 0000:83:00.0 on
NUMA socket 1
2018-04-17T16:42:08.524Z|00040|dpdk|INFO|EAL:   probe driver: 8086:154d
net_ixgbe
2018-04-17T16:42:08.524Z|00041|dpdk|INFO|EAL: PCI device 0000:83:00.1 on
NUMA socket 1
2018-04-17T16:42:08.524Z|00042|dpdk|INFO|EAL:   probe driver: 8086:154d
net_ixgbe
2018-04-17T16:42:08.525Z|00043|dpdk|INFO|DPDK Enabled - initialized
2018-04-17T16:42:08.528Z|00044|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.9.0
2018-04-17T16:42:08.528Z|00045|timeval|WARN|Unreasonably long 15290ms
poll interval (40ms user, 6723ms system)
2018-04-17T16:42:08.528Z|00046|timeval|WARN|faults: 22578 minor, 0 major
2018-04-17T16:42:08.528Z|00047|timeval|WARN|context switches: 345
voluntary, 11 involuntary
2018-04-17T16:42:08.528Z|00048|coverage|INFO|Event coverage, avg rate
over last: 5 seconds, last minute, last hour,  hash=aaa85002:
2018-04-17T16:42:08.528Z|00049|coverage|INFO|bridge_reconfigure        
0.2/sec     0.017/sec        0.0003/sec   total: 1
2018-04-17T16:42:08.528Z|00050|coverage|INFO|cmap_expand               
0.8/sec     0.133/sec        0.0022/sec   total: 8
2018-04-17T16:42:08.528Z|00051|coverage|INFO|miniflow_malloc           
1.0/sec     0.183/sec        0.0031/sec   total: 11
2018-04-17T16:42:08.528Z|00052|coverage|INFO|hmap_expand              
25.4/sec     6.317/sec        0.1053/sec   total: 379
2018-04-17T16:42:08.528Z|00053|coverage|INFO|txn_unchanged             
0.2/sec     0.050/sec        0.0008/sec   total: 3
2018-04-17T16:42:08.528Z|00054|coverage|INFO|poll_create_node          
2.8/sec     0.667/sec        0.0111/sec   total: 40
2018-04-17T16:42:08.528Z|00055|coverage|INFO|seq_change                
4.8/sec     1.167/sec        0.0194/sec   total: 70
2018-04-17T16:42:08.528Z|00056|coverage|INFO|pstream_open              
0.2/sec     0.017/sec        0.0003/sec   total: 1
2018-04-17T16:42:08.528Z|00057|coverage|INFO|stream_open               
0.2/sec     0.017/sec        0.0003/sec   total: 1
2018-04-17T16:42:08.528Z|00058|coverage|INFO|util_xalloc             
495.2/sec   123.767/sec        2.0628/sec   total: 7426
2018-04-17T16:42:08.528Z|00059|coverage|INFO|netdev_get_hwaddr         
0.4/sec     0.033/sec        0.0006/sec   total: 2
2018-04-17T16:42:08.528Z|00060|coverage|INFO|netlink_received          
0.2/sec     0.050/sec        0.0008/sec   total: 3
2018-04-17T16:42:08.528Z|00061|coverage|INFO|netlink_sent              
0.2/sec     0.017/sec        0.0003/sec   total: 1
2018-04-17T16:42:08.528Z|00062|coverage|INFO|91 events never hit

Comment 15 Jean-Tsung Hsiao 2018-04-17 17:36:50 UTC
Got the same issue when adding nfp dpdk interface to OVS-dpdk bridge:

2018-04-17T17:34:25.399Z|00051|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
2018-04-17T17:34:25.399Z|00052|ofproto_dpif|INFO|netdev@ovs-netdev: VLAN header stack length probed as 1
2018-04-17T17:34:25.399Z|00053|ofproto_dpif|INFO|netdev@ovs-netdev: MPLS label stack length probed as 3
2018-04-17T17:34:25.399Z|00054|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports truncate action
2018-04-17T17:34:25.399Z|00055|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports unique flow ids
2018-04-17T17:34:25.399Z|00056|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports clone action
2018-04-17T17:34:25.399Z|00057|ofproto_dpif|INFO|netdev@ovs-netdev: Max sample nesting level probed as 10
2018-04-17T17:34:25.399Z|00058|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports eventmask in conntrack action
2018-04-17T17:34:25.399Z|00059|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_clear action
2018-04-17T17:34:25.399Z|00060|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_state
2018-04-17T17:34:25.399Z|00061|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_zone
2018-04-17T17:34:25.399Z|00062|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_mark
2018-04-17T17:34:25.399Z|00063|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_label
2018-04-17T17:34:25.399Z|00064|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_state_nat
2018-04-17T17:34:25.399Z|00065|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_orig_tuple
2018-04-17T17:34:25.399Z|00066|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_orig_tuple6
2018-04-17T17:34:25.429Z|00067|bridge|INFO|bridge ovsbr0: added interface ovsbr0 on port 65534
2018-04-17T17:34:25.433Z|00068|bridge|INFO|bridge ovsbr0: using datapath ID 000006269728d840
2018-04-17T17:34:25.433Z|00069|connmgr|INFO|ovsbr0: added service controller "punix:/var/run/openvswitch/ovsbr0.mgmt"
2018-04-17T17:34:25.487Z|00070|dpdk|INFO|EAL: PCI device 0000:81:00.0 on NUMA socket 1
2018-04-17T17:34:25.487Z|00071|dpdk|INFO|EAL:   probe driver: 19ee:4000 net_nfp_pf
2018-04-17T17:34:25.488Z|00072|dpdk|INFO|EAL:   using IOMMU type 1 (Type 1)
2018-04-17T17:34:25.709Z|00073|dpdk|ERR|PMD: nfpu_open failed

Comment 16 Jean-Tsung Hsiao 2018-04-17 19:33:16 UTC
Hi Tim and Eelco,

Sorry, it was a false alarm.

During the yum remove / yum install cycle, the -15 build ended up installed again instead of -17.
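A quick guard against this kind of mix-up: compare the installed version-release with the fixed build using a version sort. This is a hedged sketch; the hard-coded `installed` string stands in for the output of `rpm -q --qf '%{VERSION}-%{RELEASE}\n' openvswitch`:

```shell
# Check the installed build is at least the fixed one. The strings below
# are the stale and fixed builds mentioned in this bug.
installed="2.9.0-15.el7fdp"   # substitute with the rpm query output
fixed="2.9.0-17.el7fdp"
lowest=$(printf '%s\n%s\n' "$installed" "$fixed" | sort -V | head -n1)
if [ "$lowest" = "$fixed" ]; then
    result="fixed build or newer"
else
    result="too old: $installed"
fi
echo "$result"   # -> too old: 2.9.0-15.el7fdp
```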

Thanks Tim for reminding me.

Next, I'll run ping/netperf traffic between guests over an OVS-DPDK vxlan tunnel. This is another critical test --- bnxt failed on that one.

Thanks!

Jean

Comment 17 Jean-Tsung Hsiao 2018-04-18 04:19:42 UTC
NFP passed the following test case --- netperf between guests over OVS-dpdk/NFP vxlan tunnel:

[root@localhost ~]# netperf -H 172.16.33.106 -l 60; netperf -H 172.16.33.106 -t TCP_MAERTS -l 60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    3878.57   
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    3801.05   
[root@localhost ~]#

Comment 19 Timothy Redaelli 2018-05-03 14:37:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1267

