Bug 1566712
Summary: | Got nfpu_open failed ERR when starting OVS | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Jean-Tsung Hsiao <jhsiao> |
Component: | openvswitch | Assignee: | Aaron Conole <aconole> |
Status: | CLOSED ERRATA | QA Contact: | Jean-Tsung Hsiao <jhsiao> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 7.5 | CC: | aconole, atragler, ctrautma, echaudro, jhsiao, kzhang, louis.peens, pablo.cascon, qding, rkhan, simon.horman, tredaelli |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-05-03 14:37:17 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1560628 |
Description
Jean-Tsung Hsiao
2018-04-12 20:23:49 UTC
Most likely a permissions issue with: nspv_aquire_process_lock() -> ... -> snprintf(lockname, sizeof(lockname), "/var/lock/nfp%d", desc->nfp); -> ... -> desc->lock = open(lockname, O_RDWR | O_CREAT, 0666); /var/lock is owned by root:root Perhaps we need a /var/lock/nfp/... directory which can be accessed by the ovs process and be where the nfp locks are stored. I have two patches to DPDK to test out. Will provide a private build tomorrow. Patches posted as RFC: http://dpdk.org/ml/archives/dev/2018-April/096911.html what is an NFP card? (In reply to Rashid Khan from comment #5) > what is an NFP card? Ethernet controller: Netronome Systems, Inc. Device 4000 Testpmd also failed --- missing firmware. See log below. Hi Aaron, Where to get the file: Firmware file /lib/firmware/netronome/nic_dpdk_default.nffw ? Thanks! Jean EAL: Detected 16 lcore(s) EAL: Auto-detected process type: PRIMARY EAL: Probing VFIO support... EAL: VFIO support initialized EAL: PCI device 0000:81:00.0 on NUMA socket 1 EAL: probe driver: 19ee:4000 net_nfp_pf EAL: using IOMMU type 1 (Type 1) PMD: nfp_pf_pci_probe(): nspu ABI version: 0.21 PMD: No firmware detected. Resetting NFP... PMD: Reset done. PMD: Uploading firmware... PMD: Firmware file /lib/firmware/netronome/nic_dpdk_default.nffw not found.PMD: nfp fw upload failed EAL: Releasing pci mapped resource for 0000:81:00.0 EAL: Calling pci_unmap_resource for 0000:81:00.0 at 0x7fb340000000 EAL: Calling pci_unmap_resource for 0000:81:00.0 at 0x7fb348000000 EAL: Calling pci_unmap_resource for 0000:81:00.0 at 0x7fb34c000000 EAL: Requested device 0000:81:00.0 cannot be used EAL: No probed ethernet devices Interactive-mode selected USER1: create a new mbuf pool <mbuf_pool_socket_0>: n=179456, size=2176, socket=0 USER1: create a new mbuf pool <mbuf_pool_socket_1>: n=179456, size=2176, socket=1 Done testpmd> To create the symbolic link please check which FW the netdev driver 'nfp' would load by running "dmesg | grep AMDA" . (normally the 'nfp' driver would have been loaded but might need to be manually loaded/binded to the device). Example: [ 3468.049908] nfp 0000:04:00.0: nfp: netronome/nic_AMDA0099-0001_2x25.nffw: found, loading... cd /lib/firmware/netronome ln -s nic_AMDA0099-0001_2x25.nffw nic_dpdk_default.nffw (the FW that the NFP PMD is requesting is a symbolic link to the right FW. Later releases of DPDK - from 18.05 - don't need this symbolic link, backport?) (In reply to Pablo Cascon from comment #8) > To create the symbolic link please check which FW the netdev driver 'nfp' > would load by running "dmesg | grep AMDA" . (normally the 'nfp' driver would > have been loaded but might need to be manually loaded/binded to the device). > > Example: > > [ 3468.049908] nfp 0000:04:00.0: nfp: netronome/nic_AMDA0099-0001_2x25.nffw: > found, loading... > > cd /lib/firmware/netronome > ln -s nic_AMDA0099-0001_2x25.nffw nic_dpdk_default.nffw > > (the FW that the NFP PMD is requesting is a symbolic link to the right FW. > Later releases of DPDK - from 18.05 - don't need this symbolic link, > backport?) The suggestion works. NOTE: For testpmd to work properly, need to add "--disable-hw-vlan". Hi, I have some problems to find the commit that will be in 18.05 in order to avoid doing the symlink. I tried to use the last dpdk git (fb73e096110a41b77448fe27fd9be8c489ec5d82) version, but I still have the same problem if I don't create the symlink Thank you I found that the needed commits were merged on Saturday. I'm backporting it on 17.11 The issue has been fixed with openvswitch-2.9.0-17.el7fdp.x86_64. Will do further testing. [root@netqe7 ~]# tail -f /var/log/openvswitch/ovs-vswitchd.log 2018-04-17T16:41:53.232Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log 2018-04-17T16:41:53.236Z|00002|ovs_numa|INFO|Discovered 8 CPU cores on NUMA node 0 2018-04-17T16:41:53.236Z|00003|ovs_numa|INFO|Discovered 8 CPU cores on NUMA node 1 2018-04-17T16:41:53.236Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 16 CPU cores 2018-04-17T16:41:53.236Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting... 2018-04-17T16:41:53.237Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected 2018-04-17T16:41:53.238Z|00007|dpdk|INFO|Using DPDK 17.11.0 2018-04-17T16:41:53.238Z|00008|dpdk|INFO|DPDK Enabled - initializing... 2018-04-17T16:41:53.238Z|00009|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch 2018-04-17T16:41:53.238Z|00010|dpdk|INFO|IOMMU support for vhost-user-client disabled. 2018-04-17T16:41:53.238Z|00011|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x0808 --socket-mem 4096,1 2018-04-17T16:41:53.238Z|00012|dpdk|INFO|EAL: Detected 16 lcore(s) 2018-04-17T16:41:53.262Z|00013|dpdk|INFO|EAL: Probing VFIO support... 2018-04-17T16:41:53.262Z|00014|dpdk|INFO|EAL: VFIO support initialized 2018-04-17T16:41:59.884Z|00015|dpdk|INFO|EAL: PCI device 0000:03:00.0 on NUMA socket 0 2018-04-17T16:41:59.884Z|00016|dpdk|INFO|EAL: probe driver: 15b3:1007 net_mlx4 2018-04-17T16:41:59.886Z|00017|dpdk|INFO|PMD: net_mlx4: PCI information matches, using device "mlx4_0" (VF: false) 2018-04-17T16:41:59.886Z|00018|dpdk|INFO|PMD: net_mlx4: 2 port(s) detected 2018-04-17T16:41:59.887Z|00019|dpdk|INFO|PMD: net_mlx4: port 1 MAC address is e4:1d:2d:79:b5:51 2018-04-17T16:41:59.888Z|00020|dpdk|INFO|PMD: net_mlx4: port 2 MAC address is e4:1d:2d:79:b5:52 2018-04-17T16:41:59.889Z|00021|dpdk|INFO|EAL: PCI device 0000:81:00.0 on NUMA socket 1 2018-04-17T16:41:59.889Z|00022|dpdk|INFO|EAL: probe driver: 19ee:4000 net_nfp_pf 2018-04-17T16:41:59.890Z|00023|dpdk|INFO|EAL: using IOMMU type 1 (Type 1) 2018-04-17T16:42:00.214Z|00024|dpdk|INFO|PMD: firmware model found: AMDA0099-0001 2018-04-17T16:42:00.214Z|00025|dpdk|INFO|PMD: NFP ethernet port table reports 2 ports 2018-04-17T16:42:00.214Z|00026|dpdk|INFO|PMD: Port speed: 25000 2018-04-17T16:42:02.924Z|00027|dpdk|INFO|PMD: Firmware file found at /lib/firmware/netronome/nic_AMDA0099-0001_2x25.nffw with size: 1386368 2018-04-17T16:42:02.925Z|00028|dpdk|INFO|PMD: Uploading the firmware ... 2018-04-17T16:42:08.294Z|00029|dpdk|INFO|PMD: Done 2018-04-17T16:42:08.297Z|00030|dpdk|INFO|PMD: nfp_pf_pci_probe(): Total pf ports: 2 2018-04-17T16:42:08.297Z|00031|dpdk|INFO|PMD: nfp_net_init(): VER: 0x305, Maximum supported MTU: 9216 2018-04-17T16:42:08.297Z|00032|dpdk|INFO|PMD: nfp_net_init(): CAP: 0x78160233, PROMISC RXCSUM TXCSUM GATHER RSS 2018-04-17T16:42:08.297Z|00033|dpdk|INFO|PMD: nfp_net_init(): max_rx_queues: 32, max_tx_queues: 32 2018-04-17T16:42:08.423Z|00034|dpdk|INFO|PMD: nfp_net_init(): port 2 VendorID=0x19ee DeviceID=0x4000 mac=00:15:4d:13:07:8a 2018-04-17T16:42:08.423Z|00035|dpdk|INFO|PMD: nfp_net_init(): VER: 0x305, Maximum supported MTU: 9216 2018-04-17T16:42:08.423Z|00036|dpdk|INFO|PMD: nfp_net_init(): CAP: 0x78160233, PROMISC RXCSUM TXCSUM GATHER RSS 2018-04-17T16:42:08.423Z|00037|dpdk|INFO|PMD: nfp_net_init(): max_rx_queues: 31, max_tx_queues: 31 2018-04-17T16:42:08.524Z|00038|dpdk|INFO|PMD: nfp_net_init(): port 3 VendorID=0x19ee DeviceID=0x4000 mac=00:15:4d:13:07:8b 2018-04-17T16:42:08.524Z|00039|dpdk|INFO|EAL: PCI device 0000:83:00.0 on NUMA socket 1 2018-04-17T16:42:08.524Z|00040|dpdk|INFO|EAL: probe driver: 8086:154d net_ixgbe 2018-04-17T16:42:08.524Z|00041|dpdk|INFO|EAL: PCI device 0000:83:00.1 on NUMA socket 1 2018-04-17T16:42:08.524Z|00042|dpdk|INFO|EAL: probe driver: 8086:154d net_ixgbe 2018-04-17T16:42:08.525Z|00043|dpdk|INFO|DPDK Enabled - initialized 2018-04-17T16:42:08.528Z|00044|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.9.0 2018-04-17T16:42:08.528Z|00045|timeval|WARN|Unreasonably long 15290ms poll interval (40ms user, 6723ms system) 2018-04-17T16:42:08.528Z|00046|timeval|WARN|faults: 22578 minor, 0 major 2018-04-17T16:42:08.528Z|00047|timeval|WARN|context switches: 345 voluntary, 11 involuntary 2018-04-17T16:42:08.528Z|00048|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=aaa85002: 2018-04-17T16:42:08.528Z|00049|coverage|INFO|bridge_reconfigure 0.2/sec 0.017/sec 0.0003/sec total: 1 2018-04-17T16:42:08.528Z|00050|coverage|INFO|cmap_expand 0.8/sec 0.133/sec 0.0022/sec total: 8 2018-04-17T16:42:08.528Z|00051|coverage|INFO|miniflow_malloc 1.0/sec 0.183/sec 0.0031/sec total: 11 2018-04-17T16:42:08.528Z|00052|coverage|INFO|hmap_expand 25.4/sec 6.317/sec 0.1053/sec total: 379 2018-04-17T16:42:08.528Z|00053|coverage|INFO|txn_unchanged 0.2/sec 0.050/sec 0.0008/sec total: 3 2018-04-17T16:42:08.528Z|00054|coverage|INFO|poll_create_node 2.8/sec 0.667/sec 0.0111/sec total: 40 2018-04-17T16:42:08.528Z|00055|coverage|INFO|seq_change 4.8/sec 1.167/sec 0.0194/sec total: 70 2018-04-17T16:42:08.528Z|00056|coverage|INFO|pstream_open 0.2/sec 0.017/sec 0.0003/sec total: 1 2018-04-17T16:42:08.528Z|00057|coverage|INFO|stream_open 0.2/sec 0.017/sec 0.0003/sec total: 1 2018-04-17T16:42:08.528Z|00058|coverage|INFO|util_xalloc 495.2/sec 123.767/sec 2.0628/sec total: 7426 2018-04-17T16:42:08.528Z|00059|coverage|INFO|netdev_get_hwaddr 0.4/sec 0.033/sec 0.0006/sec total: 2 2018-04-17T16:42:08.528Z|00060|coverage|INFO|netlink_received 0.2/sec 0.050/sec 0.0008/sec total: 3 2018-04-17T16:42:08.528Z|00061|coverage|INFO|netlink_sent 0.2/sec 0.017/sec 0.0003/sec total: 1 2018-04-17T16:42:08.528Z|00062|coverage|INFO|91 events never hit Got the same issue when adding nfp dpdk interface to OVS-dpdk bridge: 2018-04-17T17:34:25.399Z|00051|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation 2018-04-17T17:34:25.399Z|00052|ofproto_dpif|INFO|netdev@ovs-netdev: VLAN header stack length probed as 1 2018-04-17T17:34:25.399Z|00053|ofproto_dpif|INFO|netdev@ovs-netdev: MPLS label stack length probed as 3 2018-04-17T17:34:25.399Z|00054|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports truncate action 2018-04-17T17:34:25.399Z|00055|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports unique flow ids 2018-04-17T17:34:25.399Z|00056|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports clone action 2018-04-17T17:34:25.399Z|00057|ofproto_dpif|INFO|netdev@ovs-netdev: Max sample nesting level probed as 10 2018-04-17T17:34:25.399Z|00058|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports eventmask in conntrack action 2018-04-17T17:34:25.399Z|00059|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_clear action 2018-04-17T17:34:25.399Z|00060|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_state 2018-04-17T17:34:25.399Z|00061|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_zone 2018-04-17T17:34:25.399Z|00062|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_mark 2018-04-17T17:34:25.399Z|00063|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_label 2018-04-17T17:34:25.399Z|00064|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_state_nat 2018-04-17T17:34:25.399Z|00065|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_orig_tuple 2018-04-17T17:34:25.399Z|00066|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_orig_tuple6 2018-04-17T17:34:25.429Z|00067|bridge|INFO|bridge ovsbr0: added interface ovsbr0 on port 65534 2018-04-17T17:34:25.433Z|00068|bridge|INFO|bridge ovsbr0: using datapath ID 000006269728d840 2018-04-17T17:34:25.433Z|00069|connmgr|INFO|ovsbr0: added service controller "punix:/var/run/openvswitch/ovsbr0.mgmt" 2018-04-17T17:34:25.487Z|00070|dpdk|INFO|EAL: PCI device 0000:81:00.0 on NUMA socket 1 2018-04-17T17:34:25.487Z|00071|dpdk|INFO|EAL: probe driver: 19ee:4000 net_nfp_pf 2018-04-17T17:34:25.488Z|00072|dpdk|INFO|EAL: using IOMMU type 1 (Type 1) 2018-04-17T17:34:25.709Z|00073|dpdk|ERR|PMD: nfpu_open failed Hi Tim and Eelco, Sorry, it was a false alarm. In the process of yum remove and yum install, I got -15 back instead of -17 installed. Thanks Tim for reminding me. Next, to run ping/netperf traffic between guests over OVS-dpdk vxlan tunnel. This is another critical test --- bnxt failed on that. Thanks! Jean NFP passed the following test case --- netperf between guests over OVS-dpdk/NFP vxlan tunnel: [root@localhost ~]# netperf -H 172.16.33.106 -l 60; netperf -H 172.16.33.106 -t TCP_MAERTS -l 60 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 60.00 3878.57 MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 60.00 3801.05 [root@localhost ~]# Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:1267 |