Bug 2022053 - dpdk application with vhost-net is not able to start
Summary: dpdk application with vhost-net is not able to start
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Sebastian Scheinkman
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 2027672
 
Reported: 2021-11-10 16:18 UTC by Sebastian Scheinkman
Modified: 2022-03-10 16:27 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2027672
Environment:
Last Closed: 2022-03-10 16:26:42 UTC
Target Upstream Version:
Embargoed:


Attachments: None


Links
- GitHub k8snetworkplumbingwg/sriov-network-device-plugin pull 394 (Merged): Add the tun mounting with vhost is requested (last updated 2021-11-22 08:59:17 UTC)
- GitHub k8snetworkplumbingwg/sriov-network-operator pull 205 (Merged): Load tun kernel module (last updated 2021-11-29 15:15:29 UTC)
- GitHub openshift/sriov-network-device-plugin commit 01c4f9ef419b82c6bbc389d2005748b025254769 (last updated 2021-11-22 08:59:17 UTC)
- GitHub openshift/sriov-network-device-plugin pull 49 (Merged): 4.10 update 2021-11-18 (last updated 2021-11-30 13:56:30 UTC)
- GitHub openshift/sriov-network-operator pull 592 (Merged): Bug 2022053: Sync 28 11 21 (last updated 2021-11-30 13:55:39 UTC)
- Red Hat Product Errata RHSA-2022:0056 (last updated 2022-03-10 16:27:16 UTC)

Description Sebastian Scheinkman 2021-11-10 16:18:27 UTC
Description of problem:

In BZ 1983964 [https://bugzilla.redhat.com/show_bug.cgi?id=1983964] we fixed the issue of the vhost-net device not being mounted inside the container when requested by the user.

The problem is that for a DPDK application to use vhost-net, a tap device needs to be created inside the pod.

To be able to create the tap device we need two changes:
1. the sriov-device-plugin should mount the /dev/net/tun device inside the container together with the vhost-net device.
2. the sriov-config-daemon should load the tun kernel module when a user applies a policy requesting the vhost-net device.


The current workaround is to allow pods the MKNOD capability so they can create the tun device inside the container themselves, but this exposes multiple security issues.
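
For reference, a minimal sketch of the checks the two changes above imply, assuming node and container shell access (the device paths are the standard kernel ones):

# On the node: the tun kernel module should be loaded by the
# sriov-config-daemon once a policy requests vhost-net.
lsmod | grep -w tun || echo "tun module not loaded"

# Inside the container: both character devices should be mounted
# by the sriov-device-plugin.
ls -l /dev/net/tun /dev/vhost-net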

Comment 2 zhaozhanqi 2021-12-01 07:11:52 UTC
Verified this bug on 4.10.0-202111292203 with the following steps:

1. Create VFs with deviceType vfio-pci and needVhostNet set to true:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-dpdk
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  needVhostNet: true
  mtu: 1700
  nicSelector:
    deviceID: "158b"
    pfNames:
      - ens1f1
    rootDevices:
      - '0000:3b:00.1'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 2
  priority: 99
  resourceName: inteldpdk
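
Once the policy is synced, the node should advertise the resource under the operator's default openshift.io/ prefix; a quick sanity check (node name is a placeholder):

oc get node <node-name> -o jsonpath='{.status.allocatable}'
# expect an entry like openshift.io/inteldpdk: 2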

2. Create a SriovNetwork to generate the NetworkAttachmentDefinition (NAD):

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: dpdk-network
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: z1
  ipam: "{}"
  vlan: 0
  resourceName: inteldpdk
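
The operator should then render the NAD into the networkNamespace given above; a quick check, assuming namespace z1 as in the spec:

oc -n z1 get network-attachment-definitions dpdk-network -o yaml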

3. Create the test pod:

apiVersion: v1
kind: Pod
metadata:
  generateName: testpod1
  labels:
    env: test
  annotations:
    k8s.v1.cni.cncf.io/networks: dpdk-network
spec:
  containers:
  - name: dpdk
    image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.8.0-8.1628601733
    imagePullPolicy: IfNotPresent
    securityContext:
      runAsUser: 0
      capabilities:
        add: ["IPC_LOCK"]
    resources:
      requests:
        hugepages-1Gi: 4Gi
        cpu: "4"
        memory: "1Gi"
      limits:
        hugepages-1Gi: 4Gi
        cpu: "4"
        memory: "1Gi"
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
    command: ["sleep", "infinity"]
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages

4. Rsh into the container and check:

sh-4.4# ls /dev/net/tun  
/dev/net/tun
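
The companion vhost-net device can be checked the same way; a sketch, assuming the device plugin mounted it alongside tun as described above:

ls -l /dev/vhost-net
# expect a character device, e.g. crw------- root root /dev/vhost-net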


sh-4.4#  dpdk-testpmd -l 2,4,6,8 -a 0000:3b:0a.0 --iova-mode=va -- -i --portmask=0x1 --nb-cores=2 --forward-mode=mac --port-topology=loop --no-mlockall
EAL: Detected 56 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL:   using IOMMU type 1 (Type 1)
EAL: Probe PCI driver: net_i40e_vf (8086:154c) device: 0000:3b:0a.0 (socket 0)
EAL: No legacy callbacks, legacy socket not created
Interactive-mode selected
Set mac packet forwarding mode
testpmd: create a new mbuf pool <mb_pool_0>: n=171456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)

Port 0: link state change event

Port 0: link state change event

Port 0: link state change event
Port 0: 76:3E:4F:FC:EA:11
Checking link statuses...
Done
testpmd> start
mac packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support enabled, MP allocation mode: native
Logical Core 4 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00

  mac packet forwarding packets/burst=32
  nb forwarding cores=2 - nb forwarding ports=1
  port 0: RX queue number: 1 Tx queue number: 1
    Rx offloads=0x0 Tx offloads=0x0
    RX queue: 0
      RX desc=512 - RX free threshold=32
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x0
    TX queue: 0
      TX desc=512 - TX free threshold=32
      TX threshold registers: pthresh=32 hthresh=0  wthresh=0
      TX offloads=0x0 - TX RS bit threshold=32
testpmd> 
testpmd> stop
Telling cores to stop...
Waiting for lcores to finish...

  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 10             RX-dropped: 0             RX-total: 10
  TX-packets: 10             TX-dropped: 0             TX-total: 10
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
  RX-packets: 10             RX-dropped: 0             RX-total: 10
  TX-packets: 10             TX-dropped: 0             TX-total: 10
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Done.

Comment 5 errata-xmlrpc 2022-03-10 16:26:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

