Bug 2022053

Summary: dpdk application with vhost-net is not able to start
Product: OpenShift Container Platform
Component: Networking
Networking sub component: SR-IOV
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 4.10
Keywords: Triaged
Target Milestone: ---
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Reporter: Sebastian Scheinkman <sscheink>
Assignee: Sebastian Scheinkman <sscheink>
QA Contact: zhaozhanqi <zzhao>
Docs Contact:
CC: cgoncalves, dosmith, ealcaniz, fbaudin
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 2027672 (view as bug list)
Environment:
Last Closed: 2022-03-10 16:26:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2027672

Description Sebastian Scheinkman 2021-11-10 16:18:27 UTC
Description of problem:

In BZ 1983964 [https://bugzilla.redhat.com/show_bug.cgi?id=1983964] we fixed the issue of the vhost-net device not being mounted inside the container when requested by the user.

The problem is that to use vhost-net in a DPDK application, a tap device needs to be created inside the pod.

To be able to create the tap device we need:
1. the sriov-device-plugin to mount the /dev/net/tun device inside the container together with the vhost-net device.
2. the sriov-config-daemon to load the tun kernel module when a user applies a policy requesting the vhost-net device.


The current workaround is to allow pods the MKNOD capability so they can create the tun device inside the container, but this exposes multiple security issues.
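
As a quick node-side sanity check (the node name is a placeholder, and lsmod availability in the debug image is assumed), the tun module should end up loaded once a policy with needVhostNet: true is applied:

# on the worker node, e.g. via: oc debug node/<node-name>
lsmod | grep -w tun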

Comment 2 zhaozhanqi 2021-12-01 07:11:52 UTC
Verified this bug on 4.10.0-202111292203 with the following steps:

1. Create VFs with deviceType vfio-pci and needVhostNet set to true (a quick check follows the manifest):

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-dpdk
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  needVhostNet: true
  mtu: 1700
  nicSelector:
    deviceID: "158b"
    pfNames:
      - ens1f1
    rootDevices:
      - '0000:3b:00.1'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 2
  priority: 99
  resourceName: inteldpdk
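
One way to confirm the policy was synced before moving on (the node name is a placeholder; openshift.io is the operator's default resource prefix):

oc get sriovnetworknodestates -n openshift-sriov-network-operator
oc describe node <node-name> | grep openshift.io/inteldpdk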

2. Create a SriovNetwork to generate the NetworkAttachmentDefinition (see the check after the manifest):

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: dpdk-network
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: z1
  ipam: "{}"
  vlan: 0
  resourceName: inteldpdk
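
The operator should render a NetworkAttachmentDefinition into the networkNamespace (z1 here); it can be checked before creating the pod:

oc get network-attachment-definitions -n z1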

3. Create the test pod (an optional check follows the manifest):

apiVersion: v1
kind: Pod
metadata:
  generateName: testpod1
  labels:
    env: test
  annotations:
    k8s.v1.cni.cncf.io/networks: dpdk-network
spec:
  containers:
  - name: dpdk
    image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.8.0-8.1628601733
    imagePullPolicy: IfNotPresent
    securityContext:
      runAsUser: 0
      capabilities:
        add: ["IPC_LOCK"]
    resources:
      requests:
        hugepages-1Gi: 4Gi
        cpu: "4"
        memory: "1Gi"
      limits:
        hugepages-1Gi: 4Gi
        cpu: "4"
        memory: "1Gi"
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
    command: ["sleep", "infinity"]
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
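
Optionally, once the pod is running, the allocated VF can be confirmed from outside the pod (the pod name is generated from generateName, so substitute the real one; the PCIDEVICE_* variable is the device plugin's usual way of exposing the allocated PCI address):

oc get pods -n z1
oc exec -n z1 <pod-name> -- env | grep PCIDEVICE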

4. rsh into the container and check that the tun device is present and dpdk-testpmd runs:

sh-4.4# ls /dev/net/tun  
/dev/net/tun
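
Since the policy sets needVhostNet: true, the vhost-net device should be mounted alongside the tun device and can be checked the same way:

ls /dev/vhost-net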


sh-4.4#  dpdk-testpmd -l 2,4,6,8 -a 0000:3b:0a.0 --iova-mode=va -- -i --portmask=0x1 --nb-cores=2 --forward-mode=mac --port-topology=loop --no-mlockall
EAL: Detected 56 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL:   using IOMMU type 1 (Type 1)
EAL: Probe PCI driver: net_i40e_vf (8086:154c) device: 0000:3b:0a.0 (socket 0)
EAL: No legacy callbacks, legacy socket not created
Interactive-mode selected
Set mac packet forwarding mode
testpmd: create a new mbuf pool <mb_pool_0>: n=171456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)

Port 0: link state change event

Port 0: link state change event

Port 0: link state change event
Port 0: 76:3E:4F:FC:EA:11
Checking link statuses...
Done
testpmd> start
mac packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support enabled, MP allocation mode: native
Logical Core 4 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00

  mac packet forwarding packets/burst=32
  nb forwarding cores=2 - nb forwarding ports=1
  port 0: RX queue number: 1 Tx queue number: 1
    Rx offloads=0x0 Tx offloads=0x0
    RX queue: 0
      RX desc=512 - RX free threshold=32
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x0
    TX queue: 0
      TX desc=512 - TX free threshold=32
      TX threshold registers: pthresh=32 hthresh=0  wthresh=0
      TX offloads=0x0 - TX RS bit threshold=32
testpmd> 
testpmd> stop
Telling cores to stop...
Waiting for lcores to finish...

  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 10             RX-dropped: 0             RX-total: 10
  TX-packets: 10             TX-dropped: 0             TX-total: 10
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
  RX-packets: 10             RX-dropped: 0             RX-total: 10
  TX-packets: 10             TX-dropped: 0             TX-total: 10
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Done.

Comment 5 errata-xmlrpc 2022-03-10 16:26:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056