Bug 1826491

Summary: creating, deleting and recreating the same pod that has annotations for sriovnetwork net-attach and a vf using netdevice stops working
Product: OpenShift Container Platform
Reporter: Nabeel Cocker <ncocker>
Component: Networking
Assignee: zenghui.shi <zshi>
Networking sub component: SR-IOV
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: urgent
Priority: unspecified
CC: zshi
Version: 4.3.z
Target Milestone: ---
Target Release: 4.3.z
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-04-22 03:58:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Nabeel Cocker 2020-04-21 19:16:23 UTC
Description of problem:

The setup uses Mellanox (MLX) SR-IOV NICs. Pods are attached to a VF via network attachments. The first time the pod is created, it connects, has the net1 interface, and can ping/send traffic. If the pod is deleted and then recreated, it attaches to the same net-attach, but it can no longer ping or send traffic.



Version-Release number of selected component (if applicable):

4.3.5

How reproducible:
always

Steps to Reproduce:
1. create a pod with the SR-IOV network annotation
2. delete the pod
3. create the pod again (a condensed command sequence follows below; the full transcript is under Actual results)
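
For reference, a condensed version of these steps, assuming the pod manifest pod-with-vf-4.yaml and the pod IP 10.75.71.83 shown in the transcript under Actual results:

# First creation: the secondary interface works
oc create -f pod-with-vf-4.yaml
oc get pod busybox-4                 # wait until Running
ping -c 2 10.75.71.83                # succeeds

# Delete and recreate the same pod
oc delete pod busybox-4
oc create -f pod-with-vf-4.yaml
ping -c 2 10.75.71.83                # 100% packet loss after the recreate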

Actual results:
[corona@bastion f5]$ cat pod-with-vf-4.yaml 
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: nad-lb-ext-inf
  name: busybox-4
spec:
  containers:
  - image: wsregistry.vici.verizon.com:5000/corona/busybox:1.28
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always
[corona@bastion f5]$ oc create -f pod-with-vf-4.yaml
pod/busybox-4 created
[corona@bastion f5]$ oc get pods
NAME        READY   STATUS    RESTARTS   AGE
busybox     1/1     Running   0          7m37s
busybox-3   1/1     Running   0          4m19s
busybox-4   1/1     Running   0          3s
[corona@bastion f5]$ oc rsh busybox-4
/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: eth0@if165: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1400 qdisc noqueue 
    link/ether d6:18:21:0a:08:10 brd ff:ff:ff:ff:ff:ff
    inet 172.10.8.15/23 brd 172.10.9.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6cef:a9ff:fe00:7b5a/64 scope link 
       valid_lft forever preferred_lft forever
27: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq qlen 1000
    link/ether fe:07:81:de:e4:16 brd ff:ff:ff:ff:ff:ff
    inet 10.75.71.83/27 brd 10.75.71.95 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::fc07:81ff:fede:e416/64 scope link 
       valid_lft forever preferred_lft forever
/ # exit
[corona@bastion f5]$ 
[corona@bastion f5]$ ping 10.75.71.83
PING 10.75.71.83 (10.75.71.83) 56(84) bytes of data.
64 bytes from 10.75.71.83: icmp_seq=1 ttl=64 time=2.100 ms
64 bytes from 10.75.71.83: icmp_seq=2 ttl=64 time=0.114 ms
^C
--- 10.75.71.83 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 0.114/1.556/2.999/1.443 ms
[corona@bastion f5]$ oc delete busybox-4
error: resource(s) were provided, but no name, label selector, or --all flag specified
[corona@bastion f5]$ oc delete pod busybox-4
pod "busybox-4" deleted
[corona@bastion f5]$ oc create -f pod-with-vf-4.yaml 
pod/busybox-4 created
[corona@bastion f5]$ oc get pods
NAME        READY   STATUS    RESTARTS   AGE
busybox     1/1     Running   0          9m23s
busybox-3   1/1     Running   0          6m5s
busybox-4   1/1     Running   0          6s
[corona@bastion f5]$ oc rsh busybox-4
/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: eth0@if166: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1400 qdisc noqueue 
    link/ether d6:18:21:0a:08:10 brd ff:ff:ff:ff:ff:ff
    inet 172.10.8.15/23 brd 172.10.9.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::1096:c3ff:fec8:fe34/64 scope link 
       valid_lft forever preferred_lft forever
27: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq qlen 1000
    link/ether fe:07:81:de:e4:16 brd ff:ff:ff:ff:ff:ff
    inet 10.75.71.83/27 brd 10.75.71.95 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::fc07:81ff:fede:e416/64 scope link 
       valid_lft forever preferred_lft forever
/ # exit
[corona@bastion f5]$ ping 10.75.71.83
PING 10.75.71.83 (10.75.71.83) 56(84) bytes of data.
^C
--- 10.75.71.83 ping statistics ---
6 packets transmitted, 0 received, 100% packet loss, time 119ms


[corona@bastion f5]$ cat nnp-lb-f501-int-ens3f0vf4.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name:  nnp-lb-f501-ext-inf-ens3f0vf4
  namespace: openshift-sriov-network-operator
spec:
  resourceName: ens3f0nicvf4
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 99
  numVfs: 127
  nicSelector:
    vendor: "15b3"
    deviceID: "1017"
    pfNames: ["ens3f0#4-4"]
  deviceType: netdevice
[corona@bastion f5]$ cat nad-lb-4.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: nad-lb-ext-inf
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: f5-lb
  ipam: '{"type": "static","addresses":[{"address":"10.75.71.83/27","gateway":"10.75.71.65"}]}'
  resourceName: ens3f0nicvf4
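
Not part of the original report, but as a sanity check: with the SR-IOV Network Operator, the policy above should advertise the VF pool as the extended resource openshift.io/ens3f0nicvf4 on matching workers, and the SriovNetwork should generate a NetworkAttachmentDefinition named nad-lb-ext-inf in the f5-lb namespace. A minimal verification sketch (the worker node name is a placeholder):

# Confirm the operator generated the net-attach-def in the target namespace
oc get net-attach-def -n f5-lb                      # expect nad-lb-ext-inf

# Confirm the VF pool is advertised on the worker node
oc describe node <worker-node> | grep ens3f0nicvf4  # listed under Capacity/Allocatable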

Expected results:

The recreated pod attaches to the same VF and can ping/send traffic on net1, just as on the first creation.

Additional info:

When looking at the worker node where the pod is running, the SR-IOV NIC VF does not show a MAC address. If you relax the VF security settings (ip link set ens3f0 vf 0 spoof off trust on), the pod starts to ping again.
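
A minimal sketch of those checks on the worker node hosting the pod, based on the PF name and command quoted above (standard iproute2 spells the spoof-check keyword spoofchk; the VF index is whichever VF the pod was assigned):

# List VF state under the PF; after the recreate the affected VF may show MAC 00:00:00:00:00:00
ip link show ens3f0

# Workaround reported above: disable spoof checking and mark the VF as trusted
ip link set ens3f0 vf 0 spoofchk off trust on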

Comment 1 zenghui.shi 2020-04-22 03:58:27 UTC

*** This bug has been marked as a duplicate of bug 1826595 ***