Bug 1770668

Summary: [sriov] The existing net-attach-def disappears when multiple sriovnetworks are created with the same networkNamespace
Product: OpenShift Container Platform Reporter: zhaozhanqi <zzhao>
Component: Networking    Assignee: Peng Liu <pliu>
Networking sub component: SR-IOV QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: bbennett, zshi
Version: 4.3.0    Keywords: Reopened
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version:    Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1808284    Environment:
Last Closed: 2020-05-04 11:15:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1808284    
Attachments:
Description Flags
sriov-network-operator logs none

Description zhaozhanqi 2019-11-11 05:48:12 UTC
Description of problem:
When multiple SriovNetwork CRs are created with the same networkNamespace, the existing net-attach-def disappears; only the newly created one is shown.

Version-Release number of selected component (if applicable):
quay.io/openshift-release-dev/ocp-v4.0-art-dev:v4.3.0-201911080552-ose-sriov-network-operator

How reproducible:
always

Steps to Reproduce:
1. Create a SriovNetwork with the following CR:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: example-sriovnetwork
  namespace: openshift-sriov-network-operator
spec:
  ipam: |
    {
      "type": "host-local",
      "subnet": "10.56.217.0/24",
      "rangeStart": "10.56.217.171",
      "rangeEnd": "10.56.217.181",
      "routes": [{
        "dst": "0.0.0.0/0"
      }],
      "gateway": "10.56.217.1"
    }
  vlan: 0
  spoofChk: "on"
  trust: "off"
  resourceName: intelnetdevice
  networkNamespace: z1
2. Check the net-attach-def in namespace z1:
  oc get net-attach-def -n z1
3. Create another CR with the same networkNamespace, z1:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: copy-sriovnetwork
  namespace: openshift-sriov-network-operator
spec:
  ipam: |
    {
      "type": "host-local",
      "subnet": "10.56.217.0/24",
      "rangeStart": "10.56.217.171",
      "rangeEnd": "10.56.217.181",
      "routes": [{
        "dst": "0.0.0.0/0"
      }],
      "gateway": "10.56.217.1"
    }
  vlan: 0
  spoofChk: "on"
  trust: "off"
  resourceName: intelnetdevice
  networkNamespace: z1

4. Check the net-attach-def in z1 again
5. Check the sriov-network-operator logs.

Actual results:

Step 4:
# oc get net-attach-def -n z1
NAME                AGE
copy-sriovnetwork   11m

# oc get sriovnetwork 
NAME                   AGE
copy-sriovnetwork      11m
example-sriovnetwork   4d19h
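The step 4 check can be automated. Below is a minimal Go sketch, assuming the default `oc get` table layout shown above (a NAME/AGE header, then one row per object with the name in the first column); `missingNADs` is a hypothetical helper, not part of `oc` or the operator. Fed the step 4 output, it reports `example-sriovnetwork` as missing even though `oc get sriovnetwork` still lists that CR.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// missingNADs returns the names in want that do not appear as rows in the
// tabular output of `oc get net-attach-def`. It assumes the first non-empty
// line is the NAME/AGE header and the name is the first field of each row.
func missingNADs(output string, want []string) []string {
	present := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(output))
	header := true
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		if header {
			header = false // skip the NAME/AGE header row
			continue
		}
		present[fields[0]] = true
	}
	var missing []string
	for _, name := range want {
		if !present[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Output captured in step 4 of this report.
	out := "NAME                AGE\ncopy-sriovnetwork   11m\n"
	fmt.Println(missingNADs(out, []string{"example-sriovnetwork", "copy-sriovnetwork"}))
	// prints [example-sriovnetwork]
}
```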

Step 5 logs:

{"level":"info","ts":1573450654.6433642,"logger":"controller_sriovnetwork","msg":"NetworkAttachmentDefinition already exist, updating","Request.Namespace":"z1","Request.Name":"example-sriovnetwork"}
{"level":"info","ts":1573450654.6512325,"logger":"controller_sriovnetwork","msg":"NetworkAttachmentDefinition","Request.Namespace":"z1","Request.Name":"example-sriovnetwork","list":[{"kind":"NetworkAttachmentDefinition","apiVersion":"k8s.cni.cncf.io/v1","metadata":{"name":"example-sriovnetwork","namespace":"z1","selfLink":"/apis/k8s.cni.cncf.io/v1/namespaces/z1/network-attachment-definitions/example-sriovnetwork","uid":"3cee6587-dfb3-43cb-ad98-70db6d1264e4","resourceVersion":"11875138","generation":1,"creationTimestamp":"2019-11-11T05:37:34Z","annotations":{"k8s.v1.cni.cncf.io/resourceName":"openshift.io/intelnetdevice"},"ownerReferences":[{"apiVersion":"sriovnetwork.openshift.io/v1","kind":"SriovNetwork","name":"example-sriovnetwork","uid":"dd594330-cb54-4712-bfcc-52bb767a352a","controller":true,"blockOwnerDeletion":true}]},"spec":{"config":"{ \"cniVersion\":\"0.3.1\", \"name\":\"sriov-net\", \"type\":\"sriov\", \"vlan\":0,\"spoofchk\":\"on\",\"trust\":\"off\",\"vlanQoS\":0,\"ipam\":{\"type\":\"host-local\",\"subnet\":\"10.56.217.0/24\",\"rangeStart\":\"10.56.217.171\",\"rangeEnd\":\"10.56.217.181\",\"routes\":[{\"dst\":\"0.0.0.0/0\"}],\"gateway\":\"10.56.217.1\"} }"},"status":{}}]}
{"level":"info","ts":1573450654.6531472,"logger":"controller_sriovnetwork","msg":"Reconciling SriovNetwork","Request.Namespace":"z1","Request.Name":"example-sriovnetwork"}
{"level":"info","ts":1573450654.6531982,"logger":"controller_sriovnetwork.renderNetAttDef","msg":"Start to render SRIOV CNI NetworkAttachementDefinition"}
manifest {"apiVersion":"k8s.cni.cncf.io/v1","kind":"NetworkAttachmentDefinition","metadata":{"annotations":{"k8s.v1.cni.cncf.io/resourceName":"openshift.io/intelnetdevice"},"name":"example-sriovnetwork","namespace":"z1"},"spec":{"config":"{ \"cniVersion\":\"0.3.1\", \"name\":\"sriov-net\", \"type\":\"sriov\", \"vlan\":0,\"spoofchk\":\"on\",\"trust\":\"off\",\"vlanQoS\":0,\"ipam\":{\"type\":\"host-local\",\"subnet\":\"10.56.217.0/24\",\"rangeStart\":\"10.56.217.171\",\"rangeEnd\":\"10.56.217.181\",\"routes\":[{\"dst\":\"0.0.0.0/0\"}],\"gateway\":\"10.56.217.1\"} }"}}
{"level":"info","ts":1573450654.654505,"logger":"controller_sriovnetwork","msg":"NetworkAttachmentDefinition not exist, creating","Request.Namespace":"z1","Request.Name":"example-sriovnetwork"}
{"level":"info","ts":1573450654.662164,"logger":"controller_sriovnetwork","msg":"NetworkAttachmentDefinition","Request.Namespace":"z1","Request.Name":"example-sriovnetwork","list":[{"kind":"NetworkAttachmentDefinition","apiVersion":"k8s.cni.cncf.io/v1","metadata":{"name":"example-sriovnetwork","namespace":"z1","selfLink":"/apis/k8s.cni.cncf.io/v1/namespaces/z1/network-attachment-definitions/example-sriovnetwork","uid":"197ba3e9-a855-4168-a80c-343697631df3","resourceVersion":"11875140","generation":1,"creationTimestamp":"2019-11-11T05:37:34Z","annotations":{"k8s.v1.cni.cncf.io/resourceName":"openshift.io/intelnetdevice"},"ownerReferences":[{"apiVersion":"sriovnetwork.openshift.io/v1","kind":"SriovNetwork","name":"example-sriovnetwork","uid":"dd594330-cb54-4712-bfcc-52bb767a352a","controller":true,"blockOwnerDeletion":true}]},"spec":{"config":"{ \"cniVersion\":\"0.3.1\", \"name\":\"sriov-net\", \"type\":\"sriov\", \"vlan\":0,\"spoofchk\":\"on\",\"trust\":\"off\",\"vlanQoS\":0,\"ipam\":{\"type\":\"host-local\",\"subnet\":\"10.56.217.0/24\",\"rangeStart\":\"10.56.217.171\",\"rangeEnd\":\"10.56.217.181\",\"routes\":[{\"dst\":\"0.0.0.0/0\"}],\"gateway\":\"10.56.217.1\"} }"},"status":{}}]}
{"level":"info","ts":1573450654.6622777,"logger":"controller_sriovnetwork","msg":"Reconciling SriovNetwork","Request.Namespace":"z1","Request.Name":"example-sriovnetwork"}
{"level":"info","ts":1573450654.662304,"logger":"controller_sriovnetwork.renderNetAttDef","msg":"Start to render SRIOV CNI NetworkAttachementDefinition"}
manifest {"apiVersion":"k8s.cni.cncf.io/v1","kind":"NetworkAttachmentDefinition","metadata":{"annotations":{"k8s.v1.cni.cncf.io/resourceName":"openshift.io/intelnetdevice"},"name":"example-sriovnetwork","namespace":"z1"},"spec":{"config":"{ \"cniVersion\":\"0.3.1\", \"name\":\"sriov-net\", \"type\":\"sriov\", \"vlan\":0,\"spoofchk\":\"on\",\"trust\":\"off\",\"vlanQoS\":0,\"ipam\":{\"type\":\"host-local\",\"subnet\":\"10.56.217.0/24\",\"rangeStart\":\"10.56.217.171\",\"rangeEnd\":\"10.56.217.181\",\"routes\":[{\"dst\":\"0.0.0.0/0\"}],\"gateway\":\"10.56.217.1\"} }"}}
{"level":"info","ts":1573450654.6633935,"logger":"controller_sriovnetwork","msg":"NetworkAttachmentDefinition already exist, updating","Request.Namespace":"z1","Request.Name":"example-sriovnetwork"}
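These logs also show the NAD being deleted and recreated: the two `list` entries (ts 1573450654.6512325 and 1573450654.662164) describe the same z1/example-sriovnetwork with different UIDs (3cee6587-dfb3-43cb-ad98-70db6d1264e4, then 197ba3e9-a855-4168-a80c-343697631df3), with "NetworkAttachmentDefinition not exist, creating" in between. A minimal Go sketch for spotting this in a full log dump, assuming one JSON object per line as above; `uidChanges` is hypothetical, non-JSON lines such as the `manifest ...` ones are skipped, and the sample lines are abbreviated to the decoded fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// logEntry decodes only the NetworkAttachmentDefinition list carried by
// some controller_sriovnetwork log lines; other fields are ignored.
type logEntry struct {
	List []struct {
		Metadata struct {
			Name      string `json:"name"`
			Namespace string `json:"namespace"`
			UID       string `json:"uid"`
		} `json:"metadata"`
	} `json:"list"`
}

// uidChanges reports every object whose UID changes between successive log
// entries, i.e. it was deleted and re-created under the same namespace/name.
func uidChanges(lines []string) []string {
	seen := map[string]string{} // "namespace/name" -> last UID observed
	var changes []string
	for _, line := range lines {
		var e logEntry
		if err := json.Unmarshal([]byte(line), &e); err != nil {
			continue // not a JSON log line (e.g. "manifest {...}")
		}
		for _, item := range e.List {
			md := item.Metadata
			key := md.Namespace + "/" + md.Name
			if prev, ok := seen[key]; ok && prev != md.UID {
				changes = append(changes, fmt.Sprintf("%s: %s -> %s", key, prev, md.UID))
			}
			seen[key] = md.UID
		}
	}
	return changes
}

func main() {
	lines := []string{
		`{"msg":"NetworkAttachmentDefinition","list":[{"metadata":{"name":"example-sriovnetwork","namespace":"z1","uid":"3cee6587-dfb3-43cb-ad98-70db6d1264e4"}}]}`,
		`manifest {"apiVersion":"k8s.cni.cncf.io/v1"}`,
		`{"msg":"NetworkAttachmentDefinition not exist, creating","Request.Namespace":"z1"}`,
		`{"msg":"NetworkAttachmentDefinition","list":[{"metadata":{"name":"example-sriovnetwork","namespace":"z1","uid":"197ba3e9-a855-4168-a80c-343697631df3"}}]}`,
	}
	for _, c := range uidChanges(lines) {
		fmt.Println(c)
	}
}
```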

Expected results:
Both net-attach-defs, example-sriovnetwork and copy-sriovnetwork, exist in namespace z1.

Additional info:

Comment 1 zhaozhanqi 2019-11-12 02:16:25 UTC
Tried it again and could not reproduce this issue. Closing this bug for now.

Comment 2 zhaozhanqi 2019-11-15 05:34:00 UTC
Reopening this bug since it reproduced again.
The pod in the `openshift-kube-apiserver` namespace was recreated; not sure whether that caused this issue.

Comment 3 Peng Liu 2019-11-26 07:52:56 UTC
I haven't figured out how to reproduce it in my environment.

Comment 4 zenghui.shi 2019-12-06 10:18:07 UTC
I was not able to reproduce this issue with the latest SR-IOV images using the steps in the problem description.
The log messages do not show 'copy-sriovnetwork' being created, which might indicate that the operator did not receive any request to create a new SR-IOV network named 'copy-sriovnetwork'.
Please attach the full log the next time the error occurs.

Comment 5 Peng Liu 2019-12-10 15:45:24 UTC
I don't think this should be a release blocker, since it is a corner case that is hard to reproduce.

Comment 6 zenghui.shi 2019-12-11 08:22:25 UTC
Moving to 4.4 based on comment 5.

Comment 8 zhaozhanqi 2019-12-23 07:33:43 UTC
Created attachment 1647277
sriov-network-operator logs

Comment 9 Peng Liu 2019-12-27 07:10:12 UTC
This bug is caused by a cross-namespace owner reference, which Kubernetes does not allow. After this PR, https://github.com/kubernetes-sigs/controller-runtime/pull/675, it will be blocked in code as well. We need to implement garbage collection for the SriovNetwork CR in another way.
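Why this makes the NAD disappear: an owner reference carries no namespace, so Kubernetes resolves a namespaced owner in the dependent's own namespace; if no matching owner exists there, the reference is treated as absent and the dependent becomes eligible for garbage collection. In the step 5 logs, the NAD in z1 has an ownerReference to SriovNetwork example-sriovnetwork, which lives in openshift-sriov-network-operator. The Go model below is a simplified sketch of that rule only; the `object` and `ownerRef` types and the `orphaned` function are hypothetical, and the real collector also matches apiVersion and works from a dependency graph. A finalizer on the SriovNetwork that deletes the NAD in spec.networkNamespace is one possible alternative, not necessarily what the eventual fix did.

```go
package main

import "fmt"

// object is a simplified stand-in for a namespaced Kubernetes object.
type object struct {
	Kind, Namespace, Name, UID string
	Owners                     []ownerRef
}

// ownerRef mirrors metadata.ownerReferences: it identifies the owner by
// kind, name and UID, but has no namespace field.
type ownerRef struct {
	Kind, Name, UID string
}

// orphaned models the rule for a namespaced dependent: each owner is looked
// up in the dependent's own namespace, and a reference whose owner is not
// found there (or has a different UID) counts as absent. A dependent with
// owner references, none of which resolves, is eligible for deletion.
func orphaned(dep object, store map[string]object) bool {
	if len(dep.Owners) == 0 {
		return false
	}
	for _, ref := range dep.Owners {
		key := dep.Namespace + "/" + ref.Kind + "/" + ref.Name
		if owner, ok := store[key]; ok && owner.UID == ref.UID {
			return false // at least one owner resolves in the same namespace
		}
	}
	return true
}

func main() {
	sn := object{Kind: "SriovNetwork", Namespace: "openshift-sriov-network-operator",
		Name: "example-sriovnetwork", UID: "dd594330-cb54-4712-bfcc-52bb767a352a"}
	store := map[string]object{sn.Namespace + "/" + sn.Kind + "/" + sn.Name: sn}

	nad := object{Kind: "NetworkAttachmentDefinition", Namespace: "z1",
		Name:   "example-sriovnetwork",
		Owners: []ownerRef{{Kind: "SriovNetwork", Name: sn.Name, UID: sn.UID}}}

	fmt.Println(orphaned(nad, store)) // true: the owner lives in another namespace
}
```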

Comment 11 zhaozhanqi 2020-02-04 13:16:58 UTC
Verified this bug with the images below, following the steps in comment 7:
quay.io/openshift-release-dev/ocp-v4.0-art-dev:v4.4.0-202002031216-ose-sriov-network-operator
quay.io/openshift-release-dev/ocp-v4.0-art-dev:v4.4.0-202002031216-ose-sriov-network-config-daemon

Comment 13 errata-xmlrpc 2020-05-04 11:15:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581