Created attachment 1551276 [details]
network_operator_log

Description of problem:
Trying to set up a cluster with SR-IOV enabled. The installation eventually fails, reporting that the network operator is still updating.
When checking the network operator after the installation has finished (with a failure), no version is populated.

Version-Release number of selected component (if applicable):
4.0.0-0.nightly-2019-04-02-133735

How reproducible:
always

Steps to Reproduce:
1. Generate the manifests and update cluster-network-03-config.yaml with the following:
# openshift-install create manifests
# cat manifests/cluster-network-03-config.yaml
apiVersion: "operator.openshift.io/v1"
kind: "Network"
metadata:
  name: "cluster"
spec:
  serviceNetwork:
  - "172.30.0.0/16"
  clusterNetwork:
  - cidr: "10.128.0.0/14"
    hostPrefix: 23
  defaultNetwork:
    type: OpenShiftSDN
    openshiftSDNConfig:
      mode: NetworkPolicy
  additionalNetworks:
  - type: Raw
    name: sriov-conf
    rawCNIConfig: '{ "type": "sriov", "name": "sriov-network", "ipam": { "type": "host-local", "subnet": "10.11.11.0/24", "routes": [{ "dst": "0.0.0.0/0" }], "gateway": "10.11.11.1" } }'

2. Install the cluster
# openshift-install create cluster

3. Check the network operator
# oc get clusteroperator network

4. Check the cluster version
# oc get clusterversion

5. Check the operator pod log
# oc logs -f network-operator-d6c8c48b7-w8cm7 -n openshift-network-operator

Actual results:
Step 2: The cluster installation eventually fails.
INFO Waiting up to 30m0s for the cluster at https://API_SERVER:6443 to initialize...
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.nightly-2019-04-02-133735: 64% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.nightly-2019-04-02-133735: 88% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.nightly-2019-04-02-133735: 90% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.nightly-2019-04-02-133735: 91% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.nightly-2019-04-02-133735: 95% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.nightly-2019-04-02-133735: 97% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.nightly-2019-04-02-133735: 98% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.nightly-2019-04-02-133735: 98% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.0.0-0.nightly-2019-04-02-133735: 99% complete
DEBUG Still waiting for the cluster to initialize: Cluster operator network is still updating
FATAL failed to initialize the cluster: Cluster operator network is still updating: timed out waiting for the condition

Step 3: No operator version is populated.
# oc get clusteroperator network
NAME      VERSION   AVAILABLE   PROGRESSING   FAILING   SINCE
network             True        False         False     30m

Step 4: The clusterversion shows that the cluster is not AVAILABLE.
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-04-02-133735   False       True          25m     Unable to apply 4.0.0-0.nightly-2019-04-02-133735: the cluster operator network has not yet successfully rolled out

Step 5: Full operator log attached.

Expected results:
The cluster should set up successfully with SR-IOV enabled.

Additional info:
Some more info about the cluster.

The SR-IOV services are running well under the correct project:
# oc get po,ds,sa -n openshift-sriov
NAME                            READY   STATUS    RESTARTS   AGE
pod/sriov-cni-8g555             1/1     Running   0          20m
pod/sriov-cni-92fkn             1/1     Running   0          20m
pod/sriov-cni-bnxsz             1/1     Running   0          20m
pod/sriov-cni-k4tdr             1/1     Running   0          26m
pod/sriov-cni-n4hqp             1/1     Running   0          26m
pod/sriov-cni-vfc8k             1/1     Running   0          26m
pod/sriov-device-plugin-5r55w   1/1     Running   0          26m
pod/sriov-device-plugin-94mjs   1/1     Running   0          26m
pod/sriov-device-plugin-k2rrx   1/1     Running   0          20m
pod/sriov-device-plugin-m7mwp   1/1     Running   0          19m
pod/sriov-device-plugin-mn62z   1/1     Running   0          25m
pod/sriov-device-plugin-rbv2q   1/1     Running   0          19m

NAME                                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
daemonset.extensions/sriov-cni             6         6         6       6            6           beta.kubernetes.io/os=linux   26m
daemonset.extensions/sriov-device-plugin   6         6         6       6            6           beta.kubernetes.io/os=linux   26m

NAME                                 SECRETS   AGE
serviceaccount/builder               2         20m
serviceaccount/default               2         24m
serviceaccount/deployer              2         20m
serviceaccount/sriov-cni             2         26m
serviceaccount/sriov-device-plugin   2         26m
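
Because the rawCNIConfig value in the manifest from step 1 is a JSON document embedded in a YAML string, a malformed value is easy to miss before installation. The following is a minimal sketch of a pre-install sanity check, not part of any OpenShift tooling: the helper name check_raw_cni_config is hypothetical, and the checks (valid JSON, non-empty "type" and "name", gateway inside the ipam subnet) are assumptions about what is worth verifying, not what the network operator itself validates.

```python
import ipaddress
import json

# rawCNIConfig string copied from the manifest in step 1 of the report.
RAW_CNI_CONFIG = (
    '{ "type": "sriov", "name": "sriov-network", "ipam": { "type": "host-local", '
    '"subnet": "10.11.11.0/24", "routes": [{ "dst": "0.0.0.0/0" }], '
    '"gateway": "10.11.11.1" } }'
)


def check_raw_cni_config(raw: str) -> list:
    """Return a list of problems found in a rawCNIConfig string (empty if none).

    Hypothetical helper for illustration only.
    """
    try:
        conf = json.loads(raw)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(conf, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    # A CNI network configuration is expected to carry a plugin type and a name.
    for key in ("type", "name"):
        value = conf.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty '{key}'")

    # If both subnet and gateway are given, the gateway should lie in the subnet.
    ipam = conf.get("ipam")
    if isinstance(ipam, dict) and "subnet" in ipam and "gateway" in ipam:
        try:
            subnet = ipaddress.ip_network(ipam["subnet"])
            gateway = ipaddress.ip_address(ipam["gateway"])
        except ValueError as err:
            problems.append(f"bad ipam address: {err}")
        else:
            if gateway not in subnet:
                problems.append(f"gateway {gateway} is outside subnet {subnet}")
    return problems


if __name__ == "__main__":
    print(check_raw_cni_config(RAW_CNI_CONFIG) or "rawCNIConfig looks well-formed")
```

The configuration from this report passes these checks, which is consistent with the failure being in the operator's status reporting and not in the manifest.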
fix merged in CNO: https://github.com/openshift/cluster-network-operator/pull/138
Tested with build 4.0.0-0.nightly-2019-04-10-182914.
The issue has been fixed. The cluster setup finishes successfully, and the network operator reports the correct version and status.
The version field has been added to both the sriov-cni and sriov-device-plugin DaemonSets.
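
The verification above (an empty VERSION column before the fix, a populated one after) can also be checked from the JSON form of the resource. This is a minimal sketch that assumes the input is the output of `oc get clusteroperator network -o json` and that the VERSION column is backed by the status.versions entry named "operator". The helper name operator_version and both sample documents are hypothetical, trimmed illustrations; they are not captured from the cluster in this report.

```python
import json
from typing import Optional


def operator_version(cluster_operator_json: str) -> Optional[str]:
    """Return the version of the status.versions entry named 'operator', or None.

    Hypothetical helper for illustration only.
    """
    obj = json.loads(cluster_operator_json)
    status = obj.get("status") or {}
    for entry in status.get("versions") or []:
        if entry.get("name") == "operator" and entry.get("version"):
            return entry["version"]
    return None


# Illustrative sample: operator is Available but reports no version,
# matching the symptom in step 3.
SAMPLE_WITHOUT_VERSION = json.dumps({
    "kind": "ClusterOperator",
    "metadata": {"name": "network"},
    "status": {"conditions": [{"type": "Available", "status": "True"}]},
})

# Illustrative sample: the same resource with a version populated,
# using the build string from the verification comment.
SAMPLE_WITH_VERSION = json.dumps({
    "kind": "ClusterOperator",
    "metadata": {"name": "network"},
    "status": {
        "conditions": [{"type": "Available", "status": "True"}],
        "versions": [
            {"name": "operator", "version": "4.0.0-0.nightly-2019-04-10-182914"}
        ],
    },
})

if __name__ == "__main__":
    for sample in (SAMPLE_WITHOUT_VERSION, SAMPLE_WITH_VERSION):
        print(operator_version(sample) or "no version reported")
```

Returning None for a missing version keeps the "Available but no version" state distinguishable from a parse error, which raises instead.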
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758