Created attachment 1725799 [details]
A configuration example for the HCO CR

Description of problem:
Pod placement changes cannot be applied to the cluster network config CR because of the reconciliation process.

Version-Release number of selected component (if applicable):
CNV 2.5

How reproducible:
100%

Steps to Reproduce:
1. Apply the attached configuration example to the HCO CR
2. Observe the HCO and CNAO operator logs.

Actual results:
HCO operator is down because the ClusterNetworkConfig status is not ready.

Expected results:
Pod placement successfully applied to all CNV CRs.

Additional info:
Please add logs. Please also list all node labels and taints here, and copy the CNAO CR's spec.
Created attachment 1725863 [details] Logs from the HCO operator pod
Created attachment 1725869 [details] Logs from the CNAO pod
Created attachment 1725870 [details] OC describe nodes
Created attachment 1725871 [details] oc get nodes YAML
Created attachment 1725872 [details] CNAO CR
The master taint is "node-role.kubernetes.io/master:NoSchedule", while the toleration key is just "master":

tolerations:
- effect: NoSchedule
  key: master
  operator: Exists
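To illustrate why the toleration above does not cover the master taint, here is a minimal sketch of the matching rule in Python. It is a simplified model of the Kubernetes taint/toleration semantics, not the scheduler's actual code; the function name `tolerates` and the dictionary layout are assumptions made for this example.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if the toleration matches the taint (simplified rules)."""
    # A missing or empty effect on the toleration matches every taint effect.
    effect = toleration.get("effect")
    if effect and effect != taint.get("effect"):
        return False

    operator = toleration.get("operator", "Equal")
    key = toleration.get("key")

    # An empty key is only meaningful together with Exists (matches all keys).
    if not key:
        return operator == "Exists"
    if key != taint.get("key"):
        return False

    # Exists ignores the value; Equal requires identical values.
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


master_taint = {"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}
reported = {"key": "master", "operator": "Exists", "effect": "NoSchedule"}
corrected = {
    "key": "node-role.kubernetes.io/master",
    "operator": "Exists",
    "effect": "NoSchedule",
}

print(tolerates(reported, master_taint))   # False
print(tolerates(corrected, master_taint))  # True
```

The key has to be the full taint key; the short form "master" is treated as a different key and therefore never matches.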
Igor, please elaborate on "Pod placement successfully applied to all CNV CRs". Please be more specific, so that I know what to look for in my cluster after patching the HCO CR. Thanks.
Yes, of course.

The initial state is CNV 2.5 deployed without any pod placement details in the HCO CR. Later, on day 2, we want to use pod placement via the HCO API, so we patch the HCO CR with the relevant placement details, e.g. put all infra components on nodes with the label "infra-nodes" and place workloads on nodes with the label "workload-node".

The HCO operator should apply this configuration to the CRs of all CNV components, including the CNAO CR. You should validate that the CNAO CR contains the pod placement configuration after you have applied it to the HCO CR, and you should observe the replacement of all pods that are managed by CNAO.
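The validation step described above (the CNAO CR should carry the placement set on the HCO CR) could be sketched roughly as follows. This is only an illustration operating on plain dictionaries: the CNAO field name `placementConfiguration`, the helper names, and the sample values are assumptions, and a real check would read both CRs from the cluster.

```python
def _normalize(node_placement):
    """Drop empty fields (for example an empty affinity) before comparing."""
    return {k: v for k, v in (node_placement or {}).items() if v}


def placement_propagated(hco_spec: dict, cnao_spec: dict) -> bool:
    """Return True if the CNAO placement matches what the HCO CR requests."""
    # Assumed CNAO field name; it is not confirmed by this report.
    cnao_placement = cnao_spec.get("placementConfiguration") or {}
    for section in ("infra", "workloads"):
        wanted = _normalize((hco_spec.get(section) or {}).get("nodePlacement"))
        actual = _normalize(cnao_placement.get(section))
        if wanted != actual:
            return False
    return True


# Illustrative values only.
hco_spec = {
    "infra": {"nodePlacement": {
        "nodeSelector": {"node-role.kubernetes.io/master": ""},
        "tolerations": [{"effect": "NoSchedule",
                         "key": "node-role.kubernetes.io/master",
                         "operator": "Exists"}],
    }},
    "workloads": {"nodePlacement": {
        "nodeSelector": {"node-role.kubernetes.io/worker": ""},
    }},
}
# CNAO CR that was not updated by the reconciliation.
stale_cnao_spec = {"multus": {}, "ovs": {}}
# CNAO CR that received the placement (with an empty affinity field).
updated_cnao_spec = {"placementConfiguration": {
    "infra": dict(hco_spec["infra"]["nodePlacement"], affinity={}),
    "workloads": dict(hco_spec["workloads"]["nodePlacement"], affinity={}),
}}

print(placement_propagated(hco_spec, stale_cnao_spec))    # False
print(placement_propagated(hco_spec, updated_cnao_spec))  # True
```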
ENVIRONMENT:
------------
OCP 4.6.3
CNV 2.5.0
cluster-network-addons-operator-container-v2.5.0-17
hco-bundle-registry-container-v2.5.0-437

BEFORE HCO CR change:
------------------
HCO CR:
spec:
  bareMetalPlatform: true
  infra: {}
  version: v2.5.0
  workloads: {}

CNAO CR:
Spec:
  Kube Mac Pool:
  Linux Bridge:
  Multus:
  Nmstate:
  Ovs:

CNAO table of infra/workload components:
Infra:
  kubemacpool-mac-controller-manager
  nmstate-webhook
Workload:
  bridge-marker
  kube-cni-linux-bridge-plugin
  nmstate-handler
  ovs-cni-amd64

Pods of those components:
bridge-marker-cmf6d 1/1 Running 0 27h 192.168.2.74 yz25-b9pkw-worker-0-f58gn <none> <none>
bridge-marker-dmqf9 1/1 Running 0 27h 192.168.2.244 yz25-b9pkw-master-0 <none> <none>
bridge-marker-n2vj8 1/1 Running 0 27h 192.168.0.210 yz25-b9pkw-master-2 <none> <none>
bridge-marker-qn8pq 1/1 Running 0 27h 192.168.1.173 yz25-b9pkw-master-1 <none> <none>
bridge-marker-v9xww 1/1 Running 0 27h 192.168.0.141 yz25-b9pkw-worker-0-ttbmq <none> <none>
bridge-marker-vhsnn 1/1 Running 0 27h 192.168.0.207 yz25-b9pkw-worker-0-rkjnz <none> <none>
kube-cni-linux-bridge-plugin-95rmq 1/1 Running 0 27h 10.131.0.2 yz25-b9pkw-worker-0-f58gn <none> <none>
kube-cni-linux-bridge-plugin-dcwg4 1/1 Running 0 27h 10.128.0.38 yz25-b9pkw-master-2 <none> <none>
kube-cni-linux-bridge-plugin-gn577 1/1 Running 0 27h 10.128.2.5 yz25-b9pkw-worker-0-rkjnz <none> <none>
kube-cni-linux-bridge-plugin-qfh8c 1/1 Running 0 27h 10.129.2.3 yz25-b9pkw-worker-0-ttbmq <none> <none>
kube-cni-linux-bridge-plugin-r82z2 1/1 Running 0 27h 10.130.0.22 yz25-b9pkw-master-0 <none> <none>
kube-cni-linux-bridge-plugin-x2j97 1/1 Running 0 27h 10.129.0.11 yz25-b9pkw-master-1 <none> <none>
kubemacpool-mac-controller-manager-b7dc7ccbc-dlt4w 1/1 Running 0 27h 10.129.0.12 yz25-b9pkw-master-1 <none> <none>
nmstate-handler-24554 1/1 Running 0 27h 192.168.1.173 yz25-b9pkw-master-1 <none> <none>
nmstate-handler-5vfvc 1/1 Running 1 27h 192.168.0.141 yz25-b9pkw-worker-0-ttbmq <none> <none>
nmstate-handler-k87nj 1/1 Running 1 27h 192.168.0.207 yz25-b9pkw-worker-0-rkjnz <none> <none>
nmstate-handler-l5lhw 1/1 Running 0 27h 192.168.2.244 yz25-b9pkw-master-0 <none> <none>
nmstate-handler-lxkh5 1/1 Running 1 27h 192.168.2.74 yz25-b9pkw-worker-0-f58gn <none> <none>
nmstate-handler-pkmpm 1/1 Running 0 27h 192.168.0.210 yz25-b9pkw-master-2 <none> <none>
nmstate-webhook-58d96c9964-2cq46 1/1 Running 0 27h 10.130.0.23 yz25-b9pkw-master-0 <none> <none>
nmstate-webhook-58d96c9964-fcd9r 1/1 Running 0 27h 10.129.0.13 yz25-b9pkw-master-1 <none> <none>
ovs-cni-amd64-5z9bb 1/1 Running 1 27h 192.168.0.207 yz25-b9pkw-worker-0-rkjnz <none> <none>
ovs-cni-amd64-8p5zq 1/1 Running 1 27h 192.168.0.141 yz25-b9pkw-worker-0-ttbmq <none> <none>
ovs-cni-amd64-bhcjf 1/1 Running 0 27h 192.168.0.210 yz25-b9pkw-master-2 <none> <none>
ovs-cni-amd64-n2mzc 1/1 Running 1 27h 192.168.2.74 yz25-b9pkw-worker-0-f58gn <none> <none>
ovs-cni-amd64-sjvr7 1/1 Running 0 27h 192.168.2.244 yz25-b9pkw-master-0 <none> <none>
ovs-cni-amd64-tfr24 1/1 Running 0 27h 192.168.1.173 yz25-b9pkw-master-1 <none> <none>

AFTER HCO CR change:
--------------------
We expect:
1. the patch to be propagated to the CNAO CR as well
2. CNAO workload pods to be only on worker nodes
3. CNAO infra pods to be only on master nodes

HCO CR:
spec:
  bareMetalPlatform: true
  infra:
    nodePlacement:
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
  version: v2.5.0
  workloads:
    nodePlacement:
      nodeSelector:
        node-role.kubernetes.io/worker: ""

CNAO CR:
Spec:
  Kube Mac Pool:
  Linux Bridge:
  Multus:
  Nmstate:
  Ovs:
  Placement Configuration:
    Infra:
      Affinity:
      Node Selector:
        node-role.kubernetes.io/master:
      Tolerations:
        Effect: NoSchedule
        Key: node-role.kubernetes.io/master
        Operator: Exists
    Workloads:
      Affinity:
      Node Selector:
        node-role.kubernetes.io/worker:

CNAO pods:
bridge-marker-2rjrd 1/1 Running 0 69m 192.168.0.207 yz25-b9pkw-worker-0-rkjnz <none> <none>
bridge-marker-4g8sp 1/1 Running 0 69m 192.168.0.141 yz25-b9pkw-worker-0-ttbmq <none> <none>
bridge-marker-vbv86 1/1 Running 0 69m 192.168.2.74 yz25-b9pkw-worker-0-f58gn <none> <none>
kube-cni-linux-bridge-plugin-9m5kf 1/1 Running 0 68m 10.131.0.61 yz25-b9pkw-worker-0-f58gn <none> <none>
kube-cni-linux-bridge-plugin-dzqdm 1/1 Running 0 69m 10.128.2.64 yz25-b9pkw-worker-0-rkjnz <none> <none>
kube-cni-linux-bridge-plugin-g9wz6 1/1 Running 0 68m 10.129.2.98 yz25-b9pkw-worker-0-ttbmq <none> <none>
kubemacpool-mac-controller-manager-64c7577d54-6xhxt 1/1 Running 0 66m 10.129.0.36 yz25-b9pkw-master-1 <none> <none>
nmstate-handler-2hlkh 1/1 Running 0 69m 192.168.0.207 yz25-b9pkw-worker-0-rkjnz <none> <none>
nmstate-handler-2q6nw 1/1 Running 0 70m 192.168.0.141 yz25-b9pkw-worker-0-ttbmq <none> <none>
nmstate-handler-fvbgn 1/1 Running 0 69m 192.168.2.74 yz25-b9pkw-worker-0-f58gn <none> <none>
nmstate-webhook-7c8cff4d5-gn4zb 1/1 Running 0 66m 10.130.0.37 yz25-b9pkw-master-0 <none> <none>
nmstate-webhook-7c8cff4d5-tvm99 1/1 Running 0 66m 10.129.0.37 yz25-b9pkw-master-1 <none> <none>
ovs-cni-amd64-8htxf 1/1 Running 0 69m 192.168.0.141 yz25-b9pkw-worker-0-ttbmq <none> <none>
ovs-cni-amd64-wt228 1/1 Running 0 69m 192.168.0.207 yz25-b9pkw-worker-0-rkjnz <none> <none>
ovs-cni-amd64-xm5l7 1/1 Running 0 69m 192.168.2.74 yz25-b9pkw-worker-0-f58gn <none> <none>

BEFORE HCO subscription change:
------------------------------
subscription:
Spec:
  Channel: stable
  Name: kubevirt-hyperconverged
  Source: hco-catalogsource
  Source Namespace: openshift-marketplace
  Starting CSV: kubevirt-hyperconverged-operator.v2.5.0

cnv operators:
cdi-operator-c88b8cfc6-9tmxb 1/1 Running 0 30h 10.128.2.7 yz25-b9pkw-worker-0-rkjnz <none> <none>
cluster-network-addons-operator-6b589bdf9c-95pfz 1/1 Running 0 30h 10.128.2.14 yz25-b9pkw-worker-0-rkjnz <none> <none>
hco-operator-696d6686b7-jcm9w 1/1 Running 0 30h 10.128.2.11 yz25-b9pkw-worker-0-rkjnz <none> <none>
hostpath-provisioner-operator-599d878d4-qmnm9 1/1 Running 0 30h 10.128.2.13 yz25-b9pkw-worker-0-rkjnz <none> <none>
kubevirt-ssp-operator-64b755cfbb-8wszz 1/1 Running 0 30h 10.128.2.21 yz25-b9pkw-worker-0-rkjnz <none> <none>
node-maintenance-operator-787494dd5c-vbxcv 1/1 Running 0 30h 10.129.0.10 yz25-b9pkw-master-1 <none> <none>
virt-operator-68f47845f4-khtvj 1/1 Running 0 30h 10.128.2.15 yz25-b9pkw-worker-0-rkjnz <none> <none>
virt-operator-68f47845f4-nn6pz 1/1 Running 0 30h 10.131.0.22 yz25-b9pkw-worker-0-f58gn <none> <none>
vm-import-operator-7dbfcd675d-bztvq 1/1 Running 0 30h 10.128.2.23 yz25-b9pkw-worker-0-rkjnz <none> <none>

AFTER HCO subscription change:
------------------------------
We expect all cnv operators to be on master nodes.

subscription:
Spec:
  Channel: stable
  Config:
    Node Selector:
      node-role.kubernetes.io/master:
    Tolerations:
      Effect: NoSchedule
      Key: node-role.kubernetes.io/master
      Operator: Exists
  Name: kubevirt-hyperconverged
  Source: hco-catalogsource
  Source Namespace: openshift-marketplace
  Starting CSV: kubevirt-hyperconverged-operator.v2.5.0

cnv operators:
cdi-operator-76586b78fd-4hbhz 1/1 Running 0 70s 10.128.0.53 yz25-b9pkw-master-2 <none> <none>
cluster-network-addons-operator-b9b59cf5-j2l2r 1/1 Running 0 65s 10.130.0.41 yz25-b9pkw-master-0 <none> <none>
hco-operator-67856bbbc7-whtsr 1/1 Running 0 73s 10.129.0.42 yz25-b9pkw-master-1 <none> <none>
hostpath-provisioner-operator-594d457f9d-7nx5w 1/1 Running 0 69s 10.129.0.46 yz25-b9pkw-master-1 <none> <none>
kubevirt-ssp-operator-5bcb787f9b-8vdzt 1/1 Running 0 70s 10.129.0.45 yz25-b9pkw-master-1 <none> <none>
node-maintenance-operator-6fb46bb854-9rxrr 1/1 Running 0 69s 10.130.0.40 yz25-b9pkw-master-0 <none> <none>
virt-operator-65854b5479-8srtf 1/1 Running 0 47s 10.128.0.55 yz25-b9pkw-master-2 <none> <none>
virt-operator-65854b5479-r6rqn 1/1 Running 0 71s 10.129.0.44 yz25-b9pkw-master-1 <none> <none>
vm-import-operator-dd847c66d-2hf2h 1/1 Running 0 68s 10.128.0.54 yz25-b9pkw-master-2 <none> <none>
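The placement expectations in the verification above were checked by reading the pod listings. A rough way to automate that reading is sketched below. It assumes the listing layout shown in this comment (pod name in the first column, node name in the seventh) and that node names embed the role, as they do in this cluster; the helper name `misplaced_pods` is hypothetical, and a more robust check would look at node labels instead of node names.

```python
from typing import List


def misplaced_pods(listing: str, role: str) -> List[str]:
    """Return the names of pods whose node name does not contain the role."""
    misplaced = []
    for line in listing.strip().splitlines():
        fields = line.split()
        # Skip blank or truncated lines and an optional header row.
        if len(fields) < 7 or fields[0] == "NAME":
            continue
        name, node = fields[0], fields[6]
        # Heuristic: role is part of the node name, e.g. "...-worker-0-...".
        if "-" + role + "-" not in node:
            misplaced.append(name)
    return misplaced


# Two rows taken from the listings in this comment.
sample = """
bridge-marker-2rjrd 1/1 Running 0 69m 192.168.0.207 yz25-b9pkw-worker-0-rkjnz <none> <none>
bridge-marker-dmqf9 1/1 Running 0 27h 192.168.2.244 yz25-b9pkw-master-0 <none> <none>
"""

print(misplaced_pods(sample, "worker"))  # ['bridge-marker-dmqf9']
```

An empty result for the workload pods with role "worker", and for the operator pods with role "master", would correspond to the expectations stated above.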
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Virtualization 2.5.0 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:5127