Bug 1893744 - [CNAO] Cannot apply pod placement on a "day-2" phase. CR got reconciled
Summary: [CNAO] Cannot apply pod placement on a "day-2" phase. CR got reconciled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 2.5.0
Hardware: Unspecified
OS: Unspecified
Severity: urgent
Priority: urgent
Target Milestone: ---
Target Release: 2.5.0
Assignee: Ram Lavi
QA Contact: Meni Yakove
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-11-02 13:58 UTC by Igor Bezukh
Modified: 2021-03-10 23:25 UTC
CC List: 6 users

Fixed In Version: cluster-network-addons-operator-container-v2.5.0-17, hco-bundle-registry:v2.5.0-429
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-17 13:24:56 UTC
Target Upstream Version:
Embargoed:


Attachments
A configuration example for the HCO CR (337 bytes, text/plain), 2020-11-02 13:58 UTC, Igor Bezukh
Logs from the HCO operator pod (1.12 MB, application/gzip), 2020-11-02 15:24 UTC, Igor Bezukh
Logs from the CNAO pod (606.48 KB, application/gzip), 2020-11-02 15:26 UTC, Igor Bezukh
OC describe nodes (105.41 KB, text/plain), 2020-11-02 15:26 UTC, Igor Bezukh
oc get nodes YAML (173.29 KB, text/plain), 2020-11-02 15:27 UTC, Igor Bezukh
CNAO CR (4.96 KB, text/plain), 2020-11-02 15:32 UTC, Igor Bezukh


Links
GitHub kubevirt/cluster-network-addons-operator pull 639 (closed): [release-0.42] placement-configuration, Allow day2 changes (last updated 2020-11-27 16:27:06 UTC)
Red Hat Product Errata RHEA-2020:5127 (last updated 2020-11-17 13:25:03 UTC)

Description Igor Bezukh 2020-11-02 13:58:21 UTC
Created attachment 1725799 [details]
A configuration example for the HCO CR

Description of problem:
Pod placement changes cannot be applied to the cluster network config CR
because of the reconciliation process.

Version-Release number of selected component (if applicable):
CNV 2.5

How reproducible:
100%

Steps to Reproduce:
1. Apply the attached configuration example to the HCO CR
2. Observe the HCO and CNAO operator logs.

Actual results:
The HCO operator is down because the ClusterNetworkConfig status is not ready.

Expected results:
Pod placement is successfully applied to all CNV CRs.

Additional info:
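For reference, a day-2 placement change of this kind can be expressed as a merge patch against the HCO CR. A minimal sketch, assuming the default CNV object name and namespace (kubevirt-hyperconverged in openshift-cnv) and mirroring the spec QA later applies in comment 10; the actual attached example may differ:

oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type=merge -p '
spec:
  infra:
    nodePlacement:
      # pin infra pods to masters, tolerating the master taint
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
  workloads:
    nodePlacement:
      # pin workload pods to workers
      nodeSelector:
        node-role.kubernetes.io/worker: ""
'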

Comment 1 Nahshon Unna-Tsameret 2020-11-02 14:12:01 UTC
Please add logs, list all node labels and taints here, and copy the CNAO CR's spec.
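For reference, node labels and taints can be collected with standard oc commands:

# list node labels
oc get nodes --show-labels

# list node taints
oc get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints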

Comment 2 Igor Bezukh 2020-11-02 15:24:48 UTC
Created attachment 1725863 [details]
Logs from the HCO operator pod

Comment 3 Igor Bezukh 2020-11-02 15:26:01 UTC
Created attachment 1725869 [details]
Logs from the CNAO pod

Comment 4 Igor Bezukh 2020-11-02 15:26:39 UTC
Created attachment 1725870 [details]
OC describe nodes

Comment 5 Igor Bezukh 2020-11-02 15:27:12 UTC
Created attachment 1725871 [details]
oc get nodes YAML

Comment 6 Igor Bezukh 2020-11-02 15:32:04 UTC
Created attachment 1725872 [details]
CNAO CR

Comment 7 Nahshon Unna-Tsameret 2020-11-02 15:57:56 UTC
The master taint is "node-role.kubernetes.io/master:NoSchedule" while the toleration key is just "master":

        tolerations:
        - effect: NoSchedule
          key: master
          operator: Exists
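
For the toleration to match that taint, the key needs the full taint key, i.e. the form used later in comment 10:

        tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
          operator: Exists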

Comment 8 yzaindbe 2020-11-10 15:51:58 UTC
Igor, please elaborate on "Pod placement successfully applied to all CNV CRs".
Please be more specific, so I know what to look for in my cluster after patching the HCO CR.
Thanks.

Comment 9 Igor Bezukh 2020-11-11 12:32:29 UTC
Yes, of course. The initial state is CNV 2.5 deployed without the pod placement details in the HCO CR.
Then, at this point in time (called day-2), we want to configure pod placement via the HCO API, so we patch the HCO CR with the relevant placement details, e.g. put all infra on nodes with the label "infra-nodes" and place workloads on nodes with the label "workload-node".
This configuration should be applied by the HCO operator to the CRs of all the CNV components, including the CNAO CR.
You should validate that the CNAO CR contains the pod placement configuration after you've applied it to the HCO CR,
and you should observe the replacement of all pods that are managed by CNAO.
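One way to run that validation with standard oc commands (object names are the CNV defaults and may differ per cluster; a sketch only):

# confirm the placement configuration propagated to the CNAO CR
oc get networkaddonsconfig cluster -o yaml

# watch the CNAO-managed pods (bridge-marker, kube-cni-linux-bridge-plugin,
# kubemacpool, nmstate, ovs-cni) get recreated on the selected nodes
oc get pods -n openshift-cnv -o wide -w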

Comment 10 yzaindbe 2020-11-11 18:19:18 UTC
ENVIRONMENT:
------------
OCP 4.6.3
CNV 2.5.0
cluster-network-addons-operator-container-v2.5.0-17
hco-bundle-registry-container-v2.5.0-437


BEFORE HCO CR change:
------------------

HCO CR:
spec:
  bareMetalPlatform: true
  infra: {}
  version: v2.5.0
  workloads: {}

CNAO CR:
Spec:
  Kube Mac Pool:
  Linux Bridge:
  Multus:
  Nmstate:
  Ovs:


CNAO table of infra/workload components:
Infra:
kubemacpool-mac-controller-manager
nmstate-webhook

Workload:
bridge-marker
kube-cni-linux-bridge-plugin
nmstate-handler
ovs-cni-amd64


Those components pods:
bridge-marker-cmf6d                                  1/1     Running   0          27h   192.168.2.74    yz25-b9pkw-worker-0-f58gn   <none>           <none>
bridge-marker-dmqf9                                  1/1     Running   0          27h   192.168.2.244   yz25-b9pkw-master-0         <none>           <none>
bridge-marker-n2vj8                                  1/1     Running   0          27h   192.168.0.210   yz25-b9pkw-master-2         <none>           <none>
bridge-marker-qn8pq                                  1/1     Running   0          27h   192.168.1.173   yz25-b9pkw-master-1         <none>           <none>
bridge-marker-v9xww                                  1/1     Running   0          27h   192.168.0.141   yz25-b9pkw-worker-0-ttbmq   <none>           <none>
bridge-marker-vhsnn                                  1/1     Running   0          27h   192.168.0.207   yz25-b9pkw-worker-0-rkjnz   <none>           <none>
kube-cni-linux-bridge-plugin-95rmq                   1/1     Running   0          27h   10.131.0.2      yz25-b9pkw-worker-0-f58gn   <none>           <none>
kube-cni-linux-bridge-plugin-dcwg4                   1/1     Running   0          27h   10.128.0.38     yz25-b9pkw-master-2         <none>           <none>
kube-cni-linux-bridge-plugin-gn577                   1/1     Running   0          27h   10.128.2.5      yz25-b9pkw-worker-0-rkjnz   <none>           <none>
kube-cni-linux-bridge-plugin-qfh8c                   1/1     Running   0          27h   10.129.2.3      yz25-b9pkw-worker-0-ttbmq   <none>           <none>
kube-cni-linux-bridge-plugin-r82z2                   1/1     Running   0          27h   10.130.0.22     yz25-b9pkw-master-0         <none>           <none>
kube-cni-linux-bridge-plugin-x2j97                   1/1     Running   0          27h   10.129.0.11     yz25-b9pkw-master-1         <none>           <none>
kubemacpool-mac-controller-manager-b7dc7ccbc-dlt4w   1/1     Running   0          27h   10.129.0.12     yz25-b9pkw-master-1         <none>           <none>
nmstate-handler-24554                                1/1     Running   0          27h   192.168.1.173   yz25-b9pkw-master-1         <none>           <none>
nmstate-handler-5vfvc                                1/1     Running   1          27h   192.168.0.141   yz25-b9pkw-worker-0-ttbmq   <none>           <none>
nmstate-handler-k87nj                                1/1     Running   1          27h   192.168.0.207   yz25-b9pkw-worker-0-rkjnz   <none>           <none>
nmstate-handler-l5lhw                                1/1     Running   0          27h   192.168.2.244   yz25-b9pkw-master-0         <none>           <none>
nmstate-handler-lxkh5                                1/1     Running   1          27h   192.168.2.74    yz25-b9pkw-worker-0-f58gn   <none>           <none>
nmstate-handler-pkmpm                                1/1     Running   0          27h   192.168.0.210   yz25-b9pkw-master-2         <none>           <none>
nmstate-webhook-58d96c9964-2cq46                     1/1     Running   0          27h   10.130.0.23     yz25-b9pkw-master-0         <none>           <none>
nmstate-webhook-58d96c9964-fcd9r                     1/1     Running   0          27h   10.129.0.13     yz25-b9pkw-master-1         <none>           <none>
ovs-cni-amd64-5z9bb                                  1/1     Running   1          27h   192.168.0.207   yz25-b9pkw-worker-0-rkjnz   <none>           <none>
ovs-cni-amd64-8p5zq                                  1/1     Running   1          27h   192.168.0.141   yz25-b9pkw-worker-0-ttbmq   <none>           <none>
ovs-cni-amd64-bhcjf                                  1/1     Running   0          27h   192.168.0.210   yz25-b9pkw-master-2         <none>           <none>
ovs-cni-amd64-n2mzc                                  1/1     Running   1          27h   192.168.2.74    yz25-b9pkw-worker-0-f58gn   <none>           <none>
ovs-cni-amd64-sjvr7                                  1/1     Running   0          27h   192.168.2.244   yz25-b9pkw-master-0         <none>           <none>
ovs-cni-amd64-tfr24                                  1/1     Running   0          27h   192.168.1.173   yz25-b9pkw-master-1         <none>           <none>


AFTER HCO CR change:
--------------------
we expect:
1. the patch to be reflected in the CNAO CR as well
2. CNAO workload pods to run only on worker nodes
3. CNAO infra pods to run only on master nodes

HCO CR:
spec:
  bareMetalPlatform: true
  infra:
    nodePlacement:
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
  version: v2.5.0
  workloads:
    nodePlacement:
      nodeSelector:
        node-role.kubernetes.io/worker: ""

CNAO CR:
Spec:
  Kube Mac Pool:
  Linux Bridge:
  Multus:
  Nmstate:
  Ovs:
  Placement Configuration:
    Infra:
      Affinity:
      Node Selector:
        node-role.kubernetes.io/master:  
      Tolerations:
        Effect:    NoSchedule
        Key:       node-role.kubernetes.io/master
        Operator:  Exists
    Workloads:
      Affinity:
      Node Selector:
        node-role.kubernetes.io/worker:

CNAO pods:
bridge-marker-2rjrd                                   1/1     Running   0          69m   192.168.0.207   yz25-b9pkw-worker-0-rkjnz   <none>           <none>
bridge-marker-4g8sp                                   1/1     Running   0          69m   192.168.0.141   yz25-b9pkw-worker-0-ttbmq   <none>           <none>
bridge-marker-vbv86                                   1/1     Running   0          69m   192.168.2.74    yz25-b9pkw-worker-0-f58gn   <none>           <none>
kube-cni-linux-bridge-plugin-9m5kf                    1/1     Running   0          68m   10.131.0.61     yz25-b9pkw-worker-0-f58gn   <none>           <none>
kube-cni-linux-bridge-plugin-dzqdm                    1/1     Running   0          69m   10.128.2.64     yz25-b9pkw-worker-0-rkjnz   <none>           <none>
kube-cni-linux-bridge-plugin-g9wz6                    1/1     Running   0          68m   10.129.2.98     yz25-b9pkw-worker-0-ttbmq   <none>           <none>
kubemacpool-mac-controller-manager-64c7577d54-6xhxt   1/1     Running   0          66m   10.129.0.36     yz25-b9pkw-master-1         <none>           <none>
nmstate-handler-2hlkh                                 1/1     Running   0          69m   192.168.0.207   yz25-b9pkw-worker-0-rkjnz   <none>           <none>
nmstate-handler-2q6nw                                 1/1     Running   0          70m   192.168.0.141   yz25-b9pkw-worker-0-ttbmq   <none>           <none>
nmstate-handler-fvbgn                                 1/1     Running   0          69m   192.168.2.74    yz25-b9pkw-worker-0-f58gn   <none>           <none>
nmstate-webhook-7c8cff4d5-gn4zb                       1/1     Running   0          66m   10.130.0.37     yz25-b9pkw-master-0         <none>           <none>
nmstate-webhook-7c8cff4d5-tvm99                       1/1     Running   0          66m   10.129.0.37     yz25-b9pkw-master-1         <none>           <none>
ovs-cni-amd64-8htxf                                   1/1     Running   0          69m   192.168.0.141   yz25-b9pkw-worker-0-ttbmq   <none>           <none>
ovs-cni-amd64-wt228                                   1/1     Running   0          69m   192.168.0.207   yz25-b9pkw-worker-0-rkjnz   <none>           <none>
ovs-cni-amd64-xm5l7                                   1/1     Running   0          69m   192.168.2.74    yz25-b9pkw-worker-0-f58gn   <none>           <none>


BEFORE HCO subscription change:
------------------------------

subscription:
Spec:
  Channel:           stable
  Name:              kubevirt-hyperconverged
  Source:            hco-catalogsource
  Source Namespace:  openshift-marketplace
  Starting CSV:      kubevirt-hyperconverged-operator.v2.5.0

cnv operators:
cdi-operator-c88b8cfc6-9tmxb                          1/1     Running   0          30h   10.128.2.7      yz25-b9pkw-worker-0-rkjnz   <none>           <none>
cluster-network-addons-operator-6b589bdf9c-95pfz      1/1     Running   0          30h   10.128.2.14     yz25-b9pkw-worker-0-rkjnz   <none>           <none>
hco-operator-696d6686b7-jcm9w                         1/1     Running   0          30h   10.128.2.11     yz25-b9pkw-worker-0-rkjnz   <none>           <none>
hostpath-provisioner-operator-599d878d4-qmnm9         1/1     Running   0          30h   10.128.2.13     yz25-b9pkw-worker-0-rkjnz   <none>           <none>
kubevirt-ssp-operator-64b755cfbb-8wszz                1/1     Running   0          30h   10.128.2.21     yz25-b9pkw-worker-0-rkjnz   <none>           <none>
node-maintenance-operator-787494dd5c-vbxcv            1/1     Running   0          30h   10.129.0.10     yz25-b9pkw-master-1         <none>           <none>
virt-operator-68f47845f4-khtvj                        1/1     Running   0          30h   10.128.2.15     yz25-b9pkw-worker-0-rkjnz   <none>           <none>
virt-operator-68f47845f4-nn6pz                        1/1     Running   0          30h   10.131.0.22     yz25-b9pkw-worker-0-f58gn   <none>           <none>
vm-import-operator-7dbfcd675d-bztvq                   1/1     Running   0          30h   10.128.2.23     yz25-b9pkw-worker-0-rkjnz   <none>           <none>


AFTER HCO subscription change:
------------------------------
we expect all CNV operators to run on master nodes.

subscription:
Spec:
  Channel:  stable
  Config:
    Node Selector:
      node-role.kubernetes.io/master:  
    Tolerations:
      Effect:        NoSchedule
      Key:           node-role.kubernetes.io/master
      Operator:      Exists
  Name:              kubevirt-hyperconverged
  Source:            hco-catalogsource
  Source Namespace:  openshift-marketplace
  Starting CSV:      kubevirt-hyperconverged-operator.v2.5.0
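
For reference, a change like this can be applied as a merge patch on the Subscription. A sketch, assuming the Subscription object name matches the package name and the default CNV namespace (both may differ per cluster):

oc patch subscription kubevirt-hyperconverged -n openshift-cnv --type=merge -p '
spec:
  config:
    # OLM propagates these to the operator deployments it manages
    nodeSelector:
      node-role.kubernetes.io/master: ""
    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
      operator: Exists
'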

cnv operators:
cdi-operator-76586b78fd-4hbhz                         1/1     Running   0          70s   10.128.0.53     yz25-b9pkw-master-2         <none>           <none>
cluster-network-addons-operator-b9b59cf5-j2l2r        1/1     Running   0          65s   10.130.0.41     yz25-b9pkw-master-0         <none>           <none>
hco-operator-67856bbbc7-whtsr                         1/1     Running   0          73s   10.129.0.42     yz25-b9pkw-master-1         <none>           <none>
hostpath-provisioner-operator-594d457f9d-7nx5w        1/1     Running   0          69s   10.129.0.46     yz25-b9pkw-master-1         <none>           <none>
kubevirt-ssp-operator-5bcb787f9b-8vdzt                1/1     Running   0          70s   10.129.0.45     yz25-b9pkw-master-1         <none>           <none>
node-maintenance-operator-6fb46bb854-9rxrr            1/1     Running   0          69s   10.130.0.40     yz25-b9pkw-master-0         <none>           <none>
virt-operator-65854b5479-8srtf                        1/1     Running   0          47s   10.128.0.55     yz25-b9pkw-master-2         <none>           <none>
virt-operator-65854b5479-r6rqn                        1/1     Running   0          71s   10.129.0.44     yz25-b9pkw-master-1         <none>           <none>
vm-import-operator-dd847c66d-2hf2h                    1/1     Running   0          68s   10.128.0.54     yz25-b9pkw-master-2         <none>           <none>

Comment 13 errata-xmlrpc 2020-11-17 13:24:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 2.5.0 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:5127

