Bug 1939060 - CNO: nodes and masters are upgrading simultaneously
Summary: CNO: nodes and masters are upgrading simultaneously
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.8.0
Assignee: Federico Paolinelli
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On:
Blocks: 1940806
 
Reported: 2021-03-15 14:16 UTC by Tim Rozet
Modified: 2021-07-27 22:53 UTC
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:53:18 UTC
Target Upstream Version:
Embargoed:


Attachments
cno logs (383.39 KB, text/plain)
2021-03-15 14:18 UTC, Tim Rozet


Links
Github openshift cluster-network-operator pull 1027 (open): Bug 1939060: OVN Upgrade: fix upgrade order of node and master (last updated 2021-03-18 09:20:58 UTC)
Red Hat Product Errata RHSA-2021:2438 (last updated 2021-07-27 22:53:42 UTC)

Description Tim Rozet 2021-03-15 14:16:19 UTC
Description of problem:
While upgrading ovn-kubernetes via the CNO, I can see the worker and master pods both getting upgraded simultaneously. The method I used is:
cat override-cno-control-patch.yaml

- op: add
  path: /spec/overrides
  value: []
- op: add
  path: /spec/overrides/-
  value:
    kind: Deployment
    name: network-operator
    group: operator.openshift.io
    namespace: openshift-network-operator
    unmanaged: true

# mark the network-operator deployment as unmanaged in the ClusterVersion overrides
oc patch --type=json -p "$(cat ~/override-cno-control-patch.yaml)" clusterversion version
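
As an optional sanity check (my addition, not part of the original repro steps), the override can be confirmed on the ClusterVersion object; this assumes the standard resource name "version" used above:

# show the overrides recorded on the ClusterVersion
oc get clusterversion version -o jsonpath='{.spec.overrides}{"\n"}'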

cat override-ovn-kubernetes-image-patch.yaml
spec:
  template:
    spec:
      containers:
      - name: network-operator
        env:
        - name: OVN_IMAGE
          # build from images/Dockerfile.bf and https://github.com/ovn-org/ovn-kubernetes/pull/2005
          value: quay.io/zshi/ovn-daemonset:openshift-454-3

# overrides ovn-kubernetes image
oc patch -p "$(cat ~/override-ovn-kubernetes-image-patch.yaml)" deploy network-operator -n openshift-network-operator
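
As a rough way to observe the rollout ordering after this patch (my addition, not part of the original steps), the OVN pods can be watched directly; this is essentially what the pod watch in comment 6 shows:

# watch ovnkube-master, ovnkube-node and ovs-node pods restart
oc get pods -n openshift-ovn-kubernetes -w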

Comment 1 Tim Rozet 2021-03-15 14:17:17 UTC
Note: I saw this behavior on 4.8, but I assume it also exists in 4.7, so I am targeting that version.

Comment 2 Tim Rozet 2021-03-15 14:18:27 UTC
Created attachment 1763387 [details]
cno logs

Comment 3 Federico Paolinelli 2021-03-17 17:12:28 UTC
Update: I suspect this happens because it is not a real version upgrade; only the image was changed. The code path is here:
https://github.com/openshift/cluster-network-operator/pull/961/files#diff-3a72fb129233dbf79f270cfcc408ec08f67a3b21e8ecf4fba9ae8d8dd849a83eR445

Comment 4 Federico Paolinelli 2021-03-19 09:51:42 UTC
Update: it was a real bug; a fix has been posted.

Comment 6 Mike Fiedler 2021-03-22 19:10:30 UTC
@fpaoline @trozet Does this look right? One master restarts along with the nodes, and then at the end the other two masters restart.

Here is the before state, followed by the pod stops/starts after patching the CR. This is on 4.8.0-0.nightly-2021-03-22-031014.

Before

NAME                   READY   STATUS    RESTARTS   AGE
ovnkube-master-4f86n   6/6     Running   0          141m
ovnkube-master-hbb66   6/6     Running   0          139m
ovnkube-master-hx8jf   6/6     Running   1          125m
ovnkube-node-4jvv9     3/3     Running   0          142m
ovnkube-node-bgjdq     3/3     Running   0          141m
ovnkube-node-pvxwj     3/3     Running   0          126m
ovnkube-node-r5hxf     3/3     Running   0          126m
ovnkube-node-rd7kk     3/3     Running   0          125m
ovnkube-node-z5t24     3/3     Running   0          141m
ovs-node-bjhlf         1/1     Running   0          125m
ovs-node-ft2xz         1/1     Running   0          126m
ovs-node-jzvp8         1/1     Running   0          126m
ovs-node-q4kb7         1/1     Running   0          126m
ovs-node-rn77d         1/1     Running   0          126m
ovs-node-tsg5q         1/1     Running   0          142m



Pod watch after patching the network-operator for a new OVN_IMAGE:

NAME                   READY   STATUS    RESTARTS   AGE
ovnkube-master-4f86n   6/6     Running   0          143m
ovnkube-master-hbb66   6/6     Running   0          141m
ovnkube-master-hx8jf   6/6     Running   1          127m
ovnkube-node-4jvv9     3/3     Running   0          144m
ovnkube-node-bgjdq     3/3     Running   0          143m
ovnkube-node-pvxwj     3/3     Running   0          128m
ovnkube-node-r5hxf     3/3     Running   0          127m
ovnkube-node-rd7kk     3/3     Running   0          127m
ovnkube-node-z5t24     3/3     Running   0          143m
ovs-node-bjhlf         1/1     Running   0          127m
ovs-node-ft2xz         1/1     Running   0          128m
ovs-node-jzvp8         1/1     Running   0          128m
ovs-node-q4kb7         1/1     Running   0          127m
ovs-node-rn77d         1/1     Running   0          127m
ovs-node-tsg5q         1/1     Running   0          144m
ovs-node-bjhlf         1/1     Terminating   0          129m
ovnkube-master-hbb66   6/6     Terminating   0          143m
ovnkube-node-r5hxf     3/3     Terminating   0          129m
ovs-node-bjhlf         0/1     Terminating   0          129m
ovs-node-bjhlf         0/1     Terminating   0          129m
ovs-node-bjhlf         0/1     Terminating   0          129m
ovs-node-dv8dc         0/1     Pending       0          0s
ovs-node-dv8dc         0/1     Pending       0          0s
ovs-node-dv8dc         0/1     ContainerCreating   0          0s
ovs-node-dv8dc         1/1     Running             0          1s
ovs-node-tsg5q         1/1     Terminating         0          146m
ovs-node-tsg5q         0/1     Terminating         0          146m
ovs-node-tsg5q         0/1     Terminating         0          146m
ovs-node-tsg5q         0/1     Terminating         0          146m
ovs-node-qhwmr         0/1     Pending             0          0s
ovs-node-qhwmr         0/1     Pending             0          0s
ovs-node-qhwmr         0/1     ContainerCreating   0          0s
ovs-node-qhwmr         1/1     Running             0          1s
ovs-node-rn77d         1/1     Terminating         0          129m
ovnkube-node-r5hxf     0/3     Terminating         0          129m
ovs-node-rn77d         0/1     Terminating         0          129m
ovnkube-node-r5hxf     0/3     Terminating         0          130m
ovnkube-node-r5hxf     0/3     Terminating         0          130m
ovnkube-node-wmlm4     0/3     Pending             0          0s
ovnkube-node-wmlm4     0/3     Pending             0          0s
ovnkube-node-wmlm4     0/3     ContainerCreating   0          0s
ovs-node-rn77d         0/1     Terminating         0          130m
ovs-node-rn77d         0/1     Terminating         0          130m
ovs-node-qvmcd         0/1     Pending             0          0s
ovs-node-qvmcd         0/1     Pending             0          0s
ovs-node-qvmcd         0/1     ContainerCreating   0          0s
ovnkube-node-wmlm4     2/3     Running             0          2s
ovs-node-qvmcd         1/1     Running             0          2s
ovs-node-ft2xz         1/1     Terminating         0          130m
ovs-node-ft2xz         0/1     Terminating         0          130m
ovs-node-ft2xz         0/1     Terminating         0          130m
ovs-node-ft2xz         0/1     Terminating         0          130m
ovs-node-hgvxp         0/1     Pending             0          0s
ovs-node-hgvxp         0/1     Pending             0          0s
ovs-node-hgvxp         0/1     ContainerCreating   0          0s
ovs-node-hgvxp         1/1     Running             0          1s
ovs-node-jzvp8         1/1     Terminating         0          130m
ovs-node-jzvp8         0/1     Terminating         0          130m
ovnkube-node-wmlm4     3/3     Running             0          8s
ovnkube-node-rd7kk     3/3     Terminating         0          129m
ovnkube-node-rd7kk     0/3     Terminating         0          129m
ovs-node-jzvp8         0/1     Terminating         0          130m
ovs-node-jzvp8         0/1     Terminating         0          130m
ovs-node-6gw77         0/1     Pending             0          0s
ovs-node-6gw77         0/1     Pending             0          0s
ovs-node-6gw77         0/1     ContainerCreating   0          0s
ovs-node-6gw77         1/1     Running             0          1s
ovs-node-q4kb7         1/1     Terminating         0          130m
ovs-node-q4kb7         0/1     Terminating         0          130m
ovs-node-q4kb7         0/1     Terminating         0          130m
ovs-node-q4kb7         0/1     Terminating         0          130m
ovs-node-xbc6t         0/1     Pending             0          0s
ovs-node-xbc6t         0/1     Pending             0          0s
ovs-node-xbc6t         0/1     ContainerCreating   0          0s
ovs-node-xbc6t         1/1     Running             0          1s
ovnkube-master-hbb66   0/6     Terminating         0          143m
ovnkube-node-rd7kk     0/3     Terminating         0          130m
ovnkube-node-rd7kk     0/3     Terminating         0          130m
ovnkube-node-9zmdm     0/3     Pending             0          0s
ovnkube-node-9zmdm     0/3     Pending             0          0s
ovnkube-node-9zmdm     0/3     ContainerCreating   0          0s
ovnkube-node-9zmdm     2/3     Running             0          2s
ovnkube-master-hbb66   0/6     Terminating         0          143m
ovnkube-master-hbb66   0/6     Terminating         0          143m
ovnkube-master-mhc7n   0/6     Pending             0          0s
ovnkube-master-mhc7n   0/6     Pending             0          0s
ovnkube-master-mhc7n   0/6     ContainerCreating   0          0s
ovnkube-node-9zmdm     3/3     Running             0          9s
ovnkube-node-z5t24     3/3     Terminating         0          146m
ovnkube-master-mhc7n   4/6     Running             0          7s
ovnkube-node-z5t24     0/3     Terminating         0          146m
ovnkube-node-z5t24     0/3     Terminating         0          146m
ovnkube-node-z5t24     0/3     Terminating         0          146m
ovnkube-node-lp8rv     0/3     Pending             0          0s
ovnkube-node-lp8rv     0/3     Pending             0          0s
ovnkube-node-lp8rv     0/3     ContainerCreating   0          0s
ovnkube-node-lp8rv     2/3     Running             0          2s
ovnkube-node-lp8rv     3/3     Running             0          9s
ovnkube-node-4jvv9     3/3     Terminating         0          146m
ovnkube-node-4jvv9     0/3     Terminating         0          146m
ovnkube-node-4jvv9     0/3     Terminating         0          147m
ovnkube-node-4jvv9     0/3     Terminating         0          147m
ovnkube-node-khfvw     0/3     Pending             0          0s
ovnkube-node-khfvw     0/3     Pending             0          0s
ovnkube-node-khfvw     0/3     ContainerCreating   0          0s
ovnkube-node-khfvw     2/3     Running             0          2s
ovnkube-node-khfvw     3/3     Running             0          8s
ovnkube-node-bgjdq     3/3     Terminating         0          146m
ovnkube-node-bgjdq     0/3     Terminating         0          146m
ovnkube-node-bgjdq     0/3     Terminating         0          146m
ovnkube-node-bgjdq     0/3     Terminating         0          146m
ovnkube-node-hmpbd     0/3     Pending             0          0s
ovnkube-node-hmpbd     0/3     Pending             0          0s
ovnkube-node-hmpbd     0/3     ContainerCreating   0          0s
ovnkube-node-hmpbd     2/3     Running             0          2s
ovnkube-node-hmpbd     3/3     Running             0          10s
ovnkube-node-pvxwj     3/3     Terminating         0          131m
ovnkube-node-pvxwj     0/3     Terminating         0          131m
ovnkube-node-pvxwj     0/3     Terminating         0          131m
ovnkube-node-pvxwj     0/3     Terminating         0          131m
ovnkube-node-bstkm     0/3     Pending             0          0s
ovnkube-node-bstkm     0/3     Pending             0          0s
ovnkube-node-bstkm     0/3     ContainerCreating   0          0s
ovnkube-node-bstkm     2/3     Running             0          1s
ovnkube-node-bstkm     3/3     Running             0          6s
ovnkube-master-mhc7n   5/6     Running             0          96s
ovnkube-master-mhc7n   6/6     Running             0          103s
ovnkube-master-hx8jf   6/6     Terminating         1          131m
ovnkube-master-hx8jf   0/6     Terminating         1          132m
ovnkube-master-hx8jf   0/6     Terminating         1          132m
ovnkube-master-hx8jf   0/6     Terminating         1          132m
ovnkube-master-bgjjj   0/6     Pending             0          0s
ovnkube-master-bgjjj   0/6     Pending             0          0s
ovnkube-master-bgjjj   0/6     ContainerCreating   0          1s
ovnkube-master-bgjjj   4/6     Running             0          4s
ovnkube-master-bgjjj   5/6     Running             0          97s
ovnkube-master-bgjjj   6/6     Running             0          101s
ovnkube-master-4f86n   6/6     Terminating         0          149m
ovnkube-master-4f86n   0/6     Terminating         0          150m
ovnkube-master-4f86n   0/6     Terminating         0          150m
ovnkube-master-4f86n   0/6     Terminating         0          150m
ovnkube-master-t45vg   0/6     Pending             0          0s
ovnkube-master-t45vg   0/6     Pending             0          0s
ovnkube-master-t45vg   0/6     ContainerCreating   0          0s
ovnkube-master-t45vg   4/6     Running             0          3s
ovnkube-master-t45vg   5/6     Running             0          98s
ovnkube-master-t45vg   6/6     Running             0          101s

Comment 7 Federico Paolinelli 2021-03-23 07:56:42 UTC
The upgrade logic is based on comparing the cluster version with the current version, so in order to verify this you need to perform a real upgrade (i.e. from 4.7 to 4.8).
Changing only the image applies the change to all the affected pods at once.
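
A minimal way to see the two versions being compared (my own sketch, not from this comment; it relies only on the standard ClusterVersion and ClusterOperator status fields):

# desired cluster version vs. versions currently reported by the network operator
oc get clusterversion version -o jsonpath='{.status.desired.version}{"\n"}'
oc get clusteroperator network -o jsonpath='{.status.versions}{"\n"}'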

Comment 8 Mike Fiedler 2021-03-23 14:09:53 UTC
OK, got it. I was following the repro steps. Will test with a real upgrade.

Comment 9 Mike Fiedler 2021-03-23 16:58:27 UTC
Verified upgrading 4.7.3 to 4.8.0-0.nightly-2021-03-22-104536. The overall upgrade got stuck at the MCO phase due to bug 1933772, but I could see networking and openshift-ovn-kubernetes upgrade successfully: nodes first, followed by masters.
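
For reference, an upgrade to a specific nightly like the one above can be forced roughly as follows (a hedged example of mine, not the exact command used for this verification; the release image pullspec is a placeholder):

# point the cluster at an explicit release image
oc adm upgrade --to-image=<release image pullspec> --force --allow-explicit-upgrade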

Comment 12 errata-xmlrpc 2021-07-27 22:53:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

