Bug 1858834 - [OVN] 4.5.3 upgrade failure---some ovnkube-master and ovnkube-node pods are in CrashLoopBackOff
Summary: [OVN] 4.5.3 upgrade failure---some ovnkube-master and ovnkube-node is in Cra...
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.6.0
Assignee: Federico Paolinelli
QA Contact: Anurag saxena
Duplicates: 1859365
Depends On:
Blocks: 1858712
Reported: 2020-07-20 14:06 UTC by W. Trevor King
Modified: 2021-04-05 17:46 UTC
14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1858712
Last Closed: 2020-10-27 16:16:00 UTC
Target Upstream Version:


System ID Private Priority Status Summary Last Updated
Github openshift cluster-network-operator pull 723 0 None closed Bug 1858834: Revert ovn db consistency check. 2021-02-08 04:19:02 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:16:31 UTC

Description W. Trevor King 2020-07-20 14:06:13 UTC
+++ This bug was initially created as a clone of Bug #1858712 +++

Version-Release number of selected component (if applicable):

Base version: 4.5.2-x86_64
Target version: 4.5.0-0.nightly-2020-07-18-024505

How reproducible: 

Steps to Reproduce:
Use the upgrade CI to trigger an upgrade from 4.5.2-x86_64 to 4.5.0-0.nightly-2020-07-18-024505.

The upgrade eventually failed.

Actual Result:
version   4.5.2     True        True          3h19m   Unable to apply 4.5.0-0.nightly-2020-07-18-024505: an unknown error has occurred: MultipleErrors

oc get co network -o yaml
  - lastTransitionTime: "2020-07-20T04:43:27Z"
    message: |-
      DaemonSet "openshift-ovn-kubernetes/ovnkube-master" rollout is not making progress - last change 2020-07-20T04:33:13Z
      DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2020-07-20T04:32:51Z
    reason: RolloutHung
    status: "True"
    type: Degraded
  - lastTransitionTime: "2020-07-20T03:27:52Z"
    status: "True"
    type: Upgradeable
  - lastTransitionTime: "2020-07-20T04:31:05Z"
    message: |-
      DaemonSet "openshift-multus/multus-admission-controller" update is rolling out (1 out of 3 updated)
      DaemonSet "openshift-ovn-kubernetes/ovnkube-master" is not available (awaiting 1 nodes)
      DaemonSet "openshift-ovn-kubernetes/ovnkube-node" update is rolling out (4 out of 6 updated)

One multus pod is stuck in ContainerCreating with this error:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_multus-admission-controller-2wxwn_openshift-multus_7dc31947-f76a-4207-9288-38778b17eafe_0(be44e539169c537a7867a026c71109fb91d3b7086580e86471269665b7548578): Multus: [openshift-multus/multus-admission-controller-2wxwn]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking confAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[openshift-multus/multus-admission-controller-2wxwn] failed to configure pod interface: failure in plugging pod interface: failed to run 'ovs-vsctl --timeout=30 add-port br-int be44e539169c537 -- set interface be44e539169c537 external_ids:attached_mac=2e:a8:2b:82:00:04 external_ids:iface-id=openshift-multus_multus-admission-controller-2wxwn external_ids:ip_addresses= external_ids:sandbox=be44e539169c537a7867a026c71109fb91d3b7086580e86471269665b7548578': exit status 1
"ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)\n"
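The ovs-vsctl failure above means ovsdb-server on that node was not answering (its control socket was missing), so every CNI ADD on the node failed. A minimal diagnostic sketch, assuming the default socket path shown in the error; the `check_ovs_db` helper name is ours, not part of any OVS tooling:

```shell
# Hypothetical helper: report whether the OVS database control socket exists.
# /var/run/openvswitch/db.sock is the default path from the error above.
check_ovs_db() {
    sock="${1:-/var/run/openvswitch/db.sock}"
    if [ -S "$sock" ]; then
        echo "ovsdb-server socket present: $sock"
    else
        echo "ovsdb-server socket missing: $sock" >&2
        return 1
    fi
}

# If the socket exists, ovs-vsctl should answer; if it is missing, the
# ovs-node pod (or openvswitch service) needs to come back before pod
# creation can succeed.
check_ovs_db && ovs-vsctl --timeout=5 show || echo "OVS database unreachable"
```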
oc get pods -o wide -n openshift-ovn-kubernetes
NAME                   READY   STATUS             RESTARTS   AGE     IP   NODE                                         NOMINATED NODE   READINESS GATES
ovnkube-master-gjs4n   2/4     CrashLoopBackOff   39         3h           ip-10-0-136-16.us-east-2.compute.internal    <none>           <none>
ovnkube-master-gx75c   4/4     Running            0          3h2m         ip-10-0-204-230.us-east-2.compute.internal   <none>           <none>
ovnkube-master-mn5vc   4/4     Running            0          3h1m         ip-10-0-178-208.us-east-2.compute.internal   <none>           <none>
ovnkube-node-2bwg2     2/2     Running            0          3h2m         ip-10-0-178-208.us-east-2.compute.internal   <none>           <none>
ovnkube-node-7vszf     1/2     CrashLoopBackOff   33         3h1m         ip-10-0-204-230.us-east-2.compute.internal   <none>           <none>
ovnkube-node-8clcn     2/2     Running            0          3h49m        ip-10-0-135-242.us-east-2.compute.internal   <none>           <none>
ovnkube-node-srwtf     2/2     Running            0          3h2m         ip-10-0-165-9.us-east-2.compute.internal     <none>           <none>
ovnkube-node-v8j5p     2/2     Running            0          3h2m         ip-10-0-214-190.us-east-2.compute.internal   <none>           <none>
ovnkube-node-zslrj     2/2     Running            0          4h5m         ip-10-0-136-16.us-east-2.compute.internal    <none>           <none>
ovs-node-cwfz2         1/1     Running            0          3h           ip-10-0-135-242.us-east-2.compute.internal   <none>           <none>
ovs-node-fvblz         1/1     Running            0          3h1m         ip-10-0-178-208.us-east-2.compute.internal   <none>           <none>
ovs-node-t8vl6         1/1     Running            0          179m         ip-10-0-204-230.us-east-2.compute.internal   <none>           <none>
ovs-node-thbn7         1/1     Running            0          3h2m         ip-10-0-136-16.us-east-2.compute.internal    <none>           <none>
ovs-node-vp2rp         1/1     Running            0          3h2m         ip-10-0-214-190.us-east-2.compute.internal   <none>           <none>
ovs-node-vwrps         1/1     Running            0          179m         ip-10-0-165-9.us-east-2.compute.internal     <none>

oc logs -c ovnkube-master ovnkube-master-gjs4n -n openshift-ovn-kubernetes
+ [[ -f /env/_master ]]
+ hybrid_overlay_flags=
+ [[ -n '' ]]
++ ovn-nbctl --pidfile=/var/run/ovn/ovn-nbctl.pid --detach -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt --db ssl:,ssl:,ssl:
2020-07-20T05:39:13Z|00184|stream_ssl|WARN|SSL_connect: unexpected SSL connection close

oc logs -c ovnkube-node ovnkube-node-7vszf -n openshift-ovn-kubernetes
I0720 06:48:08.321931  404343 ovs.go:249] exec(122): stdout: "not connected\n"
I0720 06:48:08.321965  404343 ovs.go:250] exec(122): stderr: ""
I0720 06:48:08.321981  404343 node.go:116] node ip-10-0-204-230.us-east-2.compute.internal connection status = not connected
I0720 06:48:08.792527  404343 ovs.go:246] exec(123): /usr/bin/ovs-appctl --timeout=15 -t /var/run/ovn/ovn-controller.93522.ctl connection-status
I0720 06:48:08.820573  404343 ovs.go:249] exec(123): stdout: "not connected\n"
I0720 06:48:08.820724  404343 ovs.go:250] exec(123): stderr: ""
I0720 06:48:08.820748  404343 node.go:116] node ip-10-0-204-230.us-east-2.compute.internal connection status = not connected
I0720 06:48:08.820767  404343 ovs.go:246] exec(124): /usr/bin/ovs-appctl --timeout=15 -t /var/run/ovn/ovn-controller.93522.ctl connection-status
I0720 06:48:08.847272  404343 ovs.go:249] exec(124): stdout: "not connected\n"
I0720 06:48:08.847306  404343 ovs.go:250] exec(124): stderr: ""
I0720 06:48:08.847321  404343 node.go:116] node ip-10-0-204-230.us-east-2.compute.internal connection status = not connected
F0720 06:48:08.847355  404343 ovnkube.go:129] timed out waiting sbdb for node ip-10-0-204-230.us-east-2.compute.internal: timed out waiting for the condition
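The fatal `timed out waiting sbdb` message is the end of a polling loop: ovnkube-node repeatedly runs `ovs-appctl ... connection-status` against ovn-controller and gives up after a deadline if the southbound DB never reports "connected". A rough sketch of that wait pattern, assuming nothing about the actual ovn-kubernetes source beyond the log above (function and parameter names are ours):

```python
import time


def wait_for_sbdb(check_status, timeout=300.0, interval=0.5, sleep=time.sleep):
    """Poll check_status() until it returns "connected" or the timeout elapses.

    check_status stands in for shelling out to
    `ovs-appctl -t /var/run/ovn/ovn-controller.<pid>.ctl connection-status`.
    Returns True on success, False when the deadline passes -- the False
    path corresponds to the "timed out waiting for the condition" fatal
    error in the log above.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check_status() == "connected":
            return True
        sleep(interval)
    return False
```

With the southbound DB unreachable (as in this bug, where the master pods themselves were crash-looping), every poll returns "not connected" and the loop exits False, which ovnkube turns into the fatal exit that produces the CrashLoopBackOff.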

Comment 1 W. Trevor King 2020-07-20 14:18:19 UTC
Closing as a dup of bug 1837953; see [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1858712#c16

*** This bug has been marked as a duplicate of bug 1837953 ***

Comment 2 W. Trevor King 2020-07-20 15:34:50 UTC
Un-duping, based on Scott's change to bug 1858712.

Comment 7 Anurag saxena 2020-07-29 14:57:43 UTC
Upgrade on 4.6.0-0.nightly-2020-07-25-065959 -> 4.6.0-0.nightly-2020-07-25-091217 looks good. Verifying this bug based on the same observations.

Comment 8 Aniket Bhat 2020-07-31 19:53:29 UTC
*** Bug 1859365 has been marked as a duplicate of this bug. ***

Comment 10 errata-xmlrpc 2020-10-27 16:16:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Comment 11 W. Trevor King 2021-04-05 17:46:52 UTC
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475
