Bug 1754434 - Failed to recover from expired control plane certificates due to a network error
Summary: Failed to recover from expired control plane certificates due to a network error
Keywords:
Status: CLOSED DUPLICATE of bug 1754638
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Dan Williams
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-09-23 08:55 UTC by zhou ying
Modified: 2019-10-08 12:00 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-23 20:13:34 UTC
Target Upstream Version:
Embargoed:



Description zhou ying 2019-09-23 08:55:17 UTC
Description of problem:
Failed to recover from expired control plane certificates due to a network error.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-09-22-222738

How reproducible:


Steps to Reproduce:
1. Follow the doc to do the certificate recovery:
https://github.com/openshift/openshift-docs/blob/master/modules/dr-recover-expired-control-plane-certs.adoc

2. After doing the recovery, check the node status.
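The node-status check in step 2 can be polled with a small helper; this is a hypothetical sketch (the `count_not_ready` function is not part of the recovery doc), assuming the standard `oc get node` tabular output where STATUS is the second column:

```shell
# Hypothetical helper: count NotReady nodes from `oc get node` output,
# so the recovery can be polled until every node reports Ready again.
count_not_ready() {
  # expects `oc get node` output on stdin; skip the header line,
  # keep rows whose STATUS column is NotReady, and count them
  tail -n +2 | awk '$2 == "NotReady"' | wc -l
}

# On the recovery host you would run something like:
#   oc get node | count_not_ready
```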

Actual results:
2. At first, one of the masters was "NotReady"; after about 1 hour, all the master and worker nodes were in "NotReady" status.
[root@ip-10-0-131-42 ~]# oc get node
NAME                                         STATUS     ROLES    AGE    VERSION
ip-10-0-131-42.us-east-2.compute.internal    Ready      master   126m   v1.14.6+c4799753c
ip-10-0-134-174.us-east-2.compute.internal   Ready      worker   120m   v1.14.6+c4799753c
ip-10-0-144-32.us-east-2.compute.internal    Ready      worker   120m   v1.14.6+c4799753c
ip-10-0-157-252.us-east-2.compute.internal   NotReady   master   126m   v1.14.6+c4799753c
ip-10-0-161-159.us-east-2.compute.internal   Ready      master   126m   v1.14.6+c4799753c

[root@ip-10-0-131-42 ~]# oc get node
NAME                                         STATUS     ROLES    AGE    VERSION
ip-10-0-131-42.us-east-2.compute.internal    NotReady   master   143m   v1.14.6+c4799753c
ip-10-0-134-174.us-east-2.compute.internal   NotReady   worker   137m   v1.14.6+c4799753c
ip-10-0-144-32.us-east-2.compute.internal    NotReady   worker   137m   v1.14.6+c4799753c
ip-10-0-157-252.us-east-2.compute.internal   NotReady   master   143m   v1.14.6+c4799753c
ip-10-0-161-159.us-east-2.compute.internal   NotReady   master   143m   v1.14.6+c4799753c


Logs from the master :
[root@ip-10-0-157-252 kubernetes]# journalctl -f |grep E0923
Sep 23 07:13:20 ip-10-0-157-252 hyperkube[73361]: E0923 07:13:20.890306   73361 pod_workers.go:190] Error syncing pod 1a972930-ddc3-11e9-b598-0ab22ddeef06 ("multus-admission-controller-88pws_openshift-multus(1a972930-ddc3-11e9-b598-0ab22ddeef06)"), skipping: network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Sep 23 07:13:21 ip-10-0-157-252 hyperkube[73361]: E0923 07:13:21.554106   73361 kubelet.go:2180] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Sep 23 07:13:22 ip-10-0-157-252 hyperkube[73361]: E0923 07:13:22.887094   73361 pod_workers.go:190] Error syncing pod 948d4cb7-ddc2-11e9-b598-0ab22ddeef06 ("apiserver-wrnn7_openshift-apiserver(948d4cb7-ddc2-11e9-b598-0ab22ddeef06)"), skipping: network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Sep 23 07:13:22 ip-10-0-157-252 hyperkube[73361]: E0923 07:13:22.888675   73361 pod_workers.go:190] Error syncing pod 59454253-ddc4-11e9-bad7-0215a2855e60 ("controller-manager-z8nhp_openshift-controller-manager(59454253-ddc4-11e9-bad7-0215a2855e60)"), skipping: network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Sep 23 07:13:22 ip-10-0-157-252 hyperkube[73361]: E0923 07:13:22.889347   73361 pod_workers.go:190] Error syncing pod 1a972930-ddc3-11e9-b598-0ab22ddeef06 ("multus-admission-controller-88pws_openshift-multus(1a972930-ddc3-11e9-b598-0ab22ddeef06)"), skipping: network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Sep 23 07:13:22 ip-10-0-157-252 hyperkube[73361]: E0923 07:13:22.892144   73361 pod_workers.go:190] Error syncing pod b0253183-ddc2-11e9-b598-0ab22ddeef06 ("dns-default-x8kgg_openshift-dns(b0253183-ddc2-11e9-b598-0ab22ddeef06)"), skipping: network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Sep 23 07:13:22 ip-10-0-157-252 hyperkube[73361]: E0923 07:13:22.893424   73361 pod_workers.go:190] Error syncing pod b9310a2a-ddc2-11e9-b598-0ab22ddeef06 ("node-ca-5nbs6_openshift-image-registry(b9310a2a-ddc2-11e9-b598-0ab22ddeef06)"), skipping: network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network


[root@ip-10-0-157-252 kubernetes]# journalctl -f -u crio
-- Logs begin at Mon 2019-09-23 04:57:37 UTC. --
Sep 23 07:33:59 ip-10-0-157-252 crio[1107]: 2019-09-23T07:33:59Z [error] Multus: error in invoke Delegate del - "openshift-sdn": failed to find plugin "openshift-sdn" in path [/var/lib/cni/bin /opt/multus/bin]
Sep 23 07:33:59 ip-10-0-157-252 crio[1107]: time="2019-09-23 07:33:59.630920322Z" level=error msg="Error deleting network: Multus: error in invoke Delegate del - "openshift-sdn": failed to find plugin "openshift-sdn" in path [/var/lib/cni/bin /opt/multus/bin]"
Sep 23 07:33:59 ip-10-0-157-252 crio[1107]: time="2019-09-23 07:33:59.630949656Z" level=error msg="Error while removing pod from CNI network "multus-cni-network": Multus: error in invoke Delegate del - "openshift-sdn": failed to find plugin "openshift-sdn" in path [/var/lib/cni/bin /opt/multus/bin]"
Sep 23 07:34:35 ip-10-0-157-252 crio[1107]: 2019-09-23T07:34:35Z [error] Multus: error in invoke Delegate add - "openshift-sdn": failed to find plugin "openshift-sdn" in path [/var/lib/cni/bin /opt/multus/bin]
Sep 23 07:34:35 ip-10-0-157-252 crio[1107]: 2019-09-23T07:34:35Z [verbose] Del: openshift-operator-lifecycle-manager:packageserver-86d5778579-w8p2m:openshift-sdn:eth0 {"cniVersion":"0.3.1","name":"openshift-sdn","type":"openshift-sdn"}
Sep 23 07:34:35 ip-10-0-157-252 crio[1107]: 2019-09-23T07:34:35Z [error] Multus: error in invoke Delegate del - "openshift-sdn": failed to find plugin "openshift-sdn" in path [/var/lib/cni/bin /opt/multus/bin]
Sep 23 07:34:35 ip-10-0-157-252 crio[1107]: 2019-09-23T07:34:35Z [error] Multus: Err adding pod to network "openshift-sdn": Multus: error in invoke Delegate add - "openshift-sdn": failed to find plugin "openshift-sdn" in path [/var/lib/cni/bin /opt/multus/bin]
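The crio errors above say Multus cannot find the openshift-sdn plugin binary in either of its search paths. A minimal sketch for spot-checking that on an affected node (the `check_sdn_plugin` function is illustrative, not a supported tool; the two paths are taken from the log lines):

```shell
# Illustrative check: report whether the openshift-sdn CNI plugin binary
# exists and is executable in each directory Multus searches.
check_sdn_plugin() {
  found=1
  for dir in "$@"; do
    if [ -x "$dir/openshift-sdn" ]; then
      echo "found openshift-sdn in $dir"
      found=0
    else
      echo "openshift-sdn missing from $dir"
    fi
  done
  return $found   # non-zero if the plugin was not found anywhere
}

# On the node you would run:
#   check_sdn_plugin /var/lib/cni/bin /opt/multus/bin
```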

Expected results:
2. All the master and worker nodes stay Ready and work well.


Additional info:

Comment 1 Casey Callendrello 2019-09-23 13:10:15 UTC
*** Bug 1753801 has been marked as a duplicate of this bug. ***

Comment 2 Casey Callendrello 2019-09-23 13:28:45 UTC
Can we get a must-gather or a kubeconfig from the cluster?

Comment 3 Ben Bennett 2019-09-23 20:13:34 UTC
Calling this a dupe based on "NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network"

*** This bug has been marked as a duplicate of bug 1754638 ***

Comment 4 zhou ying 2019-09-24 01:20:51 UTC
Since all the masters and workers are NotReady, the must-gather tool can't run.

