Bug 1732789 - failed to start the plugin Failed to start sdn network plugin "redhat/openshift-ovs-networkpolicy"
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.1.z
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.1.z
Assignee: Jacob Tanenbaum
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-24 11:27 UTC by jmselmi
Modified: 2019-07-24 17:58 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-24 17:58:51 UTC
Target Upstream Version:



Description jmselmi 2019-07-24 11:27:37 UTC
Description of problem:
During deployment of OCP 4.1.6 on AWS, the bootstrap process failed with a timeout. When I check the pods, I see that operators are crash-looping, and the log shows that they fail to connect to the kubernetes service (in the default namespace):

oc --config=./kubeconfig get pods --all-namespaces
NAMESPACE                                    NAME                                                     READY   STATUS              RESTARTS   AGE
openshift-apiserver-operator                 openshift-apiserver-operator-5974c4ffd7-zmr6f            0/1     CrashLoopBackOff    40         3h45m
openshift-cloud-credential-operator          cloud-credential-operator-67798b4f87-vg7kg               0/1     CrashLoopBackOff    44         3h45m

Logs from the openshift-apiserver-operator pod:

W0724 09:52:54.262841       1 builder.go:181] unable to get owner reference (falling back to namespace): Get https://172.33.0.1:443/api/v1/namespaces/openshift-apiserver-operator/pods: dial tcp 172.33.0.1:443: i/o timeout
F0724 09:53:24.725846       1 cmd.go:92] Get https://172.33.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 172.33.0.1:443: i/o timeout

The sdn pods are up and running:

kubectl -n openshift-sdn get pod --field-selector "spec.nodeName=ip-10-0-63-227.eu-west-3.compute.internal"
NAME                   READY   STATUS    RESTARTS   AGE
ovs-xtsph              1/1     Running   0          151m
sdn-controller-c75c7   1/1     Running   0          151m
sdn-zp6kn              1/1     Running   1          151m

I see from the log:

2019-07-24T07:37:08.326568265+00:00 stderr F I0724 07:37:08.326483    2002 cmd.go:253] Overriding kubernetes api to https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443
2019-07-24T07:37:08.326648181+00:00 stderr F I0724 07:37:08.326571    2002 cmd.go:142] Reading node configuration from /config/sdn-config.yaml
2019-07-24T07:37:08.330031137+00:00 stderr F W0724 07:37:08.329398    2002 server.go:198] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
2019-07-24T07:37:08.330031137+00:00 stderr F I0724 07:37:08.329512    2002 feature_gate.go:206] feature gates: &{map[]}
2019-07-24T07:37:08.330031137+00:00 stderr F I0724 07:37:08.329675    2002 cmd.go:279] Watching config file /config/sdn-config.yaml for changes
2019-07-24T07:37:08.330031137+00:00 stderr F I0724 07:37:08.329739    2002 cmd.go:279] Watching config file /config/..2019_07_24_07_36_59.077187941/sdn-config.yaml for changes
2019-07-24T07:37:08.331453871+00:00 stderr F I0724 07:37:08.331416    2002 node.go:140] Initializing SDN node of type "redhat/openshift-ovs-networkpolicy" with configured hostname "ip-10-0-63-227.eu-west-3.compute.internal" (IP ""), iptables sync period "30s"
2019-07-24T07:37:08.336911988+00:00 stderr F I0724 07:37:08.336880    2002 cmd.go:212] Starting node networking (v4.1.6-201907101224+f4dafb5-dirty)
2019-07-24T07:37:08.336976826+00:00 stderr F I0724 07:37:08.336967    2002 node.go:254] Starting openshift-sdn network plugin
2019-07-24T07:37:08.361888912+00:00 stderr F F0724 07:37:08.361861    2002 cmd.go:124] Failed to start sdn: failed to validate network configuration: master has not created a default cluster network, network plugin "redhat/openshift-ovs-networkpolicy" can not start

The connection problem to the cluster IP 172.33.0.1 (kubernetes svc) is impacting the other pods and operators that use it.
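
To tell a silently dropped connection ("i/o timeout") apart from an actively rejected one ("connection refused"), a small TCP probe can be run from an affected node or pod. This is only a minimal sketch: the function name `probe_tcp` and the 3-second default timeout are assumptions made for illustration, not part of any OpenShift tooling.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connection attempt.

    Returns 'open', 'refused', 'timeout', or 'error'.
    """
    try:
        # create_connection performs the TCP handshake and raises on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all: packets are likely dropped or misrouted.
        return "timeout"
    except OSError:
        # Anything else, e.g. no route to host or a name resolution failure.
        return "error"
```

For example, `probe_tcp("172.33.0.1", 443)` would probe the kubernetes service address seen in the operator log above. A "timeout" result usually points at routing or filtering rather than at the API server process itself.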


Version-Release number of selected component (if applicable):
OCP 4.1.6

How reproducible:
Install OCP on AWS.

Actual results:

Failed installation

Expected results:

The cluster should deploy correctly, since the SDN pods are running and functioning.

Comment 1 Casey Callendrello 2019-07-24 11:55:09 UTC
Can you post the output of oc -n openshift-sdn logs -l app=sdn-controller?

Comment 2 jmselmi 2019-07-24 12:10:00 UTC
Log of one of the pods:

I0724 07:37:09.539960       1 leaderelection.go:205] attempting to acquire leader lease  openshift-sdn/openshift-network-controller...
E0724 07:57:27.278594       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.77.97:6443: i/o timeout
E0724 08:40:15.800579       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.82.251:6443: i/o timeout
E0724 09:01:46.137096       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.82.251:6443: i/o timeout
E0724 09:22:57.719346       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.82.251:6443: i/o timeout
E0724 09:44:08.299102       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.77.97:6443: i/o timeout
E0724 10:05:34.455898       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.82.251:6443: i/o timeout
E0724 10:27:03.222520       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.77.97:6443: i/o timeout
E0724 10:48:16.000942       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.54.88:6443: connect: connection refused
E0724 11:09:42.162796       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.54.88:6443: connect: connection refused
E0724 11:30:53.341619       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.54.88:6443: connect: connection refused
E0724 11:52:17.645334       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.77.97:6443: i/o timeout
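
The errors above hit several different backend addresses with two different failure modes. A short script can tally them per address, which makes the pattern easier to see. This is a sketch under assumptions: the regular expression is written against the "dial tcp IP:PORT: reason" tail shown in these log lines, and the names `DIAL_RE` and `tally_dial_errors` are hypothetical.

```python
import re
from collections import Counter

# Matches the tail of a Go "dial tcp" error: target IP, port, failure reason.
# The optional "connect: " prefix is dropped so that only the reason remains.
DIAL_RE = re.compile(r"dial tcp (\d+\.\d+\.\d+\.\d+):(\d+): (?:connect: )?(.+)$")


def tally_dial_errors(lines):
    """Count dial failures per (target IP, reason) pair."""
    counts = Counter()
    for line in lines:
        match = DIAL_RE.search(line.strip())
        if match:
            ip, _port, reason = match.groups()
            counts[(ip, reason)] += 1
    return counts
```

Feeding it the saved output of `oc -n openshift-sdn logs <pod>` (for example via `open("sdn-controller.log")`, a hypothetical file name) yields one counter entry per address and failure type.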

Comment 3 jmselmi 2019-07-24 12:10:35 UTC
oc -n openshift-sdn logs sdn-controller-f4w8c
I0724 07:37:09.539960       1 leaderelection.go:205] attempting to acquire leader lease  openshift-sdn/openshift-network-controller...
E0724 07:57:27.278594       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.77.97:6443: i/o timeout
E0724 08:40:15.800579       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.82.251:6443: i/o timeout
E0724 09:01:46.137096       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.82.251:6443: i/o timeout
E0724 09:22:57.719346       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.82.251:6443: i/o timeout
E0724 09:44:08.299102       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.77.97:6443: i/o timeout
E0724 10:05:34.455898       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.82.251:6443: i/o timeout
E0724 10:27:03.222520       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.77.97:6443: i/o timeout
E0724 10:48:16.000942       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.54.88:6443: connect: connection refused
E0724 11:09:42.162796       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.54.88:6443: connect: connection refused
E0724 11:30:53.341619       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.54.88:6443: connect: connection refused
E0724 11:52:17.645334       1 leaderelection.go:270] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.gold.ocp.euw3.pub.nbyt.fr:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.77.97:6443: i/o timeout
[root@ip-10-0-7-203 ~]# oc -n openshift-sdn logs sdn-controller-ptnzc
I0724 11:52:24.278762       1 leaderelection.go:205] attempting to acquire leader lease  openshift-sdn/openshift-network-controller...
I0724 11:52:34.295636       1 leaderelection.go:214] successfully acquired lease openshift-sdn/openshift-network-controller
I0724 11:52:34.295826       1 event.go:221] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"openshift-sdn", Name:"openshift-network-controller", UID:"dc977b50-ade5-11e9-9ded-069bbc14fd3c", APIVersion:"v1", ResourceVersion:"28460", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' ip-10-0-93-155 became leader
I0724 11:52:34.383308       1 master.go:57] Initializing SDN master of type "redhat/openshift-ovs-networkpolicy"
I0724 11:52:34.406167       1 network_controller.go:49] Started OpenShift Network Controller

Comment 4 jmselmi 2019-07-24 17:58:51 UTC
It was due to a wrong CIDR configuration on the VPC (wrong range).
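
A pre-install sanity check along these lines might catch this kind of CIDR mistake early. It is a sketch only: the thread does not say which range was wrong, so every CIDR value in the usage note below is an assumption, and `check_networks` is a hypothetical helper name. It checks that the machine (VPC), service, and cluster networks do not overlap and that each node IP lies inside the machine network.

```python
import ipaddress
from itertools import combinations


def check_networks(machine_cidr, service_cidr, cluster_cidr, node_ips):
    """Return a list of human-readable problems; an empty list means none found."""
    networks = {
        "machine": ipaddress.ip_network(machine_cidr),
        "service": ipaddress.ip_network(service_cidr),
        "cluster": ipaddress.ip_network(cluster_cidr),
    }
    problems = []

    # The three ranges must be pairwise disjoint.
    for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} network {net_a} overlaps {name_b} network {net_b}")

    # Every node address must fall inside the machine (VPC) range.
    for ip in node_ips:
        if ipaddress.ip_address(ip) not in networks["machine"]:
            problems.append(f"node IP {ip} is outside machine network {networks['machine']}")

    return problems
```

For instance, `check_networks("10.0.0.0/16", "172.33.0.0/16", "10.128.0.0/14", ["10.0.63.227", "10.0.77.97"])` uses node addresses taken from the logs above together with assumed ranges. The real values should come from the install configuration and the VPC settings.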

