Description of problem:

Create a cluster with 3 masters and 2 workers with the OVN network type.

Check that the ovnkube-master pod is running on one of the masters:

# oc get pod -o wide
NAME                              READY   STATUS    RESTARTS   AGE     IP             NODE                                         NOMINATED NODE   READINESS GATES
ovnkube-master-86db46c79b-n8mhn   4/4     Running   0          7h15m   10.0.130.36    ip-10-0-130-36.eu-west-2.compute.internal    <none>           <none>
ovnkube-node-bbhvj                3/3     Running   0          7h10m   10.0.141.250   ip-10-0-141-250.eu-west-2.compute.internal   <none>           <none>
ovnkube-node-mk74p                3/3     Running   0          7h10m   10.0.146.207   ip-10-0-146-207.eu-west-2.compute.internal   <none>           <none>
ovnkube-node-qc5mq                3/3     Running   0          7h15m   10.0.130.36    ip-10-0-130-36.eu-west-2.compute.internal    <none>           <none>
ovnkube-node-r2s67                3/3     Running   0          7h15m   10.0.146.202   ip-10-0-146-202.eu-west-2.compute.internal   <none>           <none>
ovnkube-node-t6742                3/3     Running   0          7h15m   10.0.160.194   ip-10-0-160-194.eu-west-2.compute.internal   <none>           <none>

When I delete the ovnkube-master pod, the newly created ovnkube-master pod is scheduled on another master:

# oc get pod -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP             NODE                                         NOMINATED NODE   READINESS GATES
ovnkube-master-86db46c79b-fgp4v   4/4     Running   1          39s   10.0.146.202   ip-10-0-146-202.eu-west-2.compute.internal   <none>           <none>

Check the run-ovn-northd container logs of the newly created pod. northd is still pointed at the OVN databases on the old master (10.0.130.36), where nothing is listening any more:

# oc logs ovnkube-master-86db46c79b-fgp4v -c run-ovn-northd
================== ovnkube.sh --- version: 3 ================
==================== command: run-ovn-northd
=================== hostname: ip-10-0-146-202
=================== daemonset version 3
=================== Image built from ovn-kubernetes ref: refs/heads/rhaos-4.2-rhel-7  commit: fb435e034a426d1a11fc61b284426e8ea82187ee
=============== run-ovn-northd (wait for ready_to_start_node)
=============== run_ovn_northd ========== MASTER ONLY
ovn_db_host 10.0.130.36
ovn_nbdb tcp://10.0.130.36:6641   ovn_sbdb tcp://10.0.130.36:6642
ovn_northd_opts=--db-nb-sock=/var/run/openvswitch/ovnnb_db.sock --db-sb-sock=/var/run/openvswitch/ovnsb_db.sock
ovn_log_northd=-vconsole:info
nice: cannot set niceness: Permission denied
2019-08-29T00:25:50Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-northd.log
Starting ovn-northd.
run as: /usr/share/openvswitch/scripts/ovn-ctl start_northd --no-monitor --ovn-manage-ovsdb=no --ovn-northd-nb-db=tcp:10.0.130.36:6641 --ovn-northd-sb-db=tcp:10.0.130.36:6642 --ovn-northd-log=-vconsole:info --db-nb-sock=/var/run/openvswitch/ovnnb_db.sock --db-sb-sock=/var/run/openvswitch/ovnsb_db.sock
=============== run_ovn_northd ========== RUNNING
2019-08-29T00:25:50.576Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-northd.log
2019-08-29T00:25:50.577Z|00002|reconnect|INFO|tcp:10.0.130.36:6641: connecting...
2019-08-29T00:25:50.577Z|00003|reconnect|INFO|tcp:10.0.130.36:6642: connecting...
2019-08-29T00:25:50.577Z|00004|reconnect|INFO|tcp:10.0.130.36:6641: connected
2019-08-29T00:25:50.577Z|00005|reconnect|INFO|tcp:10.0.130.36:6642: connected
2019-08-29T00:26:02.580Z|00006|reconnect|INFO|tcp:10.0.130.36:6641: connection closed by peer
2019-08-29T00:26:03.579Z|00007|reconnect|INFO|tcp:10.0.130.36:6641: connecting...
2019-08-29T00:26:03.579Z|00008|reconnect|INFO|tcp:10.0.130.36:6641: connection attempt failed (Connection refused)
2019-08-29T00:26:03.579Z|00009|reconnect|INFO|tcp:10.0.130.36:6641: waiting 2 seconds before reconnect
2019-08-29T00:26:04.387Z|00010|reconnect|INFO|tcp:10.0.130.36:6642: connection closed by peer
2019-08-29T00:26:05.388Z|00011|reconnect|INFO|tcp:10.0.130.36:6642: connecting...
2019-08-29T00:26:05.389Z|00012|reconnect|INFO|tcp:10.0.130.36:6642: connection attempt failed (Connection refused)
2019-08-29T00:26:05.389Z|00013|reconnect|INFO|tcp:10.0.130.36:6642: waiting 2 seconds before reconnect
2019-08-29T00:26:05.580Z|00014|reconnect|INFO|tcp:10.0.130.36:6641: connecting...
2019-08-29T00:26:05.581Z|00015|reconnect|INFO|tcp:10.0.130.36:6641: connection attempt failed (Connection refused)
2019-08-29T00:26:05.581Z|00016|reconnect|INFO|tcp:10.0.130.36:6641: waiting 4 seconds before reconnect
2019-08-29T00:26:07.389Z|00017|reconnect|INFO|tcp:10.0.130.36:6642: connecting...
2019-08-29T00:26:07.390Z|00018|reconnect|INFO|tcp:10.0.130.36:6642: connection attempt failed (Connection refused)
2019-08-29T00:26:07.390Z|00019|reconnect|INFO|tcp:10.0.130.36:6642: waiting 4 seconds before reconnect
2019-08-29T00:26:09.582Z|00020|reconnect|INFO|tcp:10.0.130.36:6641: connecting...
2019-08-29T00:26:09.583Z|00021|reconnect|INFO|tcp:10.0.130.36:6641: connection attempt failed (Connection refused)
2019-08-29T00:26:09.583Z|00022|reconnect|INFO|tcp:10.0.130.36:6641: continuing to reconnect in the background but suppressing further logging
2019-08-29T00:26:11.391Z|00023|reconnect|INFO|tcp:10.0.130.36:6642: connecting...
2019-08-29T00:26:11.392Z|00024|reconnect|INFO|tcp:10.0.130.36:6642: connection attempt failed (Connection refused)
2019-08-29T00:26:11.392Z|00025|reconnect|INFO|tcp:10.0.130.36:6642: continuing to reconnect in the background but suppressing further logging
*****************************************************

Creating a test pod now fails with:

Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_test-rc-gmq5f_z3_0ed2d931-c9f4-11e9-b781-0ad6b5818122_0(b7ab06ded547b23f37415c134610e507daaa087e4b436aa3574c085308e63411): CNI request failed with status 400: 'Nil response to CNI request

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-28-083236

How reproducible:
Always

Steps to Reproduce:
1. Set up a cluster with 3 masters and 2 workers with the OVN network type.
2. Create a test pod and check that it works.
3. Delete the ovnkube-master pod so that it is rescheduled to another master.
4. Check the ovnkube-master logs.
5. Create a test pod.

Actual results:
4. run-ovn-northd on the new master keeps trying to reconnect to the OVN databases on the old master (10.0.130.36) and gets "Connection refused"; see the logs in the description.
5. The test pod cannot be created; the CNI request fails with the sandbox error shown above.

Expected results:
The rescheduled ovnkube-master becomes fully functional and new pods can still be created.

Additional info:
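The failure mode can be confirmed from the command line. A minimal diagnostic sketch follows; it assumes the OVN pods run in the openshift-ovn-kubernetes namespace (the namespace is not shown in the output above) and reuses the pod name and old-master IP from this report:

# Endpoints the rescheduled northd was started with, taken from its own log:
oc -n openshift-ovn-kubernetes logs ovnkube-master-86db46c79b-fgp4v -c run-ovn-northd | grep -e ovn_nbdb -e ovn_sbdb

# Probe the old master's NB (6641) and SB (6642) database ports; after the
# reschedule both refuse connections, matching the reconnect errors above:
for port in 6641 6642; do
  timeout 2 bash -c "echo > /dev/tcp/10.0.130.36/${port}" \
    && echo "port ${port}: open" \
    || echo "port ${port}: refused/closed"
done

In other words, the databases are tied to the node where the original ovnkube-master ran; once the replica moves, northd starts with stale tcp:10.0.130.36 endpoints and never recovers, which is why CNI requests then fail.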
This is definitely a known issue, and we have an extensive effort to fix this. Marking this bug as WONTFIX.