Description of problem:
Set up a multi-node environment and delete one node's hostsubnet from the master. Then restart the node service on that node. The node fails to start because it cannot find its subnet. The workaround is to delete the node from the master.

[root@ose-node1 ~]# journalctl -lf -u atomic-openshift-node.service
-- Logs begin at Fri 2016-08-12 15:45:04 CST. --
Aug 19 10:52:19 ose-node1.bmeng.local atomic-openshift-node[66314]: I0819 10:52:19.585953 66314 manager.go:281] Starting recovery of all containers
Aug 19 10:52:19 ose-node1.bmeng.local atomic-openshift-node[66314]: I0819 10:52:19.588582 66314 kubelet.go:1186] Node ose-node1.bmeng.local was previously registered
Aug 19 10:52:19 ose-node1.bmeng.local atomic-openshift-node[66314]: I0819 10:52:19.631393 66314 manager.go:286] Recovery completed
Aug 19 10:52:19 ose-node1.bmeng.local atomic-openshift-node[66314]: W0819 10:52:19.913408 66314 subnets.go:192] Could not find an allocated subnet for node: ose-node1.bmeng.local, Waiting...
Aug 19 10:52:20 ose-node1.bmeng.local atomic-openshift-node[66314]: W0819 10:52:20.415377 66314 subnets.go:192] Could not find an allocated subnet for node: ose-node1.bmeng.local, Waiting...
Aug 19 10:52:20 ose-node1.bmeng.local atomic-openshift-node[66314]: W0819 10:52:20.917206 66314 subnets.go:192] Could not find an allocated subnet for node: ose-node1.bmeng.local, Waiting...
Aug 19 10:52:21 ose-node1.bmeng.local atomic-openshift-node[66314]: W0819 10:52:21.418985 66314 subnets.go:192] Could not find an allocated subnet for node: ose-node1.bmeng.local, Waiting...

Version-Release number of selected component (if applicable):
v3.3.0.22

How reproducible:
always

Steps to Reproduce:
1. Set up a multi-node environment.
2. Delete the hostsubnet of one node:
   # oc delete hostsubnet node1
3. Restart the node service.
4. Delete the node:
   # oc delete node node1
5. Restart the node service.

Actual results:
3. The node cannot start because it cannot find its subnet.
5. The node starts normally.

Expected results:
The node should be able to start even if only its hostsubnet was deleted from etcd.

Additional info:
The OpenShift SDN master watches for node changes: whenever a node is added, a hostsubnet is created for that node, and deleting a node triggers deletion of the corresponding hostsubnet. The HostSubnet resource is therefore intended to be internal to OpenShift SDN, and if a cluster admin manipulates it directly under the covers, the SDN cannot be guaranteed to work correctly. Restarting both the master and the node (not just the node) does fix the issue. Please close this bug as won't fix.
Yes, restarting the master fixes the issue.
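
For anyone reading this later, the behavior discussed above can be illustrated with a small toy model. This is only a sketch and is not OpenShift code: the class, the method names, and the subnet values are all hypothetical. It also assumes that the master re-creates missing hostsubnets for known nodes when it starts, which is one plausible reading of why a master restart helps. The point it shows is that hostsubnets are created and removed only in response to node events, so deleting a hostsubnet by hand leaves nothing to trigger its re-creation.

```python
# Toy model only: NOT OpenShift code. All names and subnet values are hypothetical.


class ToyCluster:
    def __init__(self):
        self.nodes = set()       # registered node names
        self.hostsubnets = {}    # node name -> subnet string
        self._next_index = 0     # counter for made-up subnet values

    def _allocate(self, node):
        # Hand out a made-up /24 per node.
        self.hostsubnets[node] = f"10.1.{self._next_index}.0/24"
        self._next_index += 1

    def add_node(self, node):
        # Master reacts to a node-added event by creating a hostsubnet.
        self.nodes.add(node)
        self._allocate(node)

    def delete_node(self, node):
        # Master reacts to a node-deleted event by deleting the hostsubnet.
        self.nodes.discard(node)
        self.hostsubnets.pop(node, None)

    def delete_hostsubnet(self, node):
        # Manual admin action: no node event fires, so nothing re-creates it.
        self.hostsubnets.pop(node, None)

    def restart_master(self):
        # Assumption: on startup the master fills in missing hostsubnets.
        for node in sorted(self.nodes):
            if node not in self.hostsubnets:
                self._allocate(node)

    def start_node(self, node):
        # An unknown node registers itself, which fires a node-added event.
        if node not in self.nodes:
            self.add_node(node)
        # The node service can only come up once it has a subnet.
        return node in self.hostsubnets


if __name__ == "__main__":
    cluster = ToyCluster()
    cluster.add_node("node1")
    cluster.delete_hostsubnet("node1")
    print(cluster.start_node("node1"))   # step 3: node keeps waiting
    cluster.delete_node("node1")
    print(cluster.start_node("node1"))   # step 5: node re-registers and starts
```

In this model the node that was "previously registered" never produces a new node-added event, which matches the "Could not find an allocated subnet" loop in the log, while deleting the node forces a fresh registration.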