Bug 1368398 - The node cannot be started if only delete its hostsubnet from master
Summary: The node cannot be started if only delete its hostsubnet from master
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
medium
low
Target Milestone: ---
: ---
Assignee: Ravi Sankar
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-19 09:38 UTC by Meng Bo
Modified: 2016-09-08 02:21 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-07 21:22:48 UTC
Target Upstream Version:



Description Meng Bo 2016-08-19 09:38:53 UTC
Description of problem:
Set up a multi-node environment and delete one node's hostsubnet from the master, then restart the node service on that node. The node fails to start because it cannot find its allocated subnet.

The workaround is to delete the node from the master.

[root@ose-node1 ~]# journalctl -lf -u atomic-openshift-node.service
-- Logs begin at Fri 2016-08-12 15:45:04 CST. --
Aug 19 10:52:19 ose-node1.bmeng.local atomic-openshift-node[66314]: I0819 10:52:19.585953   66314 manager.go:281] Starting recovery of all containers
Aug 19 10:52:19 ose-node1.bmeng.local atomic-openshift-node[66314]: I0819 10:52:19.588582   66314 kubelet.go:1186] Node ose-node1.bmeng.local was previously registered
Aug 19 10:52:19 ose-node1.bmeng.local atomic-openshift-node[66314]: I0819 10:52:19.631393   66314 manager.go:286] Recovery completed
Aug 19 10:52:19 ose-node1.bmeng.local atomic-openshift-node[66314]: W0819 10:52:19.913408   66314 subnets.go:192] Could not find an allocated subnet for node: ose-node1.bmeng.local, Waiting...
Aug 19 10:52:20 ose-node1.bmeng.local atomic-openshift-node[66314]: W0819 10:52:20.415377   66314 subnets.go:192] Could not find an allocated subnet for node: ose-node1.bmeng.local, Waiting...
Aug 19 10:52:20 ose-node1.bmeng.local atomic-openshift-node[66314]: W0819 10:52:20.917206   66314 subnets.go:192] Could not find an allocated subnet for node: ose-node1.bmeng.local, Waiting...
Aug 19 10:52:21 ose-node1.bmeng.local atomic-openshift-node[66314]: W0819 10:52:21.418985   66314 subnets.go:192] Could not find an allocated subnet for node: ose-node1.bmeng.local, Waiting...


Version-Release number of selected component (if applicable):
v3.3.0.22

How reproducible:
always

Steps to Reproduce:
1. Set up a multi-node environment.
2. Delete the hostsubnet of one node:
# oc delete hostsubnet node1
3. Restart the node service.
4. Delete the node:
# oc delete node node1
5. Restart the node service.

Actual results:
3. The node fails to start because it cannot find its allocated subnet.
5. The node starts normally.

Expected results:
The node should be able to start when only its hostsubnet has been deleted from etcd.

Additional info:

Comment 1 Ravi Sankar 2016-09-06 19:38:43 UTC
The OpenShift SDN master watches for node changes: whenever a node is added, a hostsubnet is created for it, and deleting a node likewise triggers deletion of its hostsubnet. The HostSubnet resource is therefore intended to be internal to OpenShift SDN, and the SDN cannot be guaranteed to work correctly if a cluster admin manipulates it behind its back.

If you restart both the master and the node (not just the node), the issue is fixed.
Please close this bug as WONTFIX.
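The lifecycle described in Comment 1 can be illustrated with a toy in-memory controller. This Go sketch is hypothetical: the `master` type, its handlers and the 10.1.x.0/24 allocation scheme are invented for illustration, and it assumes (consistent with Comment 2) that a restarted master re-lists all nodes and allocates a subnet for any node that lacks one. It is not OpenShift's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// master is a toy model of the SDN master's node/HostSubnet bookkeeping.
type master struct {
	nodes       map[string]bool
	hostSubnets map[string]string // node name -> subnet CIDR
	next        int               // index of the next /24 to hand out
}

func newMaster() *master {
	return &master{nodes: map[string]bool{}, hostSubnets: map[string]string{}}
}

// allocate assigns a fresh /24 from a hypothetical 10.1.0.0/16 cluster network.
func (m *master) allocate(node string) {
	m.hostSubnets[node] = fmt.Sprintf("10.1.%d.0/24", m.next)
	m.next++
}

// onNodeAdded models the watch handler: a node without a HostSubnet gets one.
func (m *master) onNodeAdded(node string) {
	m.nodes[node] = true
	if _, ok := m.hostSubnets[node]; !ok {
		m.allocate(node)
	}
}

// onNodeDeleted models the watch handler: deleting a node removes its HostSubnet.
func (m *master) onNodeDeleted(node string) {
	delete(m.nodes, node)
	delete(m.hostSubnets, node)
}

// restart models the assumed startup resync: every known node is re-processed.
func (m *master) restart() {
	names := make([]string, 0, len(m.nodes))
	for n := range m.nodes {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic allocation order
	for _, n := range names {
		m.onNodeAdded(n)
	}
}

func main() {
	m := newMaster()
	m.onNodeAdded("node1")
	m.onNodeAdded("node2")

	// Step 2 of the report: the HostSubnet is deleted directly. No node event
	// fires, so nothing in this model recreates it.
	delete(m.hostSubnets, "node1")
	_, ok := m.hostSubnets["node1"]
	fmt.Println("node1 has subnet after hostsubnet delete:", ok)

	// Comments 1 and 2: restarting the master re-processes every known node.
	m.restart()
	fmt.Println("node1 subnet after master restart:", m.hostSubnets["node1"])

	// Workaround: deleting the node and letting it re-register also allocates a subnet.
	delete(m.hostSubnets, "node2")
	m.onNodeDeleted("node2")
	m.onNodeAdded("node2")
	fmt.Println("node2 subnet after node re-registration:", m.hostSubnets["node2"])
}
```

In this model, deleting a hostsubnet directly produces no node event, so recovery happens only through `restart()` or a delete and re-add of the node, which matches both the workaround in the description and the master-restart fix in the comments.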

Comment 2 Meng Bo 2016-09-08 02:21:26 UTC
Yes, restarting the master fixes the issue.

