Created attachment 1214248 [details]
node start failure log

Description of problem:
This bug is cloned from https://bugzilla.redhat.com/show_bug.cgi?id=1388288#c7. Restarting the atomic-openshift-node service fails with a timeout whenever a pod is running on the node.

Version-Release number of selected component (if applicable):
# openshift version
openshift v3.4.0.15+9c963ec
kubernetes v1.4.0+776c994
etcd 3.1.0-alpha.1
# rpm -q docker
docker-1.10.3-57.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install the environment successfully with "redhat/openshift-ovs-multitenant" (same versions as above):
# oc get nodes
NAME                           STATUS                     AGE
ip-172-18-10-70.ec2.internal   Ready                      1h
ip-172-18-6-3.ec2.internal     Ready,SchedulingDisabled   1h
2. Make sure no pod is running on the node:
# oc scale --replicas=0 dc/registry-console
3. Restart the node service; it restarts successfully.
4. Make sure a pod is running on the node:
# oc scale --replicas=1 dc/registry-console
# oc get po
NAME                       READY     STATUS    RESTARTS   AGE
registry-console-1-k2brf   1/1       Running   0          3m
5. Restart the node service again; this time it fails:
# service atomic-openshift-node restart
Redirecting to /bin/systemctl restart atomic-openshift-node.service
Job for atomic-openshift-node.service failed because a timeout was exceeded. See "systemctl status atomic-openshift-node.service" and "journalctl -xe" for details.

Actual results:
The node service fails to restart when a pod is running on the node.

Expected results:
The node service restarts successfully.

Additional info:
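The steps above can also be scripted for quick reproduction. A minimal sketch, assuming dc/registry-console exists and registry-console is the only pod scheduled on the node (run as root on the node, logged in to oc):

#!/bin/bash
# Reproduction sketch for the restart timeout; step numbers match the
# list above. Assumes dc/registry-console is the only pod on the node.
set -x
oc scale --replicas=0 dc/registry-console    # step 2: drain the pod
systemctl restart atomic-openshift-node      # step 3: succeeds with no pods
oc scale --replicas=1 dc/registry-console    # step 4: put a pod back
sleep 60                                     # give the pod time to reach Running
systemctl restart atomic-openshift-node      # step 5: times out with a pod running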
It is not related to the plugin type; the problem exists in both subnet and multitenant environments. The most telling log lines, from my viewpoint, are:

Oct 25 08:26:01 ip-172-18-24-156.ec2.internal atomic-openshift-node[92648]: I1025 08:26:01.979679 92648 kubelet.go:2240] skipping pod synchronization - [SDN pod network is not ready]
Oct 25 08:26:31 ip-172-18-24-156.ec2.internal atomic-openshift-node[92648]: I1025 08:26:31.980867 92648 kubelet.go:2240] skipping pod synchronization - [SDN pod network is not ready]
Oct 25 08:26:36 ip-172-18-24-156.ec2.internal atomic-openshift-node[92648]: I1025 08:26:36.981065 92648 kubelet.go:2240] skipping pod synchronization - [SDN pod network is not ready]
Oct 25 08:27:02 ip-172-18-24-156.ec2.internal atomic-openshift-node[92947]: I1025 08:27:02.257550 92947 kubelet.go:2240] skipping pod synchronization - [network state unknown container runtime is down]

It seems that the node/kubelet cannot get the correct pod status, or cannot bring the existing pods back up, after restarting.
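To check whether a node is wedged in this state, the messages above can be pulled straight from the journal. A hedged example, assuming the node unit is named atomic-openshift-node as in the log lines above:

# journalctl -u atomic-openshift-node --since "1 hour ago" | \
    grep -E 'SDN pod network is not ready|container runtime is down'

If the "skipping pod synchronization" lines keep repeating until the systemd start timeout fires, the restart fails exactly as in the description.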
Is this a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1388556 ?
Any chance you can get more of the node's logs, and better yet with --loglevel=5 ?
(In reply to Dan Williams from comment #3)
> Any chance you can get more of the node's logs, and better yet with
> --loglevel=5 ?

The attached node logs were already collected at --loglevel=5.
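For reference, on these nodes the verbosity is usually raised by editing the OPTIONS line in the node's sysconfig file and restarting the service. A sketch, assuming the stock /etc/sysconfig/atomic-openshift-node layout (path and default level may differ per install):

# sed -i 's/--loglevel=[0-9]*/--loglevel=5/' /etc/sysconfig/atomic-openshift-node
# systemctl restart atomic-openshift-node
# journalctl -u atomic-openshift-node -f

Even when the restart itself times out, the higher-verbosity kubelet messages still land in the journal.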
I believe this should be fixed by https://github.com/openshift/origin/pull/11613 and more specifically https://github.com/openshift/origin/pull/11613/commits/d861f0630f5888756516277e6e5800a83089208c
Can't be MODIFIED until the PR is merged.
This has been merged into OSE and is in OSE v3.4.0.22 or newer.
Verified this bug with atomic-openshift-3.4.0.22-1.git.0.5c56720.el7.x86_64: PASS. The node service now restarts successfully even with a pod running on it.
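For anyone re-verifying, a minimal check along the lines of the original steps, assuming a pod such as registry-console is Running on the node:

# rpm -q atomic-openshift
atomic-openshift-3.4.0.22-1.git.0.5c56720.el7.x86_64
# oc scale --replicas=1 dc/registry-console
# systemctl restart atomic-openshift-node
# systemctl is-active atomic-openshift-node
active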
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066